<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>Educated Guesswork</title>
	<subtitle></subtitle>
	
	<link href="https://educatedguesswork.org/feed/feed.xml" rel="self"/>
	<link href="https://educatedguesswork.org"/>
	<updated>2026-03-29T23:25:36Z</updated>
	<id>https://example.com/</id>
	<author>
		<name>Eric Rescorla</name>
		<email>ekr@rtfm.com</email>
	</author>
	
	<entry>
		<title>How not to mandate device-based age assurance</title>
		<link href="https://educatedguesswork.org/posts/device-based-age-assurance/"/>
		<updated>2026-03-29T23:25:36Z</updated>
		<id>https://educatedguesswork.org/posts/device-based-age-assurance/</id>
		<content type="html">&lt;p&gt;Over the past several years, quite a few jurisdictions have started
to require age assurance for access to various forms of content
and experiences. In most current cases, this amounts to a mandate
on the service (PornHub, Facebook, etc.), but understandably
this isn&#39;t popular with services, who have in some cases been
&lt;a href=&quot;https://www.aylo.com/assets/files/age_verification_fact_sheet.pdf&quot;&gt;advocating&lt;/a&gt;
&lt;a href=&quot;https://www.politico.com/news/2025/09/13/california-advances-effort-to-check-kids-ages-online-amid-safety-concerns-00563005&quot;&gt;to&lt;/a&gt;
move requirements from the service to the device. A number of jurisdictions
have recently passed legislation requiring device-based age
assurance, including
&lt;a href=&quot;https://leginfo.legislature.ca.gov/faces/billStatusClient.xhtml?bill_id=202520260AB1043&quot;&gt;California AB 1043&lt;/a&gt;,
&lt;a href=&quot;https://capitol.texas.gov/tlodocs/89R/billtext/pdf/SB02420F.pdf&quot;&gt;Texas SB2420&lt;/a&gt;,
and &lt;a href=&quot;https://le.utah.gov/~2025/bills/static/SB0142.html&quot;&gt;Utah SB 142&lt;/a&gt;.
(see &lt;a href=&quot;https://kgi.georgetown.edu/research-and-commentary/age-assurance-online/&quot;&gt;this report&lt;/a&gt;
from the Knight-Georgetown Institute (KGI)
by Zander Arnao, Alissa Cooper, and myself for the bigger picture on age assurance).
While device-based age assurance can be made to work and has some
technical advantages, actually writing requirements that don&#39;t have
undesirable side effects is a lot harder than it looks, as we&#39;ll
see in the remainder of this post.&lt;/p&gt;
&lt;h2 id=&quot;types-of-device-based-age-assurance&quot;&gt;Types of Device-Based Age Assurance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#types-of-device-based-age-assurance&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Age assurance systems comprise two main technical components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Evaluating the user&#39;s age&lt;/li&gt;
&lt;li&gt;Enforcing that the user is only able to access content
and experiences approved for their age.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To a great degree, these components are orthogonal: how you
established the user&#39;s age mostly doesn&#39;t matter that much to
how you enforce it. Broadly speaking, there are a number of ways enforcement can
work. In this post we&#39;ll be looking at systems where (mostly) the
evaluation and enforcement happen on the user&#39;s device. As I said,
most current age assurance systems do both evaluation and enforcement
at the service level, so this is something new, and there&#39;s quite
a bit of variation in the various ideas, in part because it&#39;s new
and in part because I think there actually are a lot more options
for how to do on-device enforcement.&lt;/p&gt;
&lt;p&gt;At a high level, there are three main ways to do device-based
enforcement.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Devices can attempt to filter out unwanted content on their
own.&lt;/li&gt;
&lt;li&gt;Devices can refuse to install and/or run apps (programs)
which are rated for an age range other than that of the
user (unless, in some cases, they are approved by parents).&lt;/li&gt;
&lt;li&gt;Devices can provide an API that apps can use to determine
the user&#39;s age range and then take appropriate action.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first of these approaches doesn&#39;t really work well, for reasons
we&#39;ll get into below.
The second is obviously attractive because it takes
all the load off of the apps, but it doesn&#39;t work well for apps which
provide both restricted and unrestricted content and experiences. The
obvious example of this type of app is a Web browser, which can browse
porn sites but also totally unrestricted sites, but you could have
a similar situation with an app like Facebook if there were restrictions
on what content and experiences it could provide to minors.
For example, the &lt;a href=&quot;https://www.nysenate.gov/legislation/bills/2023/S7694/amendment/A&quot;&gt;New York SAFE for Kids
Act&lt;/a&gt;
is a service-based age assurance requirement that
restricts using algorithmic recommendation systems for
children, but one could easily imagine this kind of restriction
being levied at the device level.
In these cases, enforcement has to happen in the app—or on
the service it is the front-end for—because it&#39;s
the app that knows whether a specific type of experience is restricted.&lt;/p&gt;
&lt;h3 id=&quot;enforcement-by-apps&quot;&gt;Enforcement by Apps &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#enforcement-by-apps&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The figure below shows the two main ways for app-level enforcement of
the correct experience, namely in the app and on the service provider.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/device-based-assurance.png&quot; alt=&quot;Device-Based Age Assurance Architecture&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Device-based age assurance architecture. Source: &lt;a href=&quot;https://kgi.georgetown.edu/research-and-commentary/age-assurance-online/&quot;&gt;Rescorla, Arnao, and Cooper 2026&lt;/a&gt;. Original figure by Kate Hudson.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The first option, shown in the top diagram, is that the service
provider labels content and experiences with age ratings and lets the
app determine what experience to give the user. The second option,
shown in the bottom diagram, is that the app sends the service
provider the user&#39;s age—or more likely which age range
they are in—and the service provider provides the correct
experience. In the case of regular mobile apps where the service
provider operates both the app and the server (e.g., Facebook),
the distinction between these two architectures is basically
an internal implementation choice; the service provider can break
up functionality any way it wants, just as with other functionality.&lt;/p&gt;
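&lt;p&gt;The difference between the two architectures can be sketched in a few lines of Python. This is only an illustration: the age brackets, the &lt;code&gt;min_range&lt;/code&gt; field, and the function names are all hypothetical and are not taken from any real platform API.&lt;/p&gt;

```python
# Hypothetical age brackets, ordered youngest to oldest.
AGE_RANGES = ["under13", "13-15", "16-17", "18+"]


def allowed(min_range, user_range):
    """True if user_range is at or above the item's minimum bracket."""
    return AGE_RANGES.index(user_range) >= AGE_RANGES.index(min_range)


def app_side_filter(labeled_items, user_range):
    """Option 1: the service labels every item and the app filters locally."""
    return [i["title"] for i in labeled_items if allowed(i["min_range"], user_range)]


def service_side_filter(catalog, request):
    """Option 2: the app sends the age range and the service filters."""
    return [i["title"] for i in catalog if allowed(i["min_range"], request["age_range"])]


# Hypothetical catalog used only for this example.
CATALOG = [
    {"title": "cartoons", "min_range": "under13"},
    {"title": "mature drama", "min_range": "18+"},
]
```

&lt;p&gt;Both functions apply the same policy; what changes is who sees what. In the first, the app receives every item along with its rating. In the second, the service learns the user&#39;s age range.&lt;/p&gt;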
&lt;p&gt;However, in the case of the Web, it does matter because the browser
and the site are in general operated by different entities, so
there needs to be some protocol they use to provide age enforcement,
and that protocol needs to be written down. In principle, either
architecture is possible, and each has pros and cons (see our report
for more on this), but the most mature mechanism in this area is for
the server to indicate &amp;quot;adult&amp;quot; content to the client using the
&lt;a href=&quot;https://www.rtalabel.org/&quot;&gt;Restricted To Adults&lt;/a&gt; label, which is
already widely used by adult Web sites.&lt;/p&gt;
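&lt;p&gt;As a rough sketch of the client side of this mechanism, the Python below checks a response for the RTA label. The label string is, to the best of my knowledge, the one RTA publishes, and sites commonly place it in a meta tag or a &lt;code&gt;Rating&lt;/code&gt; header; treat both as assumptions to verify against the RTA site. The function names are hypothetical, and real browsers or filters may do this quite differently.&lt;/p&gt;

```python
# Label value believed to be the one published by RTA (an assumption here).
RTA_LABEL = "RTA-5042-1996-1400-1577-RTA"


def page_is_rta_labeled(html_text, headers=None):
    """Return True if the response carries the RTA label."""
    headers = headers or {}
    # A site may send the label in a "Rating" HTTP response header...
    for name, value in headers.items():
        if name.lower() == "rating" and RTA_LABEL in value:
            return True
    # ...or embed it in the page itself, typically in a meta tag.
    return RTA_LABEL in html_text


def should_block(html_text, headers, user_is_adult):
    """Block labeled pages unless the user is known to be an adult."""
    return page_is_rta_labeled(html_text, headers) and not user_is_adult
```

&lt;p&gt;Note that this only works for sites that choose to label themselves; an unlabeled page passes straight through.&lt;/p&gt;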
&lt;h2 id=&quot;types-of-regulation&quot;&gt;Types of Regulation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#types-of-regulation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With the help of Gemini Deep Research, I was able to identify
the following possibly non-exhaustive list of legislation and
proposed legislation&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
for device-based age assurance in the US.&lt;/p&gt;
&lt;!-- Double check --&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Jurisdiction&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.govinfo.gov/app/details/BILLS-119hr3149ih&quot;&gt;United States Federal (H.R. 3149)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/TX/text/SB2420/id/3204209&quot;&gt;Texas (SB 2420)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted (Under Injunction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://le.utah.gov/Session/2025/bills/introduced/SB0142S05.pdf&quot;&gt;Utah (SB 142)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://le.utah.gov/~2024/bills/static/SB0104.html&quot;&gt;Utah (SB 104)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.legis.la.gov/Legis/BillInfo.aspx?i=248616&quot;&gt;Louisiana (HB 570)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/CA/text/AB1043/id/3245379&quot;&gt;California (AB 1043)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/AL/text/HB161/id/3357012&quot;&gt;Alabama (HB 161)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Enacted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://trackbill.com/bill/alaska-house-bill-46-app-stores-parents-and-minors/2617955/&quot;&gt;Alaska (HB 46)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://kslegislature.gov/li/b2025_26/measures/sb372/&quot;&gt;Kansas (SB 372)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.flsenate.gov/Session/Bill/2026/1722/Analyses/2026s01722.cm.PDF&quot;&gt;Florida (SB 1722)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed (Died in Committee)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/ID/text/S1158/id/3155300&quot;&gt;Idaho (SB 1158)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed (Died in Committee)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/IL/text/SB3977/2025&quot;&gt;Illinois (SB 3977)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://legiscan.com/MI/text/SB0284/id/3229711&quot;&gt;Michigan (SB 284)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.nysenate.gov/legislation/bills/2025/S8102/amendment/A&quot;&gt;New York (SB S8102A)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.scstatehouse.gov/sess125_2023-2024/bills/4689.htm&quot;&gt;South Carolina (H.4689)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://leg.colorado.gov/bills/SB26-051&quot;&gt;Colorado (SB 26-051)&lt;/a&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proposed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There&#39;s a fair bit of overlap in these rules (it&#39;s not uncommon
for legislators to start with &amp;quot;model legislation&amp;quot; produced by
some external party), but there is also a lot of variation,
including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Whether the user is required to demonstrate their age or
whether the device just asks them for it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Who the requirements are levied on (manufacturers, OS providers,
app stores, etc.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whether app store downloads are restricted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What developers are required to do if they learn that a
user is a minor.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whether restrictions apply to desktop or just mobile.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This also provides us with some good examples of how these
regulations can be written in ways that are likely to be
ineffective or problematic.&lt;/p&gt;
&lt;h2 id=&quot;device-level-filtering&quot;&gt;Device-Level Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#device-level-filtering&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Several of these mandates require that the device itself filter content.
Here&#39;s Utah SB 104:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All devices activated in the state shall:
(1) contain a filter;
(2) ask the user to provide the user&#39;s age during activation and account set-up;
(3) automatically enable the filter when the user is a minor based on the age provided by the user as described in Subsection (2);
(4) allow a password to be established for the filter;
(5) notify the user of the device when the filter blocks the device from accessing a website; and
(6) allow a non-minor user who has a password the option to deactivate and re-activate the filter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And here&#39;s South Carolina&#39;s H.4689 (not enacted):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(1) contain a filter;
(2) determine the age of the user during activation and account set-up;
(3) set the filter to &amp;quot;on&amp;quot; for minor users;
(4) allow a password to be established for the filter;
(5) notify the user of the device when the filter blocks the device from accessing a website; and
(6) give the user with a password the opportunity to deactivate and reactivate the filter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The general idea here is supposed to be that it applies to
all uses of the platform, no matter what software the user
is using. It&#39;s understandable why one would want this, but the
tricky bit is the definition of filter. Here&#39;s the definition from
South Carolina:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(3) &amp;quot;Filter&amp;quot; means software installed on a device that is capable
of preventing the device from accessing or displaying obscene
material as defined by Section 16-15-305 through Internet browsers
or search engines via mobile data networks, wired Internet
networks, and wireless Internet networks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Read literally this is a real problem because we don&#39;t actually
know how to implement it. There are two main potential approaches
to Internet filtering:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Have a list of sites which are known or believed to host restricted
material and which are filtered by the browser.&lt;/li&gt;
&lt;li&gt;Attempt to detect restricted content (e.g., via some AI nudity classifier)
and refuse to show it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Neither of these approaches is great. List-based systems are a common
feature of existing parental control systems and of institutional
control systems (e.g., in schools). The &lt;a href=&quot;https://www.heritage.org/sites/default/files/2025-03/BG3895.pdf&quot;&gt;available evidence&lt;/a&gt; is that they
don&#39;t work very well, and have problems both with overblocking (blocking
content that shouldn&#39;t be restricted) and underblocking (failing to
block content that should be restricted). Obviously,
what should and shouldn&#39;t be restricted is a bit of a judgement call,
but this is inherently a hard problem.&lt;/p&gt;
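&lt;p&gt;To make the list-based approach concrete, here is a minimal Python sketch of hostname matching against a blocklist. The list entries and the function name are hypothetical, and real products use much larger lists and more elaborate matching.&lt;/p&gt;

```python
# Hypothetical blocklist entries (reserved example domains).
BLOCKLIST = {"adult.example", "videos.example"}


def host_is_blocked(host, blocklist=BLOCKLIST):
    """Return True if the host or any parent domain is on the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    # For "a.b.c", check "a.b.c", then "b.c", then "c".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False
```

&lt;p&gt;Both failure modes show up even in this toy version. A new site such as &lt;code&gt;new-adult-site.example&lt;/code&gt; is not on the list yet, so it gets through (underblocking), while listing &lt;code&gt;videos.example&lt;/code&gt; also blocks an unrelated subdomain such as &lt;code&gt;health.videos.example&lt;/code&gt; (overblocking).&lt;/p&gt;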
&lt;p&gt;Even if it were possible to accurately mechanically distinguish between &amp;quot;obscene&amp;quot; and
&amp;quot;non-obscene&amp;quot; material, it&#39;s not really practical for a single piece of
software to &amp;quot;prevent the device from accessing&amp;quot; that material. A
computing device isn&#39;t a single monolithic thing but a collection
of different pieces of software—including software written
by other people than the device manufacturer and installed by the
user—and it&#39;s not generically practical to prevent such
software from displaying &amp;quot;obscene material&amp;quot; if that software
doesn&#39;t want you to.&lt;/p&gt;
&lt;p&gt;Instead, typical existing device-based filtering mechanisms work by
restricting what network connections you can make. This is
to some extent effective but increasingly less so as client
software adopts technologies like &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8484&quot;&gt;DNS over HTTPS&lt;/a&gt;
and &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9849.html&quot;&gt;Encrypted Client Hello&lt;/a&gt; that
are designed to conceal activity from the network and also
do so to some extent from the machine. This isn&#39;t to
say it&#39;s not possible to supervise some of the behavior
of software, but it can&#39;t be done well if the software is trying to evade it.
Therefore, the effectiveness of that supervision is going to be limited
unless the operating system also restricts what you
can install. If I were an operating system vendor, I would
be quite concerned about my ability to comply with this
provision.&lt;/p&gt;
&lt;p&gt;The Utah text is better in this respect:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(3) &amp;quot;Filter&amp;quot; means generally accepted and commercially reasonable software used on a
device that is capable of preventing the device from accessing or displaying obscene
material through Internet browsers or search engines owned or controlled by the
manufacturer in accordance with prevailing industry standards including blocking
known websites linked to obscene content via mobile data networks, wired Internet
networks, and wireless Internet networks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As noted above, your options as an operating system vendor are a bit
limited, but the &amp;quot;generally accepted and commercially reasonable&amp;quot;
language suggests you probably don&#39;t need to do anything beyond the
normal types of filtering. As I said, they&#39;re not great in terms of
effectiveness, but that&#39;s
not your problem as the OS vendor who just wants to be in compliance
with the law. Moreover, the restriction to &amp;quot;Internet browsers or
search engines owned or controlled by the manufacturer&amp;quot; means that you
don&#39;t have to make third party software conform, which makes
the job a lot easier.&lt;/p&gt;
&lt;p&gt;On the other hand, this also means that it&#39;s not likely to be
effective because users can just download software that bypasses
the filtering.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I don&#39;t think this text can really be salvaged. It&#39;s just not that practical
to require the &lt;em&gt;device&lt;/em&gt; to be responsible for ensuring that users
aren&#39;t able to view any contraband material. It&#39;s more practical
to require that applications do some filtering, with the exception
of Web browsers, which face many of the same challenges in doing
unilateral filtering of websites that devices have in constraining
applications.&lt;/p&gt;
&lt;h2 id=&quot;including-open-source-operating-systems&quot;&gt;Including Open Source Operating Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#including-open-source-operating-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A number of these regulations require that the operating system
vendor participate in age assurance. For example, here&#39;s the
relevant text from CA AB 1043.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(f) “Covered manufacturer” means a person who is a manufacturer of
a device, an operating system for a device, or a covered
application store.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;(a) A covered manufacturer shall do all of the following: (1)
Provide an accessible interface for requiring account holders at
account setup that requires an account holder to indicate the birth
date, age, or both, of the user of that device for the sole purpose of
providing a signal regarding the user’s age bracket to applications
available in a covered application store.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The basic problem here is that the definition of &amp;quot;covered
manufacturer&amp;quot; is very broad. It clearly covers all operating systems,
including open source operating systems like Linux. What&#39;s less clear
is who it applies to in those cases. For example, if you&#39;re a
contributor on an Open Source project like
&lt;a href=&quot;https://www.debian.org/&quot;&gt;Debian&lt;/a&gt; are you personally responsible for
making sure your software does this stuff? Some projects, such as
the &lt;a href=&quot;https://x.com/midnightbsd/status/2027101491211718765&quot;&gt;MidnightBSD&lt;/a&gt;
operating system and even the &lt;a href=&quot;https://github.com/c3d/db48x/commit/7819972b641ac808d46c54d3f5d1df70d706d286&quot;&gt;DB48X calculator software&lt;/a&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
have added restrictions forbidding use in California.&lt;/p&gt;
&lt;p&gt;The Illinois language is even scarier in that it explicitly
calls out individuals:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Operating system provider&amp;quot; means a person or entity that
develops, licenses, or controls the operating system software
on a computer, mobile device, or any other general purpose
computing device.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This &lt;a href=&quot;https://www.linuxteck.com/california-age-verification-law-linux/#:~:text=MidnightBSD%20has%20already%20responded%20by,sums%20for%20volunteer%2Drun%20projects.&quot;&gt;article&lt;/a&gt; from LinuxTeck does a good job of covering the issues here and how
various entities are responding to California AB 1043 and how
the Open Source organizations failed to intervene before the
law was passed, with the result that the text is concerningly
overbroad.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches-2&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This all seems pretty undesirable, but it&#39;s also somewhat tricky to
figure out where to draw the line. Plainly you don&#39;t want
individual developers who work on iOS to somehow be responsible for
whether Apple does age assurance, so for commercial operating systems,
you could probably make clear that the requirement is on the vendor
rather than on individual developers. The situation is a lot less clear for open source operating systems.
In some cases, such as Debian, a Linux distro will be backed
by some nonprofit, or in other cases, such as Ubuntu, by
a company. In both of these cases one could imagine levying
the restriction on that entity. The situation seems less clear
for some other distros like &lt;a href=&quot;https://www.linuxmint.com/&quot;&gt;Linux Mint&lt;/a&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Even then, you have to worry about whether those distros can effectively
comply for versions of the operating system that have already shipped
(see the next two sections).&lt;/p&gt;
&lt;h2 id=&quot;effectiveness-date&quot;&gt;Effectiveness Date &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#effectiveness-date&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most of these mandates require the app store provider, OS provider,
manufacturer, or app developer respectively to comply starting
on the effective date of the mandate. This isn&#39;t
necessarily a problem in cases where the user&#39;s interaction with the
regulated entity is online, but that isn&#39;t always the case. For example,
making an account with the iOS App Store or the Google Play
Store is an inherently interactive activity and so Apple or Google
can enforce the new rules for new accounts or potentially for
existing accounts when users choose to download a new app.&lt;/p&gt;
&lt;p&gt;The situation is much less straightforward when the new requirements
are implemented by software on the user&#39;s machine which therefore
must be updated to take effect. Even on systems which auto-update,
updates can take a very long time to roll out. For example, the
figure below shows the market share of Android versions over time.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/android-versions.png&quot; alt=&quot;Android versions over time&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Android versions over time. Source: &lt;a href=&quot;https://www.appbrain.com/stats/top-android-sdk-versions&quot;&gt;AppBrain&lt;/a&gt;.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As you can see, Android 9.0 still has 4% market share, even though
it was released in 2018 and hasn&#39;t received security updates
since &lt;a href=&quot;https://endoflife.date/android&quot;&gt;2022&lt;/a&gt;. Collectively, over
40% of the Android ecosystem is on versions that aren&#39;t
receiving security updates. In many cases, this is because users
aren&#39;t updating or third party vendors aren&#39;t providing updated
firmware. In any case, it&#39;s not clear how Google could as
a practical matter make those devices implement new age range
APIs. The situation is better on iOS, but there are still plenty
of people who haven&#39;t updated their iPhones and iPads.&lt;/p&gt;
&lt;p&gt;Mobile phones are actually the best case scenario because
(1) they were often designed with auto-update in mind and (2) people
overwhelmingly get their apps through the app store. Many desktop
apps don&#39;t have any kind of auto-update functionality, and so it&#39;s not
clear how an app vendor would conform to requirements to implement
new behavior.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches-3&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-3&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This issue is actually comparatively easy to fix by simply
requiring the new behavior on any substantial update (e.g., not
just security fixes) to the system. This isn&#39;t quite as satisfying,
but the most important devices from the perspective of managing
minors&#39; access are going to be getting regular updates or just
eventually replaced, and so the end result will be good coverage
in relatively short order.&lt;/p&gt;
&lt;h2 id=&quot;geographic-scope-and-location-ambiguity&quot;&gt;Geographic Scope and Location Ambiguity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#geographic-scope-and-location-ambiguity&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For obvious reasons, the scope of these restrictions is typically
limited to the jurisdiction requiring them. For instance, Utah&#39;s SB142
applies to &amp;quot;an individual who is located in the state&amp;quot; and
California&#39;s AB1043 refers to an Account Holder, who is &amp;quot;an individual
who is at least 18 years of age or a parent or legal guardian of a
user who is under 18 years of age in the state.&amp;quot; Implementing
these mandates correctly obviously depends on knowing which jurisdiction
the device is in. This is not always straightforward.&lt;/p&gt;
&lt;p&gt;There are four main ways of determining a device&#39;s location:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Via &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Global_Positioning_System&amp;amp;oldid=1344534909&quot;&gt;GPS&lt;/a&gt; or
other satellite-based location systems (e.g., GLONASS, Galileo, etc.)&lt;/li&gt;
&lt;li&gt;By measuring the distance from in-range mobile phone towers.&lt;/li&gt;
&lt;li&gt;By looking at the local WiFi environment (e.g., which WiFi access points are
in range).&lt;/li&gt;
&lt;li&gt;Via &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_geolocation&amp;amp;oldid=1339927887&quot;&gt;IP geolocation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, not all of these will always be available.&lt;/p&gt;
&lt;p&gt;Typically, mobile devices will perform some sort of sensor
fusion to estimate the location based on the available signals.
For obvious reasons, mobile phones need to be able to see local mobile
towers and most mobile phones now have GPS, so it&#39;s typically
straightforward for the device operating system to determine where it
is. By contrast, only some tablets have mobile connectivity and/or
GPS chips, and ones that do not will have to fall back to the other
two methods. This isn&#39;t necessarily a problem because WiFi-based
geolocation can be quite accurate, depending on the WiFi environment.
The situation is even worse on desktop. Most desktop devices don&#39;t
have mobile connections or GPS at all, and so you&#39;re stuck with
WiFi- and IP-address-based location.&lt;/p&gt;
&lt;p&gt;Importantly, just because the device knows where it is that
does not mean that apps know where they are. Because
location information can be sensitive, modern operating
systems require user permission before sharing the user&#39;s location
with the app. Many apps do not need location for their
current functions and for obvious reasons Google and Apple
&lt;a href=&quot;https://security.googleblog.com/2026/02/keeping-google-play-android-app-ecosystem-safe-2025.html#:~:text=Preventing%20unnecessary%20access%20to%20sensitive,to%20strengthen%20our%20privacy%20policies.&quot;&gt;discourage&lt;/a&gt;
asking for excessive permissions. As a result, only around
&lt;a href=&quot;https://42matters.com/blog?p=detect-potentially-unwanted-applications-pua-with-app-data&quot;&gt;half of apps&lt;/a&gt;
have location permissions. Apps which do not have location permissions
have two main choices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ask for location permissions&lt;/li&gt;
&lt;li&gt;Use IP-based geolocation only.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither of these is great. From the user&#39;s perspective, having
an app unnecessarily ask for location isn&#39;t great for privacy,
especially as the user has no way of knowing whether the
app is exfiltrating the data. It&#39;s also not great from the app&#39;s perspective
because it makes users suspicious (&amp;quot;why does my calculator want
my location?&amp;quot;), and in fact this kind of excessive permission
ask is one of the signals that an app is doing some kind of
suspicious user tracking.&lt;/p&gt;
&lt;p&gt;IP-based geolocation doesn&#39;t require the user to provide permission
because the app can observe the IP address itself without help from
the operating system. The good news is that there are free IP location
databases which can be
&lt;a href=&quot;https://lite.ip2location.com/ip2location-lite#database&quot;&gt;downloaded&lt;/a&gt;
and will get you resolution down to the city level. The bad news is
that they are updated frequently, so you either need to run some kind
of geolocation service or push the updated database to the client for
resolution. Moreover, it&#39;s likely the user&#39;s device is behind a
&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1&quot;&gt;NAT&lt;/a&gt; and doesn&#39;t know its own IP address, so you
have to use some external server to resolve the IP address. Note that
this applies even to apps which otherwise wouldn&#39;t &amp;quot;phone home&amp;quot; at all,
so now you&#39;re effectively tracking the location of users!&lt;/p&gt;
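&lt;p&gt;To make the lookup half of this concrete, here is a minimal Python sketch
of resolving an address against a range table of the kind you get from a
downloaded database. The three rows are made up (they use the RFC 5737
documentation ranges and the region labels are purely illustrative), and
&lt;code&gt;region_for&lt;/code&gt; is a hypothetical helper rather than part of any real
geolocation product. It also assumes the app has already learned its public
address from some external server, which is exactly the step that creates
the tracking problem.&lt;/p&gt;

```python
import bisect
import ipaddress

# Hypothetical rows in the style of a downloadable IP-range database:
# (first address, last address, region). These are RFC 5737 documentation
# ranges; the region labels are invented for illustration.
RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), region)
    for first, last, region in [
        ("192.0.2.0", "192.0.2.255", "Utah"),
        ("198.51.100.0", "198.51.100.255", "Texas"),
        ("203.0.113.0", "203.0.113.255", "California"),
    ]
)
STARTS = [row[0] for row in RANGES]


def region_for(public_ip):
    """Return the region whose range covers public_ip, or None."""
    addr = int(ipaddress.ip_address(public_ip))
    # Index of the last range that starts at or before addr (-1 if none).
    idx = bisect.bisect_right(STARTS, addr) - 1
    if idx == -1:
        return None
    first, last, region = RANGES[idx]
    return region if addr in range(first, last + 1) else None
```

&lt;p&gt;The lookup itself is cheap. The operational cost is in keeping the table
fresh and in running (or trusting) the server that tells the app its public
address.&lt;/p&gt;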
&lt;h3 id=&quot;alternative-approaches-4&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-4&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The bottom line here is that while the operating system generally will
be able to determine where a device is with some level of resolution,
the situation is much worse for apps, many of which will have to ask for
otherwise unnecessary permissions or do substantial extra work in
order to get the user&#39;s location; either of these choices also comes
at a potential increased risk to user privacy. As we suggest in our
report, it would be a lot easier for apps if the age range APIs
that these mandates already make the operating system offer also
provided the jurisdiction at a coarse level (e.g., the state) so
that the app could enforce the appropriate policy. Even so, it is
likely that both the OS and the apps will occasionally get the answer
wrong (consider the case of a device which must use IP geolocation and
is located near a state border) and so these mandates probably should
contain some text about &amp;quot;commercially reasonable&amp;quot; attempts to verify
location.&lt;/p&gt;
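&lt;p&gt;As a rough illustration of what that might look like from the app&#39;s side,
here is a Python sketch. Everything in it is hypothetical: no platform offers
an &lt;code&gt;AgeSignal&lt;/code&gt; with these fields today, and the region codes and
policy names are placeholders rather than anything drawn from the statutes.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeSignal:
    """Hypothetical value returned by the platform's age range API."""
    age_category: str   # an age bracket such as "13-16", never a birthday
    jurisdiction: str   # coarse region code such as "US-UT", no coordinates


# Hypothetical table the developer maintains: which rule set applies where.
POLICIES = {
    "US-UT": "utah_rules",
    "US-TX": "texas_rules",
}


def policy_for(signal):
    """Pick the rule set to enforce without asking for location permission."""
    return POLICIES.get(signal.jurisdiction, "default_rules")
```

&lt;p&gt;The point of the design is that the app never sees coordinates or an IP
lookup result, just the one coarse fact it needs to choose a policy.&lt;/p&gt;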
&lt;h2 id=&quot;including-irrelevant-apps&quot;&gt;Including Irrelevant Apps &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#including-irrelevant-apps&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, quite a few of these mandates have a structure
where the platform determines—or sometimes just collects—the
user&#39;s age and then provides it to apps via an API, which developers
are required to use. Unfortunately, in many cases &lt;strong&gt;all&lt;/strong&gt; apps
are required to request the user&#39;s age even if there&#39;s nothing
meaningful for them to do with it. Here&#39;s some language
from Utah SB 142:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(8) &amp;quot;Developer&amp;quot; means a person that owns or controls an app made available through an
app store in the state.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(1) A developer shall:
(a) verify through the app store&#39;s data sharing methods:
(i) the age category of users located in the state; and
(ii) for a minor account, whether verifiable parental consent has been obtained;
(b) notify app store providers of a significant change to the app;
(c) use age category data received from an app store or any other entity only to:
(i) enforce age-related restrictions and protections;
(ii) ensure compliance with applicable laws and regulations; or&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And here is New York S8102A:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;&amp;quot;COVERED  DEVELOPER&amp;quot;  SHALL  MEAN  A PERSON WHO OWNS OR CONTROLS A
WEBSITE, ONLINE SERVICE,  ONLINE  APPLICATION,  MOBILE  APPLICATION,  OR
PORTION THEREOF THAT IS ACCESSED BY A USER IN THE STATE OF NEW YORK.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;§  1542. OBLIGATIONS FOR COVERED DEVELOPERS. 1. ALL COVERED DEVELOPERS
SHALL REQUEST AN AGE CATEGORY SIGNAL FOR A USER FROM A COVERED  MANUFACTURER  WHEN  SUCH  USER DOWNLOADS AND LAUNCHES SUCH DEVELOPER&#39;S WEBSITE,
SERVICE, OR APPLICATION.
2. IF THE SIGNAL INDICATES THAT A USER IS A COVERED MINOR,  THEN  SUCH
COVERED  DEVELOPER SHALL TREAT SUCH SIGNAL AS AN AUTHORITATIVE INDICATOR
OF SUCH USER&#39;S AGE FOR THE PURPOSES OF COMPLIANCE  WITH  ANY  APPLICABLE
LAW  AND  THE COVERED DEVELOPER SHALL BE DEEMED TO HAVE ACTUAL KNOWLEDGE
THAT A USER IS A COVERED MINOR ACROSS ALL PLATFORMS AND POINTS OF ACCESS&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We&#39;ll get to the &amp;quot;WEBSITE&amp;quot; part of this later, but notice that this
means that if you offer &lt;strong&gt;any&lt;/strong&gt; kind of app, even one which doesn&#39;t collect
any user data and which doesn&#39;t have any kind of restricted content,
you still are required to query the platform for the user&#39;s age. This goes
for calculator apps, weather apps, etc. There are two obvious problems
here, one from the perspective of the user and one from the perspective
of the developer.&lt;/p&gt;
&lt;p&gt;From the perspective of the user, this requirement creates
unnecessary leakage of sensitive information—i.e., the
user&#39;s age bracket—from the platform to the app. This is
the kind of information that under normal circumstances
we would want the platform to put behind a consent dialog, but in
this case all apps are going to request it as a matter of course.&lt;/p&gt;
&lt;p&gt;From the perspective of a developer, this means that everyone
has to adjust their apps to request the user&#39;s age, even if they
don&#39;t do anything with the information. For instance, if you&#39;re
a calculator app, you don&#39;t need to know the user&#39;s age because the
app doesn&#39;t behave any differently for children.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; This is obviously a huge imposition on
developers, who may not even be aware of these new regulations,
which, of course, differ from jurisdiction to jurisdiction, and
very likely many developers will unknowingly be in violation of these rules.&lt;/p&gt;
&lt;p&gt;Many of these mandates are tied to app stores, or in some cases specifically
to mobile devices, but for those that aren&#39;t, such as
CA AB1043, the problem is much worse, for several reasons:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;There&#39;s no central app store backing you up.&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;In principle
the iOS and Android app stores could verify compliance or at
least that apps call the APIs (though I don&#39;t know that they
do).&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;It&#39;s a lot less clear what a desktop application is.&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;There are of course applications that people can download,
such as Microsoft Word, but there&#39;s lots of software you
can download that has some command line interface but isn&#39;t
really an end-user app. For example, are the &lt;a href=&quot;https://nodejs.org/en&quot;&gt;node.js&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
JavaScript runtime or the &lt;a href=&quot;https://www.latex-project.org/&quot;&gt;LaTeX&lt;/a&gt;
typesetting system required to query for your age?&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;There are a lot of open source apps with no real connection to the
jurisdiction.&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Suppose that I write an app and put it up
on GitHub. Am I now liable when some Linux distribution makes
it available to their users, even if I had nothing to do
with it at all?&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3 id=&quot;parental-monitoring-features&quot;&gt;Parental Monitoring Features &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#parental-monitoring-features&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Although some of these mandates mostly require that apps query
for the user&#39;s age range and then have minimal requirements on
what they do with it, a number require apps to provide parental
control and monitoring features, even when those features are not really
sensible for the app in question. For example, here is Alaska
HB 46:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(c) A developer shall provide readily accessible features for a parent of a
minor located in the state to implement time restrictions on using the developer&#39;s app,
including allowing the parent to view metrics reflecting the amount of time the minor
is using the app and setting daily time limits on the minor&#39;s use of the app.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So this means that if I make a calculator app, I need to build a
whole system for implementing parental control metrics and
daily time limits! Aside from this being a burden on developers,
it also introduces a whole new set of privacy risks because it
now requires developers to monitor usage on a per-user basis,
store that usage information, and make it accessible to parents.&lt;/p&gt;
&lt;p&gt;Even if we ignore whether it&#39;s a good idea for parents to
be able to track and control the usage of &lt;em&gt;every&lt;/em&gt; app on the user&#39;s
device, this is a significant privacy regression in other
ways, especially if it&#39;s
not done carefully: for example if you just have the app
phone home whenever it&#39;s used, then it becomes a tracking
system because the developer gets to see the user&#39;s IP
address and potentially use it to roughly geolocate them.
It&#39;s also likely that many developers will just decide to
use some third party SDK or even a third party service
to provide this function, which creates its own privacy
risks, both because it allows that entity to track the user
and because many of those SDKs have bad security and
privacy practices, whether intentionally or unintentionally.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches-5&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-5&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The basic problem here is the requirement that every app query the
user&#39;s age information without regard to the properties of the app
and whether it will do anything with the information. An alternative
here would be to require apps to request that information only
if their behavior would change as a result.&lt;/p&gt;
&lt;p&gt;For example, in the case of New York S8102A, the information is
intended to be an &amp;quot;authoritative indicator&amp;quot; of the user&#39;s age that
gives the developer &amp;quot;actual knowledge that the user is a covered
minor&amp;quot;, which is presumably intended to hook into other statutes such
as the &lt;a href=&quot;https://www.nysenate.gov/legislation/bills/2023/S7694/amendment/A&quot;&gt;SAFE For Kids
Act&lt;/a&gt;
that have substantive requirements for how the app behaves (e.g., not
showing behavioral ads). The
statute could instead be written to require the apps to either:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Conform to those substantive requirements for all users.&lt;/li&gt;
&lt;li&gt;Query for the user&#39;s age range and conform to those substantive requirements
for minors.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You might need some legal cleverness to make this work with the
existing laws that depend on &amp;quot;actual knowledge&amp;quot; of minor status,
for instance by reading all those places as &amp;quot;actual knowledge or
failure to query the platform where possible&amp;quot;, but this seems like
it&#39;s at least a potential alternative avenue.&lt;/p&gt;
&lt;h2 id=&quot;requiring-web-support&quot;&gt;Requiring Web Support &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#requiring-web-support&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Several of these proposals also place requirements on the Web. Recalling the
New York text from above, a &amp;quot;covered developer&amp;quot;, which includes a
person who &amp;quot;owns or controls a website&amp;quot;, is required to &amp;quot;request
an age category signal for a user from a covered manufacturer
when such user downloads and launches such developer&#39;s website,
service, or application&amp;quot;. You don&#39;t usually download a website,
but you might launch one, and in this case the website would be required
to request an age category signal.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/download-website.png&quot; alt=&quot;You wouldn&#39;t download a website&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;There isn&#39;t any standard for what this signal would look like, but
in the context of an operating system, we&#39;d expect the OS to provide
an API that the app would call. These APIs might differ between
operating systems, but the developer has to do some work to
port their app to a different operating system anyway, and that
work could involve using the correct API.&lt;/p&gt;
&lt;p&gt;In the context of the Web, we&#39;d either expect a standardized header
from the browser or a standardized Web API that the browser would
implement, but neither of these exists, so it&#39;s not clear what the
site is supposed to do. Worse yet, the requirements in these mandates
to provide signals are on the app store or the operating system,
but in this case the requirement needs to be on the browser, which
first needs to query the OS for the signal and then provide it to the
site. As nothing requires browsers to do so, it&#39;s not clear what
the site is expected to do.&lt;/p&gt;
&lt;h3 id=&quot;technical-fixes&quot;&gt;Technical Fixes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#technical-fixes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;These are technical problems that are partly fixable, but not really
immediately. First, there needs to be some widely understood mechanism
for sites to request an age category signal from the browser. Without
that, sites won&#39;t know what to do and there is a risk that even
browsers which do provide an age category signal might do so in
incompatible ways. Technically it&#39;s probably not that hard to design
something here, although it&#39;s also not necessarily as easy as it
sounds, and standardization takes time.  In principle, each
jurisdiction could define their own signal, but this is obviously very
painful for developers.  Once such a signal is defined, you would then
need to require that browsers support it. This is comparatively easy,
as you&#39;re just proxying whatever the operating system says.&lt;/p&gt;
&lt;h2 id=&quot;who-is-responsible-for-signals%3F&quot;&gt;Who is responsible for signals? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#who-is-responsible-for-signals%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The majority of the mandates which involve developers receiving and
processing an age category signal require the developer to request
it. However, the Michigan SB284 text is unusual in that it
only seems to require them to process it if they have it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sec. 5. (1) A covered manufacturer shall take commercially reasonable and technically feasible steps to do all of the following:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(a) On activation of a device, determine or estimate the age of the device&#39;s user or users.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(b) Using an application programming interface, provide an application store, website, application, and online service with a digital signal regarding the age of the device&#39;s user or users, specifically whether the user is any of the following:&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;Sec. 7. (1) A website, application, or online service that makes mature content available must do all of the following:&lt;/p&gt;
&lt;p&gt;(a) Recognize and allow the receipt of a digital age signal from a covered manufacturer.&lt;/p&gt;
&lt;p&gt;(b) If the website, application, or online service knowingly makes available a substantial portion of mature content, block access to the website, application, or online service if a digital age signal is received under section 5(1) that indicates an individual is not 18 years of age or older.&lt;/p&gt;
&lt;p&gt;(c) If the website, application, or online service knowingly makes available less than a substantial portion of mature content, do both of the following:&lt;/p&gt;
&lt;p&gt;(i) Block access to known mature content if a digital age signal is received under section 5(1) that indicates an individual is not 18 years of age or older.&lt;/p&gt;
&lt;p&gt;(ii) Provide a disclaimer to a user or visitor before displaying known mature content.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From a technologist&#39;s perspective, this text doesn&#39;t really make sense. Recall
from the discussion of the Web case above that there are two main options
on the Web:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An unsolicited signal in an HTTP header&lt;/li&gt;
&lt;li&gt;A Web API request from the server&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This text doesn&#39;t really match either of these, because the manufacturer
is supposed to provide the signal and the website is supposed to &amp;quot;recognize
and allow the receipt of&amp;quot; it, all of which suggests that the manufacturer
is intended to initiate the process. However, it also says that this
is done &amp;quot;using an application programming interface&amp;quot;, which would usually
refer to something initiated by the Web site.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Moreover, as above, we have the problem that the requirement is on the manufacturer,
but they can&#39;t necessarily ensure that third party Web browsers provide
the signal.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The situation is equally if not more confusing for an &amp;quot;application, or online
service&amp;quot;. Applications make API queries to the operating system, not the other
way around: if operating systems want to convey unsolicited information
to applications they do it with environment variables, command line arguments,
etc. I don&#39;t even know what it means for an online service to receive
this information except via an app or a Web browser, so this text seems
superfluous.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches-6&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-6&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This just seems like confusing drafting. This text could readily
be replaced with text that required:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The manufacturer to provide the API&lt;/li&gt;
&lt;li&gt;Applications to call the API.&lt;/li&gt;
&lt;li&gt;Web browsers to supply a Web API&lt;/li&gt;
&lt;li&gt;Web sites to call the Web API&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;checking-interval&quot;&gt;Checking Interval &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#checking-interval&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the privacy challenges with age-based enforcement mechanisms is
that the user&#39;s birthday is inherently sensitive information. For this
reason, age assurance mandates typically require the disclosure not of
the user&#39;s precise age but rather of age categories (e.g., &lt;code&gt;13-16&lt;/code&gt;).
However, a consequence of using age ranges like this is that they interact
poorly with the fact that people continue to age at a rate of one day
per day, and so some people who are under 18 today will be 18 tomorrow, etc.
Without knowing someone&#39;s birthday, you can&#39;t know when they
transition from one age category to another.&lt;/p&gt;
&lt;h3 id=&quot;no-limits-on-checking&quot;&gt;No Limits on Checking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#no-limits-on-checking&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The obvious way to handle this is to just request the user&#39;s age
whenever the user launches the app. However, this has the unfortunate
side effect of the app likely eventually learning not only the user&#39;s
precise age but their birthday if they observe the user being age &lt;code&gt;N&lt;/code&gt;
and &lt;code&gt;N+1&lt;/code&gt; on two consecutive days.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
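&lt;p&gt;Here is a small Python sketch of that inference, just to show how little
work it takes. The function name and the shape of the log are my own
assumptions: &lt;code&gt;observations&lt;/code&gt; is a list of (date, reported age) pairs
that an app might record, one per launch.&lt;/p&gt;

```python
from datetime import date


def infer_birthday(observations):
    """Return (month, day) of the user's birthday if the log reveals it.

    observations: iterable of (date, age) pairs recorded at app launch.
    The birthday leaks when the age goes from N to N+1 across two
    consecutive calendar days.
    """
    ordered = sorted(observations)
    for (day_a, age_a), (day_b, age_b) in zip(ordered, ordered[1:]):
        if (day_b - day_a).days == 1 and age_b == age_a + 1:
            return (day_b.month, day_b.day)
    return None


# Usage: a user who opens the app daily around their 18th birthday.
log = [
    (date(2026, 3, 1), 17),
    (date(2026, 3, 2), 17),
    (date(2026, 3, 3), 18),
]
birthday = infer_birthday(log)  # (3, 3)
```

&lt;p&gt;If the launches are further apart, the app doesn&#39;t get the exact day from
this check, but it still narrows the birthday to the gap between the two
observations.&lt;/p&gt;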
&lt;p&gt;For this reason, you want to encourage if not require that sites
request the user&#39;s age category less frequently than this.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
Many of these mandates do not have any restrictions in this area,
with the result that we should expect sites to end up learning
more than they need to about users&#39; ages.&lt;/p&gt;
&lt;h3 id=&quot;too-infrequent-checking&quot;&gt;Too Infrequent Checking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#too-infrequent-checking&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to address this issue, many of these mandates limit how
frequently a developer can query for the user&#39;s age category, typically
to once per twelve months. Here&#39;s some typical text from Alabama HB 161:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(b) A developer may request age category data in any of
the following scenarios:
(1) No more than once during each 12-month period to
verify either of the following:
a. The accuracy of age category data associated with an
account holder.
b. Continued account use within the age category.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The problem with this text is that unless the developer queries the
user&#39;s age category right on their birthday, there will be a period
during which the developer is underestimating the user&#39;s age, with
that period lasting 6 months on average (because the query date is basically
equally likely to fall on any day of the year relative to their birthday).
With this text—and the text of several other mandates—there&#39;s
no way for the user to demonstrate that they are now in the correct
age range.&lt;/p&gt;
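&lt;p&gt;The 6 month figure is easy to check with a few lines of Python. This
sketch assumes a 365-day year, one query per year on a fixed date, and a
birthday that is equally likely to fall any number of days after that query;
&lt;code&gt;stale_days&lt;/code&gt; is just a name I made up for the length of the gap.&lt;/p&gt;

```python
YEAR = 365  # assumed year length in days, ignoring leap years


def stale_days(offset):
    """Days the app's stored category lags behind the user's real one.

    offset: how many days after the annual query the birthday falls.
    An offset of 0 means the query happened on the birthday itself, so
    nothing is stale; otherwise the app waits until the next query.
    """
    return (YEAR - offset) % YEAR


average = sum(stale_days(d) for d in range(YEAR)) / YEAR
```

&lt;p&gt;This comes out to 182 days, i.e., about six months, and in the worst case
the user waits just under a full year. Note that this only bites in a year
where the birthday actually moves the user into a new category.&lt;/p&gt;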
&lt;h3 id=&quot;alternative-approaches-7&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-7&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This is a genuinely hard problem because of the need to balance user
privacy against users&#39; desire to access experiences consistent with
their true age. Probably the best you can do here is to keep the
once-per-12-month restriction but add text which allows the user to
&lt;em&gt;request&lt;/em&gt; that the app re-request their age. Some of these mandates have
some text that potentially could be construed this way, for instance
&amp;quot;When there is reasonable suspicion of account transfer or misuse
outside the verified age category&amp;quot; in Utah SB142, but it&#39;s not
really misuse in most cases if you give a 17 year old the 13-16
experience, so it would be good to have clarity.&lt;/p&gt;
&lt;p&gt;Note that this issue also has technical implications for age category
signals in the Web context: if you have a Web API, it can behave
the same as the OS API,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
but if you have a header, the situation is
somewhat more complicated because the browser is just sending the
header with every request. At minimum the browser would need to
remember when it initially sent the header and replay the same
value until the minimum re-request period had expired, but then
the site might still need an API to re-request in exceptional
cases. All of this suggests that the header may not be the best
approach.&lt;/p&gt;
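&lt;p&gt;To see what the header approach would demand of the browser, here is a
Python sketch of the bookkeeping. None of this exists: the class name, the
365-day interval, and the category strings are all assumptions, and a real
browser would also have to decide how this state interacts with clearing
site data.&lt;/p&gt;

```python
from datetime import date, timedelta


class AgeHeaderCache:
    """Hypothetical browser-side memory of the category sent to each site."""

    def __init__(self, min_interval=timedelta(days=365)):
        self.min_interval = min_interval
        self.sent = {}  # site origin: (date first sent, category sent)

    def header_value(self, site, today, os_category):
        """Return the category to attach to a request to this site."""
        cached = self.sent.get(site)
        if cached is not None:
            first_sent, category = cached
            # Replay the old value until the minimum interval has passed,
            # so the site cannot watch the category flip day by day.
            if first_sent + self.min_interval > today:
                return category
        # First contact, or the interval has expired: send the current
        # OS value and restart the clock.
        self.sent[site] = (today, os_category)
        return os_category
```

&lt;p&gt;Even this toy version has to keep per-site state indefinitely, and it still
has no way for the user to ask for an early refresh, which is the
exceptional case a re-request API would be needed for.&lt;/p&gt;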
&lt;h3 id=&quot;securely-establishing-the-user&#39;s-age&quot;&gt;Securely Establishing the User&#39;s Age &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#securely-establishing-the-user&#39;s-age&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In any setting where users of different ages get different
experiences, some users in age group &lt;em&gt;A&lt;/em&gt; will want to instead have the
experience of age group &lt;em&gt;B&lt;/em&gt;. How motivated they will be likely depends
on how different those experiences are. Obviously, the level of
motivation will also depend on users, but there is &lt;a href=&quot;https://www.ofcom.org.uk/siteassets/resources/documents/research-and-data/online-research/keeping-children-safe-online/childrens-online-user-ages/children-user-ages-chart-pack.pdf?v=328540&quot;&gt;plenty
of&lt;/a&gt;
&lt;a href=&quot;https://www.crikey.com.au/2025/10/10/teen-social-media-ban-workarounds/&quot;&gt;evidence&lt;/a&gt; that
users will misstate their age in order to access social networking
sites and we ought to assume the same is true for access to
adult content and potentially for the ability to engage in
&amp;quot;financial transactions&amp;quot; with sites. If the mechanisms for
establishing age are readily circumvented, then they will
not be effective in these cases.&lt;/p&gt;
&lt;p&gt;The majority of these mandates require the operating system to conduct
some form of age assurance which typically is interpreted to mean
something beyond bare declaration.  For instance, the NY bill would
require &amp;quot;commercially reasonable age assurance&amp;quot;, which the NY Office
of the Attorney General&#39;s &lt;a href=&quot;https://ag.ny.gov/sites/default/files/regulatory-documents/safe-for-kids-act-nprm.pdf&quot;&gt;Notice of Proposed Rule Making for the SAFE
For Kids
Act&lt;/a&gt;
interpreted as requiring a minimum level of resistance to
circumvention, so we&#39;re talking about mechanisms like facial age
estimation or requiring the user to show government issued ID. This
topic is covered extensively in our KGI report, so I won&#39;t go
over it in more detail here, but a number of these mandates don&#39;t require
age assurance but just require that the user indicate their age (self-declaration)
at account creation.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The general assumption in age assurance discussions is that
self-declaration is insecure because the user can just lie about their
age, and most existing age assurance mandates forbid it. However,
those mandates are generally expected to be enforced on the Web server
and the situation is somewhat different when age assurance is
conducted on a device: if a parent purchases the device for their
child and sets it up for them, they can set up the account with
the child&#39;s correct age.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
Of course, if a child buys the device themselves, this won&#39;t work,
and you need a real age assurance mechanism to prevent circumvention.&lt;/p&gt;
&lt;p&gt;In order for this kind of mechanism to be effective, however, it needs
to be &amp;quot;sticky&amp;quot; so that the minor can&#39;t reset the age setting and enter
a new (false) age, for instance by resetting the device to a factory
configuration and creating a new account. Factory reset is a standard feature
of basically every consumer computing device&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;, but it needs to be disabled once the
device has been configured in &amp;quot;child mode&amp;quot;, otherwise circumvention
is trivial. This is already a feature of some existing parental control
modes for mobile devices (e.g., &lt;a href=&quot;https://support.apple.com/en-us/108931&quot;&gt;Apple Screen Time&lt;/a&gt;),
but as far as I can tell none of these mandates require that manufacturers
enable it when the user&#39;s entered age is under 18.&lt;/p&gt;
&lt;p&gt;Moreover, desktop devices typically are not designed to prevent
reinstallation; even those devices which have BIOS locking to prevent
the installation of unauthorized operating systems are not designed
to prevent the installation of a fresh copy of the operating system,
thus reinitiating the date of birth entry. In principle, the OS vendor
could perhaps store the entered DOB somewhere and restore it when the machine
is reinstalled, but this isn&#39;t something any of these mandates require it
to do and would have real technical challenges even on devices running
commercial operating systems—e.g., MacOS or Windows—which
have some sort of remote management capability; it&#39;s largely impractical
on open source operating systems like Linux that don&#39;t centrally manage
devices at all.&lt;/p&gt;
&lt;h3 id=&quot;alternative-approaches-8&quot;&gt;Alternative Approaches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#alternative-approaches-8&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The problem here is created by the combination of (1) self-declaration
and (2) the ability for minors to reset the device themselves. If real
age assurance is required, reset isn&#39;t really an issue because the
user will just be prompted for age assurance after reset. If reset
isn&#39;t possible, then adults can set up the device with the child&#39;s age
and the minor user won&#39;t be able to change it.&lt;/p&gt;
&lt;p&gt;As a practical matter, this means that this kind of requirement likely
won&#39;t be effective on desktop devices, but on mobile devices it can
be made to work when paired with establishing some kind of passcode
that is required to reset the device. This is conceptually somewhat
like existing parental control systems in that it depends on the
parents to set up the device in child mode, though if they want to
set it up without controls, it requires them to explicitly misrepresent
the child&#39;s age rather than just decide not to enable parental controls.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic challenge here is that the high level desires embodied in
this kind of legislation are very challenging to translate into
technical requirements. This is especially true for device-based
restrictions because the legislation is requiring the creation
of new technology which doesn&#39;t exist yet; by contrast, while there
are many laws requiring server-based age assurance, those laws
mostly require the use of age assurance technologies which are
already in wide use, and so it&#39;s reasonably well understood how to
make them work.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;With device-based mechanisms, legislation is effectively writing the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Product_requirements_document&amp;amp;oldid=1307471706&quot;&gt;product requirements
document (PRD)&lt;/a&gt;
for a new product, and as anyone who has worked in technology knows,
those PRDs rarely survive contact with engineering reality, especially
when they are written without extensive back and forth with the engineers
who know what is and is not feasible. This is not to say that it&#39;s
not possible to build systems that do what some of these mandates
are trying to accomplish—for instance, getting the big
mobile app stores to require parental consent for software download
for minor users—but getting there without also creating
requirements that are not practicable requires a deep understanding of
the technology platforms being regulated and crafting language
that is compatible with those engineering realities.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m not going to be rigorous about distinguishing between
legislation that has been enacted (which might or might not
yet be in effect) and that which is proposed. The points
I&#39;m trying to make don&#39;t depend on that. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
On the theory that they are providing an operating system. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I want to recognize here that a lot of open source people (including
me) are uncomfortable with this kind of functionality living in
an open source operating system at all, but that&#39;s a distinct
question from whether this kind of approach is workable. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though I guess you might decide to stop them from entering the
number &lt;code&gt;80085&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Which may actually be both an app &lt;em&gt;and&lt;/em&gt; an app store, because
it comes with the &lt;code&gt;npm&lt;/code&gt; package management system. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are, of course, &quot;Web APIs&quot; which are provided by servers and
initiated by clients, but those are usually not used by browsers
but rather by other kinds of agents, and it&#39;s not clear how such
an API would work, as you&#39;d somehow need to define it and have
every site implement it; the header would be the much more
conventional design. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
On iOS in the US Apple requires third party browsers to use WebKit,
so they could ensure it there, but that&#39;s not true on other operating
systems, including Android. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This kind of edge effect is a common problem for privacy
mechanisms which rely on deterministic buckets with hard
thresholds. &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6772#section-13.2&quot;&gt;RFC6772&lt;/a&gt;
has a good discussion of this problem in the case of location. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that California&#39;s AB1043 uses the phrase &amp;quot;downloaded and launched&amp;quot;,
which you might infer would require services to query every time
the app is launched, but probably really is intended to mean
the first time.
 &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, in both cases it would be good if the API
enforced the frequency restrictions and potentially
intermediated any exceptional requests for age ranges
to ensure that the user really consented. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Illinois SB3977 nominally required &amp;quot;age verification&amp;quot;
but then the actual requirement of the act is just to
&quot;provide an accessible interface at account setup
that requires an account holder to indicate the birth
date, age, or both, of the user of that device for purposes&amp;quot;,
so really it&#39;s just self-declaration. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Of course, in practice we know that parents frequently assist
their children in &lt;a href=&quot;https://www.crikey.com.au/2025/10/20/1-in-3-parents-will-help-kids-get-around-teen-social-media-ban/&quot;&gt;evading&lt;/a&gt;
age assurance. How you feel about this will depend on whether
you think decisions about what content and experiences
minors can access should be up to parents or
the government. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Otherwise, how do you
recover when it gets stuck? &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And even then, actually writing the mandates correctly can be
very difficult, as exemplified by KGI&#39;s &lt;a href=&quot;https://kgi.georgetown.edu/research-and-commentary/first-steps-toward-operationalizing-age-assurance-mandates-new-york-safe-for-kids-act-proposed-rules/&quot;&gt;comments&lt;/a&gt;
on the NY SAFE for Kids Act Proposed Rules. &lt;a href=&quot;https://educatedguesswork.org/posts/device-based-age-assurance/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Let&#39;s build a tool-using agent</title>
		<link href="https://educatedguesswork.org/posts/tool-calling/"/>
		<updated>2026-03-06T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/tool-calling/</id>
		<content type="html">&lt;p&gt;At this point, if you haven&#39;t heard about &amp;quot;agentic AI&amp;quot;, you
haven&#39;t just been living under a rock but under a huge
pile of rocks. However, even if you have heard of agentic AI,
you may have only a vague
idea of what it actually means. If so, you&#39;ve come to
the right place. In this post we&#39;re going to build a simple
tool-using &lt;a href=&quot;https://github.com/ekr/tool-calling-demo&quot;&gt;AI agent&lt;/a&gt;
and try to get some sense of what it&#39;s
actually doing.&lt;/p&gt;
&lt;p&gt;Here&#39;s a typical &lt;a href=&quot;https://www.ibm.com/think/topics/agentic-ai&quot;&gt;definition&lt;/a&gt;
of agentic AI, from IBM:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Agentic AI builds on generative AI (gen AI) techniques by using
large language models (LLMs) to function in dynamic
environments. While generative models focus on creating content
based on learned patterns, agentic AI extends this capability by
applying generative outputs toward specific goals. A generative AI
model like OpenAI’s ChatGPT might produce text, images or code, but
an agentic AI system can use that generated content to complete
complex tasks autonomously by calling external tools. Agents can,
for example, not only tell you the best time to climb Mt. Everest
given your work schedule, it can also book you a flight and a
hotel.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, agentic AI doesn&#39;t just talk to you but
can have side effects in the real world. For instance,
you might ask it to book travel for you, do some web
searching, send emails, etc. The interesting
question here is how.&lt;/p&gt;
&lt;p&gt;The first thing to understand is that a &lt;em&gt;large language model (LLM)&lt;/em&gt;
is what it says on the tin: a &lt;em&gt;language model&lt;/em&gt;, which means that
it operates at the level of text. At a high level, an LLM takes
in a string of text (the prompt) and then emits some other text
(the response). It&#39;s common to talk about this as an autocomplete
or predictive system where the LLM emits the most likely text to
come after the prompt, but for our purposes, it doesn&#39;t matter:
the important thing is that the LLM just manipulates text. Everything
we&#39;re going to do in this post is downstream of that fact.&lt;/p&gt;
&lt;h1 id=&quot;preliminaries&quot;&gt;Preliminaries &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#preliminaries&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;AI models can either be local (running on your machine or a machine
you control) or hosted (running on infrastructure operated by the
model provider). In general, the hosted models are a lot more capable
and require a lot of compute power, but you can still get fairly
far with a local model if you just want to do simple stuff.
With a local model, you can provide input to the model
directly, but for a hosted model you need to use some interface
provided by the model provider. For interactive use, this is often
some kind of chat interface, such as &lt;a href=&quot;https://chatgpt.com/&quot;&gt;ChatGPT&lt;/a&gt;,
but for programmatic use model providers give you some kind of
&lt;a href=&quot;https://ai.google.dev/gemini-api/docs&quot;&gt;HTTP&lt;/a&gt;
&lt;a href=&quot;https://platform.openai.com/docs/api-reference/introduction&quot;&gt;API&lt;/a&gt;.
These APIs are conceptually similar but subtly different, so you
need to write your app slightly differently for each platform.&lt;/p&gt;
&lt;p&gt;Although you can access local models directly, as a practical
matter it&#39;s more convenient to use something like &lt;a href=&quot;https://ollama.com/&quot;&gt;Ollama&lt;/a&gt;,
which is an engine that allows you to run a large number of models—and
actually knows how to automatically download&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
them—and also provides a common HTTP API which you can use
just as you would a model provider&#39;s API. We&#39;ll be writing our
examples using Ollama so that they can work with local models—thus
allowing us to do some internal instrumentation—but Ollama
can also bridge to the APIs for big model providers, so we can
use the same code with Gemini, Claude, etc., allowing us to demonstrate
things with better models.&lt;/p&gt;
&lt;p&gt;In practice, you would usually not talk to the HTTP API directly, but
instead download some local library (e.g., &lt;a href=&quot;https://github.com/ollama/ollama-js&quot;&gt;ollama-js&lt;/a&gt;)
which takes care of the HTTP API mechanics. In this case, however,
I want to be able to show what&#39;s actually happening, so we&#39;re
going to be writing to the API directly, using the built-in
&lt;a href=&quot;https://nodejs.org/en/download&quot;&gt;nodejs&lt;/a&gt; &lt;a href=&quot;https://nodejs.org/en/learn/getting-started/fetch&quot;&gt;fetch&lt;/a&gt; API. To do this, we&#39;re going to have a trivial
JS API client, shown below.&lt;/p&gt;
&lt;figure&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; fetch &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;node-fetch&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; server_url &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:11434&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; g_chat_url &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;server_url&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;/api/chat&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; g_model &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;AGENT_MODEL&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;mistral-small&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span 
class=&quot;token keyword&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ChatApi&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  endpoint_url &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; g_chat_url&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  model &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; g_model&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  tools &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; verbose &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;VERBOSE&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span 
class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;messages&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; body &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; model&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;stream&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      tools&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      messages&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;verbose&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;REQUEST:&quot;&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;body&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;endpoint_url&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;POST&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;body&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;body&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; json &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;verbose&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;RESPONSE:&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;json&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span 
class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;message&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    complete&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;figcaption&gt;
Trivial HTTP API client
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h1 id=&quot;a-simple-chatbot&quot;&gt;A Simple Chatbot &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#a-simple-chatbot&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We&#39;re going to warm up by building a simple chatbot, which is comparatively
trivial &lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
given this kind of model. We just need to connect it up to some
interface that reads text from the user and sends it to the model,
as shown in the diagram below.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tool-calling-chatbot.png&quot; alt=&quot;Simple chatbot architecture&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A simple chatbot architecture
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We can then write a trivial chatbot like so.&lt;/p&gt;
&lt;figure&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; ChatApi &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./api.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; Chat &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./chat-framework.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; api &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ChatApi&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span 
class=&quot;token keyword&quot;&gt;await&lt;/span&gt; api&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; line &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;content&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;Chat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;handler&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;figcaption&gt;
Trivial chatbot code
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note that this code makes use of a chat &lt;a href=&quot;https://github.com/ekr/tool-calling-demo/blob/main/chat-framework.js&quot;&gt;framework&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
that just loops over the input and passes each line to the handler
function shown here. This lets us handle stuff like reading
from the terminal and/or opening the input all in one place so
you can focus on the main code.&lt;/p&gt;
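&lt;p&gt;For a rough sense of what such a framework does, here is a minimal sketch.
It is not the code from the repository: &lt;code&gt;chatLoop&lt;/code&gt; and its
&lt;code&gt;lines&lt;/code&gt; and &lt;code&gt;write&lt;/code&gt; parameters are names made up for
illustration, and the echo handler stands in for a real call to the model.&lt;/p&gt;

```javascript
// Hypothetical sketch of a chat loop; the real chat-framework.js may differ.
// `lines` is any iterable (or async iterable) of input strings and
// `write` is called with each reply from the handler.
async function chatLoop(handler, lines, write = console.log) {
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip empty input
    write(await handler(line));
  }
}

// Stand-in handler that just echoes; a real one would call the model.
async function echoHandler(line) {
  return `echo: ${line}`;
}

// Drive the loop from a fixed list of inputs and collect the replies.
const replies = [];
const finished = chatLoop(echoHandler, ["Hello", "", "Bye"], (r) => replies.push(r));
finished.then(() => console.log(replies));
```

&lt;p&gt;To read from the terminal instead of a fixed list, you could pass in
&lt;code&gt;readline.createInterface({ input: process.stdin })&lt;/code&gt; from
the &lt;code&gt;node:readline&lt;/code&gt; module, which can be iterated line by line
with &lt;code&gt;for await&lt;/code&gt;.&lt;/p&gt;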
&lt;p&gt;Anyway, here&#39;s an example interaction.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&gt;&gt;&gt; Hello&lt;br /&gt;Hello! How can I assist you today?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far so good, but now try to have a conversation. For example:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&gt;&gt;&gt; My shirt is blue&lt;br /&gt;That&#39;s a nice color! Do you need help with something related to your shirt, or would you like to talk about something else?&lt;br /&gt;&gt;&gt;&gt; What color is my shirt?&lt;br /&gt;I don&#39;t have the ability to see or know what you&#39;re wearing right now. Could you please provide more context or clarify your question?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;WTF? I just told you the color of my shirt.&lt;/p&gt;
&lt;p&gt;What&#39;s going on is that the model itself
is stateless: it just takes in a string of input and produces
output, so as far as the model is concerned, when I asked about my
shirt color, that was the first thing I had said.
If we want to have a conversation,
I actually need to play back the entire conversation
with each request to the API. We do this by keeping a &lt;code&gt;context&lt;/code&gt; variable which
is just the list of all the things that we&#39;ve said to the
LLM as well as the things it said back to us, as seen in:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; ChatApi &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./api.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; Chat &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./chat-framework.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; context &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; api &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ChatApi&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token parameter&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; line &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; api&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;context&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token 
keyword&quot;&gt;return&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;content&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;Chat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;handler&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&#39;ll notice that each entry in the context contains
a &lt;code&gt;role&lt;/code&gt; parameter, which helps the model keep straight
who said what. Now when we run this, we get the right answer.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&gt;&gt;&gt; My shirt is blue&lt;br /&gt;That&#39;s a nice color! Do you need help with something related to your shirt, or would you like to talk about something else?&lt;br /&gt;&gt;&gt;&gt; What color is my shirt?&lt;br /&gt;You told me that your shirt is blue. Is there anything specific you would like to know or do regarding your shirt?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Congratulations, we now have a primitive but functional
chatbot.&lt;/p&gt;
&lt;h3 id=&quot;tool-calling&quot;&gt;Tool calling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#tool-calling&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;What we&#39;ve built so far is just a brain in a vat: we can feed it text
and it responds with other text, but it can&#39;t do anything that has
side effects. That&#39;s
all great, but the examples we gave above (booking travel, etc.)
require the ability to do things out in the world—or at
least on the Internet—so we need to enable that somehow.
The way this is done is by giving the LLM something called
a &amp;quot;tool&amp;quot;. LLM tools are kind of like API calls in traditional
programming languages: they are functions that let the LLM
do something.&lt;/p&gt;
&lt;p&gt;Using tools with an LLM is conceptually simple:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The wrapper code tells the LLM about the tool by adding the tool
definition to the context.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The LLM invokes the tool by putting tool-specific instructions in
the output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The wrapper code detects the tool-specific instructions in the
output and invokes the tool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The wrapper code takes the tool result and passes it to the
LLM as part of the context.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id=&quot;tool-definitions&quot;&gt;Tool Definitions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#tool-definitions&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The tool definition is basically just a JSON expression, like
so:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  name&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;read_temperature&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  parameters&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    type&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;object&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    properties&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      location&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        type&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        description&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;The room name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    required&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  description&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Return the room temperature in degrees Celsius&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should be reasonably self-explanatory, but just in case:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;name&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;The name of the tool itself (in this case &lt;code&gt;read_temperature&lt;/code&gt;).&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;description&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;A text description of the tool&#39;s behavior.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;parameters&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;The arguments to the tool.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Think about the tool definition as API documentation for the
LLM: it tells it about the tool, what it does, and how to
call it. The &lt;code&gt;description&lt;/code&gt; field is just freeform text
which is assimilated by the model. The easiest way to think
about this is that the LLM reads the documentation just like
a programmer would and then picks the right tool(s) for the
job based on your instructions.&lt;/p&gt;
&lt;h4 id=&quot;model-calls-tool&quot;&gt;Model Calls Tool &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#model-calls-tool&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In order to call the tool, the model provides
a response that contains the information about the tool to
call. For instance:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;call_5wrxuo5r&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;function&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;index&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;read_temperature&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;arguments&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;living room&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This says exactly what you think it says, namely &amp;quot;I would like to call the
tool &lt;code&gt;read_temperature&lt;/code&gt; with the &lt;code&gt;location&lt;/code&gt; argument being
&lt;code&gt;living room&lt;/code&gt;&amp;quot;. But of course,
this doesn&#39;t have any effect on its own; it&#39;s just some text the model
spits out that is asking the agent wrapper to call the tool.&lt;/p&gt;
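&lt;p&gt;Because the call is just model output, a wrapper may want to
sanity-check it before acting on it. What follows is a minimal sketch,
assuming the definition and call shapes shown above;
&lt;code&gt;checkToolCall&lt;/code&gt; is a hypothetical helper and is not part of
the code in this post.&lt;/p&gt;

```javascript
// Hypothetical helper: compare a model-produced tool call with a tool definition.
// The object shapes mirror the examples above; the names are assumptions.
const readTemperatureDefinition = {
  name: "read_temperature",
  description: "Return the room temperature in degrees Celsius",
  parameters: {
    type: "object",
    properties: {
      location: { type: "string", description: "The room name" },
    },
    required: ["location"],
  },
};

// Only these schema types line up directly with JavaScript's typeof.
const simpleTypes = ["string", "number", "boolean"];

function checkToolCall(definition, call) {
  const problems = [];
  if (call.function.name !== definition.name) {
    problems.push(`unexpected tool name '${call.function.name}'`);
  }
  const args = call.function.arguments ?? {};
  // Every required argument must be present.
  for (const key of definition.parameters.required ?? []) {
    if (!(key in args)) {
      problems.push(`missing required argument '${key}'`);
    }
  }
  // Every supplied argument must be declared and, for simple types, well typed.
  for (const [key, value] of Object.entries(args)) {
    const spec = definition.parameters.properties[key];
    if (!spec) {
      problems.push(`unknown argument '${key}'`);
    } else if (simpleTypes.includes(spec.type)) {
      if (typeof value !== spec.type) {
        problems.push(`argument '${key}' should be of type ${spec.type}`);
      }
    }
  }
  return problems;
}

const call = {
  id: "call_5wrxuo5r",
  function: {
    index: 0,
    name: "read_temperature",
    arguments: { location: "living room" },
  },
};

console.log(checkToolCall(readTemperatureDefinition, call)); // prints []
```

&lt;p&gt;An empty list means the call names the expected tool and supplies
its required arguments. It says nothing about whether the argument values
make sense, which is a separate problem.&lt;/p&gt;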
&lt;h4 id=&quot;tool-execution&quot;&gt;Tool Execution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#tool-execution&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The way that the tool actually gets called is that the wrapper
code detects that the model&#39;s output is actually a tool call and
calls the tool rather than printing the output (or whatever it
would ordinarily do with it). In other words, you need to update
the agent wrapper code to be something like this:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; ChatApi &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./api.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; Chat &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./chat-framework.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; Tools &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./tools.js&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; context &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;call_tool&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;call&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Tools&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;implementations&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;+++ Calling tool &#39;&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39; with arguments &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;arguments&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Tools&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;implementations&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;arguments&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;--&gt; &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token interpolation-punctuation 
punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;Missing tool &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;call&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; api &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token 
function&quot;&gt;ChatApi&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; Tools&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;definitions &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; line &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; response 
&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; api&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;context&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;tool_calls&lt;span class=&quot;token operator&quot;&gt;?.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;response&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;tool_calls&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; tool_result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;call_tool&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;tool_calls&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;function&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;tool&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; response&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;tool_calls&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token literal-property property&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; tool_result&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;content&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;Chat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;handler&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code is comparatively simple, but let&#39;s work through
it in pieces. Whenever we send a request to the model API,
we can get one of two responses:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The response can contain a text response (the &lt;code&gt;content&lt;/code&gt; field
is populated).&lt;/li&gt;
&lt;li&gt;The response can contain a tool call (the &lt;code&gt;tool_calls&lt;/code&gt; field is
populated).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the first case, we just display the response to the user
and then read the user&#39;s next input, just as with our original
chatbot code.&lt;/p&gt;
&lt;p&gt;What&#39;s new here is the tool call request. In this case, we don&#39;t want
to display the result to the user, but instead intercept the response
and call the appropriate tool. Handily, the tools are named, so
we can just look up the appropriate implementation by name and
call it. Once we have the response, we can add it to the context
and call the completion API again. Here&#39;s a simple exchange:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;User&gt; Use the available tools to find out the current temperature.&lt;br /&gt;+++ Calling tool &#39;read_temperature&#39; with arguments {&quot;location&quot;:&quot;living room&quot;}&lt;br /&gt;--&gt; &quot;25&quot;&lt;br /&gt;Agent&gt; The current temperature is 25°C.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you look closely, you&#39;ll notice something interesting. I never
told the model which room I wanted it to get the temperature
for; it just hallucinated &amp;quot;living room&amp;quot;. This behavior is
actually nondeterministic and model dependent (I&#39;m using &lt;a href=&quot;https://ollama.com/library/mistral-small&quot;&gt;mistral-small&lt;/a&gt;). Some fraction
of the time, the model actually refuses to give me an answer
and instead asks what room I want:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;User&gt; Use the available tools to find out the current temperature.&lt;br /&gt;Agent&gt; Sure, I can help with that. Could you please specify which room&#39;s temperature you would like to know?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Providing the response works exactly as you&#39;d expect, with
our agent providing the following context:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;  &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;role&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;content&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Use the available tools to find out the current temperature.&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;call_5wrxuo5r&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;function&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;index&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;read_temperature&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;arguments&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;living room&quot;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;role&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;tool&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;call_5wrxuo5r&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;content&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;25&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, the context here includes everything that has
happened so far, namely:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;My request for temperature&lt;/li&gt;
&lt;li&gt;The tool call request from the agent&lt;/li&gt;
&lt;li&gt;The response from the tool&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just as before, the model is stateless, so if we don&#39;t remind
it that it called a tool, it doesn&#39;t have any context for
what the answer &amp;quot;25&amp;quot; is.&lt;/p&gt;
&lt;p&gt;The key point here is that all the tool action happens in the wrapper
code. The LLM has no idea how the tool works; it just knows whatever
the wrapper told it about what each tool does and then whatever the
wrapper says the tool did. And in fact, my implementation of &lt;code&gt;get_temperature&lt;/code&gt;
isn&#39;t attached to a thermometer or some kind of temperature API and
doesn&#39;t have any idea what the temperature is, it&#39;s just returning
the fixed value &lt;code&gt;25&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Notice also that the agent wrapper doesn&#39;t
really know anything about the tools either; it&#39;s just importing
the list of tools from &lt;code&gt;tools.js&lt;/code&gt;, passing the descriptions to
the model, and invoking the appropriate tool. All you need to do
to add another tool is add it to &lt;code&gt;tools.js&lt;/code&gt;.&lt;/p&gt;
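&lt;p&gt;To make that separation concrete, here is a minimal sketch of what a tool registry and dispatcher could look like. This is not the actual &lt;code&gt;tools.js&lt;/code&gt; from this post: the registry layout and the names &lt;code&gt;describeTools&lt;/code&gt; and &lt;code&gt;invokeTool&lt;/code&gt; are assumptions made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical stand-in for tools.js: each entry pairs the description
// that is sent to the model with the function the wrapper runs locally.
const tools = {
  read_temperature: {
    description: "Return the room temperature in degrees Celsius",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "The room name" },
      },
      required: ["location"],
    },
    // Not attached to a real thermometer; always reports the fixed value 25.
    run: function () {
      return "25";
    },
  },
};

// Build the list of tool descriptions to pass to the model.
function describeTools(registry) {
  return Object.keys(registry).map(function (name) {
    return {
      type: "function",
      function: {
        name: name,
        description: registry[name].description,
        parameters: registry[name].parameters,
      },
    };
  });
}

// Run whichever tool the model asked for, looked up purely by name.
function invokeTool(registry, call) {
  if (!Object.prototype.hasOwnProperty.call(registry, call.name)) {
    return "Unknown tool: " + call.name;
  }
  return registry[call.name].run(call.arguments);
}
```

&lt;p&gt;With this layout, the dispatcher never mentions any specific tool, so adding a tool only means adding one more entry to the registry.&lt;/p&gt;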
&lt;h4 id=&quot;multi-round-tool-execution&quot;&gt;Multi-Round Tool Execution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#multi-round-tool-execution&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Our wrapper code will keep looping until the model
returns a final response rather than another tool call, which means we can have multiple rounds of
tool execution. So, for instance, we can have a simple thermostat
which turns on the heat if we are below some target temperature.&lt;/p&gt;
&lt;p&gt;To do this, we first need to give the model a new &lt;code&gt;turn_on_heat&lt;/code&gt; tool.&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    name&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;turn_on_heat&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    parameters&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      type&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;object&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      properties&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        location&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          type&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          description&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;The room name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      required&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    description&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Turn on the heat&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is implemented as:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token function-variable function&quot;&gt;turn_on_heat&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;The heat is now on&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then with the right instructions...&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;User&gt; Use the available tools to find out the current temperature in the living room and use the right tool to turn on the heat if it is under 40 Celsius.&lt;br /&gt;+++ Calling tool &#39;read_temperature&#39; with arguments {&quot;location&quot;:&quot;living room&quot;}&lt;br /&gt;--&gt; &quot;25&quot;&lt;br /&gt;+++ Calling tool &#39;turn_on_heat&#39; with arguments {}&lt;br /&gt;--&gt; &quot;The heat is now on&quot;&lt;br /&gt;Agent&gt; The current temperature in the living room is 25°C, which is below 40°C. I have turned on the heat for you.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You might notice that my instructions here are pretty verbose, and
in particular that I&#39;m telling it to use the tool to turn on
the heat. What&#39;s going on here is that I&#39;m fighting with the model:
mistral-small only has about 20B parameters and so it&#39;s not really
smart enough to figure things out if you&#39;re not super explicit. Here&#39;s
what happened if I didn&#39;t tell it that it had a tool to turn on
the heat:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;User&gt; Use the available tools to find out the current temperature in the living room and turn on the heat if it is under 40 Celsius.&lt;br /&gt;+++ Calling tool &#39;read_temperature&#39; with arguments {&quot;location&quot;:&quot;living room&quot;}&lt;br /&gt;--&gt; &quot;25&quot;&lt;br /&gt;Agent&gt; I don&#39;t have access to real-time data or the ability to control physical devices like thermostats. However, I can guide you through the process of checking the temperature and turning on the heat.&lt;br /&gt;&lt;br /&gt;To check the current temperature in your living room:&lt;br /&gt;1. Use a thermometer placed in the living room.&lt;br /&gt;2. Check the reading on any smart home device that monitors temperature if available.&lt;br /&gt;&lt;br /&gt;If the temperature is below 40°C (104°F), you can turn on the heat by:&lt;br /&gt;1. Using a thermostat to set the desired temperature.&lt;br /&gt;2. Turning on a space heater or other heating device.&lt;br /&gt;3. Adjusting your HVAC system settings if applicable.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the same reason I&#39;m having the &lt;code&gt;turn_on_heat&lt;/code&gt; tool return
that the heat is on. Without that I was seeing the model turn on the
heat and then tell you it didn&#39;t know what to do, like so:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;User&gt; Use the available tools to find out the current temperature in the living room and use the right tool to turn on the heat if it is under 40 Celsius.&lt;br /&gt;+++ Calling tool &#39;read_temperature&#39; with arguments {&quot;location&quot;:&quot;living room&quot;}&lt;br /&gt;--&gt; &quot;25&quot;&lt;br /&gt;+++ Calling tool &#39;turn_on_heat&#39; with arguments {}&lt;br /&gt;--&gt; &quot;&quot;&lt;br /&gt;Agent&gt; I don&#39;t have access to external tools or devices, so I can&#39;t directly check the temperature in your living room or control any heating systems. However, I can guide you on how to do this:&lt;br /&gt;&lt;br /&gt;1. **Check the Temperature:**&lt;br /&gt;   - Use a smart thermostat or a smart home device that can monitor the temperature.&lt;br /&gt;   - Alternatively, use a smartphone app connected to a smart thermostat.&lt;br /&gt;&lt;br /&gt;2. **Turn On the Heat:**&lt;br /&gt;   - If the temperature is below 40°C (104°F), you can use the same smart thermostat or smart home device to turn on the heat.&lt;br /&gt;   - Ensure that your heating system is compatible with smart controls and follow the manufacturer&#39;s instructions for operation.&lt;br /&gt;&lt;br /&gt;If you provide more details about the devices you have, I can give more specific guidance.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These results also aren&#39;t totally reliable (LLMs usually do not have
deterministic behavior) so if you try this yourself you may need to
run the program a couple of times to get the desired result.
You&#39;d probably get a better result if you were using a smarter model,
but I picked something that would run well on low-end machines, because
the next thing I want to do is go a level deeper into what&#39;s actually
going on, and that requires a model I can run locally.&lt;/p&gt;
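&lt;p&gt;The multi-round loop in this section can be sketched roughly as follows. This is a simplified illustration, not the wrapper code used for the transcripts above: &lt;code&gt;fakeModel&lt;/code&gt; is a scripted stand-in rather than a real LLM, the reply shape with a &lt;code&gt;tool_calls&lt;/code&gt; array is an assumption, and the model call is synchronous only to keep the sketch short (a real call to an HTTP API would be asynchronous).&lt;/p&gt;

```javascript
// Hypothetical tools; neither one touches real hardware.
const tools = {
  read_temperature: function () {
    return "25";
  },
  turn_on_heat: function () {
    return "The heat is now on";
  },
};

// Helper that builds an assistant message requesting one tool call.
function toolCall(id, name, args) {
  return {
    role: "assistant",
    tool_calls: [{ id: id, function: { name: name, arguments: args } }],
  };
}

// Scripted stand-in for the model (an assumption, not a real LLM). It is
// stateless: it decides what to do only from the messages it is given.
function fakeModel(messages) {
  const results = messages.filter(function (m) {
    return m.role === "tool";
  });
  if (results.length === 0) {
    return toolCall("call_1", "read_temperature", { location: "living room" });
  }
  if (results.length === 1) {
    return toolCall("call_2", "turn_on_heat", { location: "living room" });
  }
  return {
    role: "assistant",
    content: "It is " + results[0].content + " degrees, so I turned on the heat.",
  };
}

// Keep calling the model until it answers without requesting a tool.
function runAgent(model, userPrompt, maxRounds) {
  const messages = [{ role: "user", content: userPrompt }];
  for (let round = 0; round !== maxRounds; round++) {
    const reply = model(messages);
    messages.push(reply);
    if (!reply.tool_calls || reply.tool_calls.length === 0) {
      return { answer: reply.content, messages: messages };
    }
    for (const call of reply.tool_calls) {
      const result = tools[call.function.name](call.function.arguments);
      // Feed the result back so the next round has the full context.
      messages.push({ role: "tool", id: call.id, content: result });
    }
  }
  throw new Error("Too many tool rounds");
}
```

&lt;p&gt;The &lt;code&gt;maxRounds&lt;/code&gt; cap is a design choice of this sketch: without it, a model that keeps requesting tools would loop forever.&lt;/p&gt;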
&lt;h2 id=&quot;internals&quot;&gt;Internals &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#internals&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said above, we&#39;re using the Ollama API rather than a local
library so that you can see the actual data we&#39;re sending to
the API, but that&#39;s just the first layer of the onion because
each LLM has its own idiosyncratic syntax which Ollama translates
to and from.
For example, here is what our initial shirt
prompt turns into when we send it to Mistral and &lt;a href=&quot;https://ollama.com/library/gemma3&quot;&gt;Gemma&lt;/a&gt; (one of
Google&#39;s open weight models) respectively:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;mistral&quot;&gt;Mistral &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#mistral&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;[SYSTEM_PROMPT]You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris. Your knowledge base was last updated on 2023-10-01. When you&#39;re not sure about some information, you say that you don&#39;t have the information and don&#39;t make up anything. If the user&#39;s question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. &#92;&quot;What are some good restaurants around me?&#92;&quot; =&gt; &#92;&quot;Where are you?&#92;&quot; or &#92;&quot;When is the next flight to Tokyo&#92;&quot; =&gt; &#92;&quot;Where do you travel from?&#92;&quot;)[/SYSTEM_PROMPT][INST]My shirt is blue[/INST]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;gemma&quot;&gt;Gemma &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#gemma&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;start_of_turn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;user&#92;nMy shirt is blue&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;end_of_turn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&#92;n&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;start_of_turn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;model&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is, as they say, a &amp;quot;rich text&amp;quot;. The first thing to notice is that
&lt;strong&gt;neither of these prompts is JSON&lt;/strong&gt;. Instead, Ollama has taken our JSON
API input and translated it into this stuff, which we&#39;ll generously
call &amp;quot;structured&amp;quot;. However, each model has made its own idiosyncratic
choices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Mistral uses something that kind of looks like XML if you globally
replaced every angle bracket with a square bracket. Gemma uses
XML syntax.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
As we&#39;ll see shortly, OpenAI&#39;s models use something even goofier, with tags like
&lt;code&gt;&amp;lt;|start|&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;|end|&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mistral includes a system prompt that tries to set some basic
ground rules, whereas with Gemma you&#39;re on your
own.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All this is just hidden by Ollama, which has a pretty fancy &lt;a href=&quot;https://github.com/ollama/ollama/blob/main/template/template.go&quot;&gt;templating engine&lt;/a&gt;
that lets each downloadable model specify how to translate
between the Ollama API and the model-specific stuff.&lt;/p&gt;
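&lt;p&gt;As a rough illustration of the kind of translation involved, here is a sketch that renders a chat-style request into the Mistral-style framing shown in this post. It is not Ollama&#39;s template (those are Go templates, and they handle much more): &lt;code&gt;renderMistralPrompt&lt;/code&gt; is a hypothetical helper, it only handles user messages, and the relative order of the system prompt and tool blocks is an assumption.&lt;/p&gt;

```javascript
// Simplified, hypothetical renderer for the Mistral-style framing.
// Real templates also cover assistant turns, tool results, and more.
function renderMistralPrompt(request) {
  let prompt = "";
  if (request.system) {
    prompt += "[SYSTEM_PROMPT]" + request.system + "[/SYSTEM_PROMPT]";
  }
  // Tool descriptions are inlined as JSON inside their own block.
  if ((request.tools || []).length !== 0) {
    prompt +=
      "[AVAILABLE_TOOLS]" + JSON.stringify(request.tools) + "[/AVAILABLE_TOOLS]";
  }
  for (const message of request.messages) {
    if (message.role === "user") {
      prompt += "[INST]" + message.content + "[/INST]";
    }
  }
  return prompt;
}

const example = renderMistralPrompt({
  system: "Be brief.",
  messages: [{ role: "user", content: "My shirt is blue" }],
});
console.log(example);
```

&lt;p&gt;The point is just that the JSON request is flattened into a single string before the model ever sees it.&lt;/p&gt;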
&lt;p&gt;I think the coolest thing here, though, is what&#39;s at the end of
the Gemma prompt, which is basically the start of an incomplete response from
the model, with just the framing ready for the model to
fill it in. What&#39;s going on here? Well, recall that an LLM is
basically a completion machine, and it&#39;s trying to continue the conversation,
so basically we&#39;re telling the model &amp;quot;the next thing that&#39;s going
to happen in this conversation is that the model is going to say
something&amp;quot;. OpenAI&#39;s open models do the same thing. Here&#39;s &lt;a href=&quot;https://ollama.com/library/gpt-oss&quot;&gt;gpt-oss-20b&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.&#92;nKnowledge cutoff: 2024-06&#92;nCurrent date: 2026-03-04&#92;n&#92;nReasoning: medium&#92;n&#92;n# Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;user&lt;|message|&gt;My shirt is blue&lt;|end|&gt;&lt;|start|&gt;assistant&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The responses from Mistral and Gemma are about what you would expect:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;mistral-2&quot;&gt;Mistral &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#mistral-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;That&#39;s a nice color! Do you need help with something related to your shirt, or would you like to talk about something else?&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;gemma-2&quot;&gt;Gemma &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#gemma-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;That&#39;s cool! Blue is a great color for a shirt. 😊 &#92;n&#92;nIs it a light blue, a dark blue, or somewhere in between? Do you like wearing blue?&#92;n&#92;n&#92;n&#92;n&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;There don&#39;t seem to be any delimiters here, so this could be a bug
in my instrumentation, but I think that&#39;s actually what&#39;s going
on.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Now take a look at what the response from gpt-oss looks like:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&lt;|channel|&gt;analysis&lt;|message|&gt;We need to respond appropriately. The user says &#92;&quot;My shirt is blue&#92;&quot;. It&#39;s a statement. We can respond with empathy or ask about context. Might be a conversation about shirts, colors, etc. We can ask what they like about blue shirts, or what occasion. Provide a playful or helpful answer. Keep tone friendly. Also check guidelines. There&#39;s no policy violation. Provide short, friendly reply.&#92;n&#92;nWe can ask: &#92;&quot;Cool! Is it a casual or formal shirt? Do you like the shade of blue?&#92;&quot; Let&#39;s produce.&lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;final&lt;|message|&gt;Nice! Blue is a classic choice. Is it a casual tee, a dress shirt, or something else? And which shade do you like—navy, sky blue, or maybe a bright cobalt?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is really cool, because we&#39;re actually now getting two kinds of output:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;content&lt;/em&gt; we asked for (i.e., the model&#39;s response)&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;thinking&lt;/em&gt; behind the answer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is an important clue to what&#39;s actually going on under the hood.
Reformatting
it to make it clearer:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;  &lt;|channel|&gt;analysis&lt;br /&gt;    &lt;|message|&gt;We need to respond appropriately. The user says &#92;&quot;My&lt;br /&gt;    shirt is blue&#92;&quot;. It&#39;s a statement. We can respond with empathy or&lt;br /&gt;    ask about context. Might be a conversation about shirts, colors,&lt;br /&gt;    etc. We can ask what they like about blue shirts, or what&lt;br /&gt;    occasion. Provide a playful or helpful answer. Keep tone&lt;br /&gt;    friendly. Also check guidelines. There&#39;s no policy&lt;br /&gt;    violation. Provide short, friendly reply.&#92;n&#92;nWe can ask: &#92;&quot;Cool!&lt;br /&gt;    Is it a casual or formal shirt? Do you like the shade of blue?&#92;&quot;&lt;br /&gt;    Let&#39;s produce.&lt;br /&gt;&lt;|end|&gt;&lt;br /&gt;&lt;|start|&gt;&lt;br /&gt;  assistant&lt;br /&gt;  &lt;|channel|&gt;final&lt;br /&gt;    &lt;|message|&gt;Nice! Blue is a classic choice. Is it a casual tee, a&lt;br /&gt;    dress shirt, or something else? And which shade do you like—navy,&lt;br /&gt;    sky blue, or maybe a bright cobalt?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What we&#39;ve got here is a
&lt;a href=&quot;https://www.cohorte.co/blog/demystifying-reasoning-models-how-ai-learns-to-think-step-by-step&quot;&gt;&amp;quot;reasoning&amp;quot;&lt;/a&gt;
model, and what that means in practice is that it produces its
&amp;quot;thinking&amp;quot; process out loud as part of the model output and after that
thinking is done (&amp;quot;Let&#39;s produce&amp;quot; in the text above) it actually
produces the output that&#39;s intended for the user. What this output
shows, though, is that it&#39;s still all text production (albeit
with a lot of tuning): basically, what&#39;s happening is that the model just
produces the reasoning text first and then produces the output
that follows (in a literal sense!) from that reasoning.&lt;/p&gt;
&lt;h3 id=&quot;tool-calling-2&quot;&gt;Tool-Calling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#tool-calling-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Now let&#39;s see what happens when we call a tool. Here are Mistral and
gpt-oss after I removed the system prompts (the version of
Gemma I&#39;m using didn&#39;t want to do tool calling, so I didn&#39;t
show it, but you can use &lt;a href=&quot;https://ollama.com/library/functiongemma&quot;&gt;functiongemma&lt;/a&gt;):&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;mistral-3&quot;&gt;Mistral &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#mistral-3&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;[AVAILABLE_TOOLS][{&#92;&quot;type&#92;&quot;:&#92;&quot;function&#92;&quot;,&#92;&quot;function&#92;&quot;:{&#92;&quot;name&#92;&quot;:&#92;&quot;read_temperature&#92;&quot;,&#92;&quot;description&#92;&quot;:&#92;&quot;Return the room temperature in degrees Celsius&#92;&quot;,&#92;&quot;parameters&#92;&quot;:{&#92;&quot;type&#92;&quot;:&#92;&quot;object&#92;&quot;,&#92;&quot;required&#92;&quot;:[&#92;&quot;location&#92;&quot;],&#92;&quot;properties&#92;&quot;:{&#92;&quot;location&#92;&quot;:{&#92;&quot;type&#92;&quot;:&#92;&quot;string&#92;&quot;,&#92;&quot;description&#92;&quot;:&#92;&quot;The room name&#92;&quot;}}}}},{&#92;&quot;type&#92;&quot;:&#92;&quot;function&#92;&quot;,&#92;&quot;function&#92;&quot;:{&#92;&quot;name&#92;&quot;:&#92;&quot;turn_on_heat&#92;&quot;,&#92;&quot;description&#92;&quot;:&#92;&quot;Turn on the heat&#92;&quot;,&#92;&quot;parameters&#92;&quot;:{&#92;&quot;type&#92;&quot;:&#92;&quot;object&#92;&quot;,&#92;&quot;required&#92;&quot;:[&#92;&quot;location&#92;&quot;],&#92;&quot;properties&#92;&quot;:{&#92;&quot;location&#92;&quot;:{&#92;&quot;type&#92;&quot;:&#92;&quot;string&#92;&quot;,&#92;&quot;description&#92;&quot;:&#92;&quot;The room name&#92;&quot;}}}}}][/AVAILABLE_TOOLS][INST]Use the available tools to find out the current temperature.[/INST]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;gpt&quot;&gt;GPT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#gpt&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Tools&#92;n&#92;n## functions&#92;n&#92;nnamespace functions {&#92;n&#92;n// Return the room temperature in degrees Celsius&#92;ntype read_temperature = (_: {&#92;n  // The room name&#92;n  location: string,&#92;n}) =&gt; any;&#92;n&#92;n// Turn on the heat&#92;ntype turn_on_heat = (_: {&#92;n  // The room name&#92;n  location: string,&#92;n}) =&gt; any;&#92;n&#92;n} // namespace functions&lt;|end|&gt;&lt;|start|&gt;user&lt;|message|&gt;Use the available tools to find out the current temperature.&lt;|end|&gt;&lt;|start|&gt;assistant&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Holy mixture of formats, Batman. In both cases we have quasi-XML with embedded
JSON. With Mistral the JSON is just inlined into the &lt;code&gt;[AVAILABLE_TOOLS]&lt;/code&gt; block,
and with GPT it&#39;s even wackier: it has been turned into some kind of quasi-function notation
with all the quotes stripped (&lt;a href=&quot;https://developers.openai.com/cookbook/articles/openai-harmony/&quot;&gt;harmony&lt;/a&gt; format).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;And finally, here are the actual tool calls, which are about what you
would expect:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;mistral-4&quot;&gt;Mistral &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#mistral-4&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;[TOOL_CALLS][{&#92;&quot;name&#92;&quot;:&#92;&quot;read_temperature&#92;&quot;,&#92;&quot;arguments&#92;&quot;:{&#92;&quot;location&#92;&quot;: &#92;&quot;living room&#92;&quot;}}]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;gpt-2&quot;&gt;GPT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#gpt-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;&lt;|channel|&gt;analysis&lt;|message|&gt;We need to check: read_temperature returns 25 degrees Celsius. Need to turn on heat if under 40 Celsius. 25 &lt; 40, so we should turn on heat. Use turn_on_heat tool.&lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;commentary to=functions.turn_on_heat &lt;|constrain|&gt;json&lt;|message|&gt;{&#92;&quot;location&#92;&quot;:&#92;&quot;living room&#92;&quot;}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Don&#39;t ask me why the GPT tool calls are in the commentary channel. That&#39;s just how
things are.&lt;/p&gt;
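&lt;p&gt;Going in the other direction, something has to turn this raw model text back into the structured tool call that the wrapper sees. Here is a minimal sketch of that step for the Mistral-style output above. It is an illustration rather than Ollama&#39;s actual parser, and &lt;code&gt;parseMistralToolCalls&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```javascript
const TOOL_CALLS_MARKER = "[TOOL_CALLS]";

// Returns an array of {name, arguments} objects if the model output is a
// tool call, or null if it looks like an ordinary text response.
function parseMistralToolCalls(output) {
  const text = output.trim();
  if (!text.startsWith(TOOL_CALLS_MARKER)) {
    return null;
  }
  // Everything after the marker is expected to be a JSON array.
  const calls = JSON.parse(text.slice(TOOL_CALLS_MARKER.length));
  if (!Array.isArray(calls)) {
    throw new Error("Expected a JSON array of tool calls");
  }
  return calls;
}

const parsed = parseMistralToolCalls(
  '[TOOL_CALLS][{"name":"read_temperature","arguments":{"location": "living room"}}]'
);
console.log(parsed[0].name, parsed[0].arguments.location);
```

&lt;p&gt;Because the model is just producing text, nothing guarantees the JSON is well formed, which is why &lt;code&gt;JSON.parse&lt;/code&gt; is allowed to throw here instead of being silently ignored.&lt;/p&gt;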
&lt;h2 id=&quot;model-context-protocol&quot;&gt;Model Context Protocol &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#model-context-protocol&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Tools aren&#39;t the only way that an AI model can interact with the
outside world. For example, Anthropic has developed something called
the &lt;a href=&quot;https://modelcontextprotocol.io/docs/getting-started/intro&quot;&gt;Model Context Protocol
(MCP)&lt;/a&gt;,
which is a way for models to interact with external resources (tools,
data, etc.).&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/mcp-architecture.png&quot; alt=&quot;MCP Architecture&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;MCP Architecture: from &lt;a href=&quot;https://modelcontextprotocol.io/specification/2025-11-25/architecture&quot;&gt;modelcontextprotocol.io&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The arrows connecting the host process and the servers are MCP,
which is a fairly simple &lt;a href=&quot;https://www.jsonrpc.org/specification&quot;&gt;JSON-RPC&lt;/a&gt;
protocol. Like tool calling, MCP is generic in that it specifies
how to talk to external resources but doesn&#39;t specify any details
about the resources themselves. Instead, the servers are
responsible for providing descriptions of the resources, which
can currently be any of:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Prompts&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;e.g., &lt;code&gt;code_review &amp;lt;code&amp;gt;&lt;/code&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;such as access to static files, Git repos, etc.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;just like the ones we&#39;ve already seen&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The client can interrogate the server to learn about each of these
resources, which come packaged in convenient descriptions just
like we saw with tools. For instance, here is an example tool
description from the MCP spec:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;jsonrpc&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2.0&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;result&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;tools&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;get_weather&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;title&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Weather Information Provider&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;Get current weather information for a location&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;inputSchema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;object&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;properties&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token property&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;City name or zip code&quot;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token 
property&quot;&gt;&quot;required&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token property&quot;&gt;&quot;icons&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;src&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://example.com/weather-icon.png&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;mimeType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;image/png&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;sizes&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;48x48&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    
&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;nextCursor&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;next-page-cursor&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should look incredibly familiar, because it&#39;s basically
the same thing as you would feed in for a tool description
with Ollama. This is actually the tool description
&lt;a href=&quot;https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling&quot;&gt;format&lt;/a&gt;
that Claude uses, where things are named a little differently
(e.g., &lt;code&gt;inputSchema&lt;/code&gt; instead of &lt;code&gt;parameters&lt;/code&gt;), but if you
can read one you can read the other.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
The situation is roughly similar for the descriptions
for prompts and resources.&lt;/p&gt;
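&lt;p&gt;To make the correspondence concrete, here is a minimal sketch of translating one MCP tool description into a function-style one. The target layout (a &lt;code&gt;function&lt;/code&gt; object carrying a &lt;code&gt;parameters&lt;/code&gt; schema) is an assumption about what an Ollama-like API accepts, and &lt;code&gt;mcp_tool_to_function_tool&lt;/code&gt; is a name invented for illustration. Fields like &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;icons&lt;/code&gt; are simply dropped.&lt;/p&gt;

```python
def mcp_tool_to_function_tool(mcp_tool):
    """Translate one MCP tool description into a function-style description.

    Assumption: the target API wants {"type": "function", "function": {...}}
    with the argument schema under "parameters".
    """
    empty_schema = {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP calls the JSON Schema for the arguments "inputSchema".
            "parameters": mcp_tool.get("inputSchema", empty_schema),
        },
    }


# The get_weather entry from the listing above, minus title and icons.
mcp_weather = {
    "name": "get_weather",
    "description": "Get current weather information for a location",
    "inputSchema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name or zip code"}
        },
        "required": ["location"],
    },
}

converted = mcp_tool_to_function_tool(mcp_weather)
```

&lt;p&gt;The schema itself passes through untouched, so only the envelope around it changes.&lt;/p&gt;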
&lt;p&gt;It&#39;s important to realize that using server-side resources
via MCP is isomorphic to tool calling. Recall that the model
doesn&#39;t know how the tools are implemented, it just knows
that they exist. This means that if you have an LLM which
knows how to call tools, you can make it do MCP just by
creating a translation layer that exposes the MCP-provided
tools as if they were regular tools, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tool-calling-mcp.png&quot; alt=&quot;Tool calling via MCP&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Tool calling via MCP
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this diagram, we actually have three sources of tools,
namely the two MCP servers and then a local tool. The
agent wrapper just collects all the tools and provides
them to the LLM without distinguishing where they live,
and then is responsible for dispatching the tool
requests to wherever they need to go. You can handle
resource requests the same way, with the resource
just being a specialized kind of tool that reads static
data. Prompts are a little different, and I&#39;m not going
to handle them here.&lt;/p&gt;
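&lt;p&gt;Here is a rough sketch of what that collect-and-dispatch wrapper might look like. Everything in it is illustrative: &lt;code&gt;ToolRegistry&lt;/code&gt;, &lt;code&gt;make_fake_mcp_handler&lt;/code&gt;, and the server name are hypothetical, and the MCP handler is a stand-in that formats a string rather than talking to a real server.&lt;/p&gt;

```python
class ToolRegistry:
    """Collects tools from several sources behind one flat namespace."""

    def __init__(self):
        self._handlers = {}      # tool name to callable(arguments) returning text
        self._descriptions = []  # what gets shown to the LLM

    def register(self, description, handler):
        name = description["name"]
        if name in self._handlers:
            raise ValueError(f"duplicate tool name: {name}")
        self._handlers[name] = handler
        self._descriptions.append(description)

    def descriptions(self):
        # This is all the LLM sees: names and schemas, not where tools live.
        return list(self._descriptions)

    def dispatch(self, name, arguments):
        # Route the model's tool request to whichever source owns the tool.
        return self._handlers[name](arguments)


def make_fake_mcp_handler(server, tool):
    # Stand-in for a real MCP round trip to a server (hypothetical).
    def handler(arguments):
        return f"[{server}] {tool}({sorted(arguments.items())})"
    return handler


def local_add(arguments):
    # An ordinary local tool, implemented in-process.
    return str(arguments["a"] + arguments["b"])


registry = ToolRegistry()
registry.register({"name": "get_weather", "description": "Weather lookup"},
                  make_fake_mcp_handler("weather-server", "get_weather"))
registry.register({"name": "add", "description": "Add two numbers"}, local_add)
```

&lt;p&gt;From the model&#39;s side, &lt;code&gt;get_weather&lt;/code&gt; and &lt;code&gt;add&lt;/code&gt; look identical; only &lt;code&gt;dispatch&lt;/code&gt; knows that one of them leaves the process.&lt;/p&gt;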
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;server-to-client&quot;&gt;Server to Client &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#server-to-client&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;There is a bit more to MCP than this. In particular,
MCP includes functions to let the server ask the
client to access the LLM on its behalf (e.g., ask
for completion). However, these too don&#39;t require anything
new from the model, but are just implemented in the wrapper
code, which accesses the model on the user&#39;s behalf.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The important thing to realize here is that the LLM
doesn&#39;t need to know anything about MCP at all, because
this part of MCP is just tool calling wearing a different hat.
As
long as it&#39;s set up for tool calling, which has a simple
request/response model, we can do all the translation
to MCP in the deterministic agent wrapper code (which doesn&#39;t
require any model tooling). If someone invented a new version
of MCP with totally different syntax, we wouldn&#39;t need to
change the LLM at all, just update the wrapper.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tool-calling/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The amazing thing is really how much we are doing with how little. Our final agent
program is less than 350 lines, and though we&#39;d obviously need
proper error handling, etc., the functionality here
is actually the core of a real agentic tool. We get
that power by composing a bunch of simple components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;An LLM which is able to generate new text in response to
a prompt.&lt;/li&gt;
&lt;li&gt;A wrapper which is able to iteratively take in input and
then ask the LLM &amp;quot;Given what&#39;s happened so far, what&#39;s next?&amp;quot;&lt;/li&gt;
&lt;li&gt;A bunch of tools which are able to have effects in the
real world when told to do so by the LLM.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That&#39;s all there is.&lt;/p&gt;
&lt;p&gt;Using it for real work would mostly consist of (1) adding a full suite
of tools and (2) using it with a good model rather than the local ones
we&#39;re using here. (2) is actually quite straightforward because Ollama
supports &lt;a href=&quot;https://docs.ollama.com/cloud&quot;&gt;cloud models&lt;/a&gt;,
translating to the cloud APIs instead of to the local model
interfaces. This leaves us with the tools, but the tools
aren&#39;t about AI, they&#39;re just the same kinds of APIs that you&#39;d
write for any programming task.&lt;/p&gt;
&lt;p&gt;What makes all this possible is that while the models are trained
to use tools &lt;em&gt;generically&lt;/em&gt;, they aren&#39;t trained to use any specific
tools. That means that all you have to do to add new capabilities
is to write new tools and tell the model about them. The model can
then work forward from your instructions and the tools it knows about
in order to figure out what tools it needs to call and in what order
so that it can accomplish whatever it is you asked it to do.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This setup gives us two avenues for increasing the power of the
system. First, we can make the model smarter so it&#39;s better at
figuring out what to do with the tools it has. We saw that already
above, where we had to remind the model that it had tools available,
but with a better model that wouldn&#39;t be necessary. Second, we
can give the model more tools to work with. These avenues
are independent but work together: you can make your existing
AI-based system better by adding more tools, but then if you replace your
model with a smarter one, it will instantly get better with the tools it has.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The model itself is just data, consisting of a model topology
and a lot of model weights (i.e., numbers), so you need some
software to execute the model, which is what Ollama is. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s obviously a lot of effort to tune the model to generate
plausible chat, but that&#39;s not our problem right now. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Thanks to Gemini for some help with this. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I had to instrument Ollama to get it to print this out.
 &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that these strings should be translated directly
into tokens when processed by the model. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;All these extra &lt;code&gt;&#92;n&lt;/code&gt;s are real serial killer stuff. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I didn&#39;t screw up the indentation. Remember what I said earlier about
how the prompt includes the start of the model&#39;s response? Well
that&#39;s the missing part, which goes &lt;code&gt;&amp;lt;|start|&amp;gt;assistant&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a whole other topic, but if you&#39;re familiar with
how quoting issues can lead to stuff like &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#sql-injection&quot;&gt;SQL injection&lt;/a&gt;, you&#39;re quite likely
huddled up in a ball sobbing by now. There is indeed
an analog to SQL injection in LLMs called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Prompt_injection&amp;amp;oldid=1341171277&quot;&gt;prompt injection&lt;/a&gt; which will most likely be the subject of a future
post. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The official SVG on the site had a little
hovering navigation tool on it, so I had to cut out the SVG and rerender
it in my browser and I was too lazy to get their CSS working. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s actually an interesting question whether you
could feed one style of description to another
kind of model and have it work. Remember that
at the end of the day you&#39;re just passing text
to the LLM, so it&#39;s quite possible the model is
smart enough to figure it out, just as you can,
though you&#39;d probably have to work around the
various layers of structure in the API that
expect properly formatted JSON. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This
is of course basically the same task as software engineering. &lt;a href=&quot;https://educatedguesswork.org/posts/tool-calling/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>I automatically generated minutes for five years of IETF meetings</title>
		<link href="https://educatedguesswork.org/posts/ietf-minutes/"/>
		<updated>2025-12-31T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ietf-minutes/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/auto-minutes-logo.jpg&quot; alt=&quot;Auto Minutes Logo&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Auto Minutes Logo [by Gemini]
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;blockquote&gt;
&lt;p&gt;It is characteristic of all committee discussions and decisions that
every member has a vivid recollection of them and that every member’s
recollection of them differs violently from every other member’s
recollection. Consequently, we accept the convention that the official
decisions are those and only those which have been officially recorded
in the minutes by the officials. —Sir Humphrey Appleby, &lt;a href=&quot;https://youtu.be/85fx0LrSMsE?t=137&quot;&gt;&amp;quot;Yes, Prime Minister&amp;quot;: S2E1&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;motivation&quot;&gt;Motivation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#motivation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like other &lt;em&gt;standards development organizations (SDOs)&lt;/em&gt;, the IETF
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc2418.html#section-3.1&quot;&gt;requires&lt;/a&gt;
that meetings be minuted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All working group sessions (including those held outside of the IETF
meetings) shall be reported by making minutes available.  These
minutes should include the agenda for the session, an account of the
discussion including any decisions made, and a list of attendees. The
Working Group Chair is responsible for insuring that session minutes
are written and distributed, though the actual task may be performed
by someone designated by the Working Group Chair. The minutes shall
be submitted in printable ASCII text for publication in the IETF
Proceedings, and for posting in the IETF Directories and are to be
sent to: &lt;a href=&quot;mailto:minutes@ietf.org&quot;&gt;minutes@ietf.org&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The IETF doesn&#39;t have any professional staff support for taking
minutes, which means that the working group members have to record the
minutes. The outcome of this is predictably bad:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;The minutes are bad:&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Taking good minutes is hard work and nobody at IETF is really
trained to do it. It&#39;s easy for people to transcribe events
incorrectly, miss important events, etc.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Taking minutes interferes with WG participation.&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Taking minutes makes it hard to participate in the WG, partly
because you&#39;re too busy writing stuff down to think about what to say
and partly because you can&#39;t easily minute yourself, so either someone
has to take over while you participate or you end up with a gap in the
minutes.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;People avoid taking minutes.&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Because minute taking isn&#39;t fun, it&#39;s hard to get volunteers.  The
IETF doesn&#39;t have a system for drafting people to do the job, so
instead what happens is that the WG chairs find themselves begging for
people to step forward and take minutes at the beginning of each WG
meeting (&amp;quot;we can&#39;t start without a minute taker&amp;quot;)
until eventually some poor sucker grudgingly agrees to do it.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h2 id=&quot;a-technical-fix&quot;&gt;A Technical Fix &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#a-technical-fix&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This post is actually two stories in one:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;My attempt to produce a technical fix for the minutes problem.&lt;/li&gt;
&lt;li&gt;Some reflections on using AI as a tool to produce that technical fix.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What connects these two threads is that this project would
have been a lot of work a few years ago, but now it&#39;s basically
trivial.&lt;/p&gt;
&lt;p&gt;Over the past 10+ years, there have been some modest improvements
which have made things a little easier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s now standard practice to take minutes in a shared notepad,
which makes it possible for multiple people to take minutes, or for
someone to fill in a bit when the main minute taker wants to
participate. Nevertheless, as mentioned above, taking minutes is not
a popular activity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The IETF now makes complete video and audio recordings of every WG
meeting, complete with automated transcripts. Many of the people
I know find the minutes so unreliable that they just go back to the
video whenever they want to know what happened.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&#39;m a long-time minutes-taking evader but I&#39;m also a strong believer
in recognizing when people don&#39;t like doing some things and trying to find
ways to stop them from having to do them. At a recent interim meeting
in Zurich for the AIPREF WG, some of us got so frustrated with the
whole thing that we
&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-rescorla-auto-minutes/&quot;&gt;proposed&lt;/a&gt;
that the IETF dispense with minutes taking entirely and just declare
the automated transcripts to be the minutes. This suggestion was &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/gendispatch/6K7LEcbYVcaztOcO_4ncVywrlF0/&quot;&gt;not
popular&lt;/a&gt;
and after looking at the transcripts in more detail I have some
sympathy for the objectors, as they&#39;re pretty hard to work with (more
on this below).&lt;/p&gt;
&lt;p&gt;Not totally deterred, I decided to take another run at a technical
fix: maybe the transcripts aren&#39;t good enough, but what if we could
just automatically make minutes from the transcripts? Fortunately,
I had another meeting to attend the next week—and hence
a need for a side project to distract me—and a copy of
&lt;a href=&quot;https://www.claude.com/product/claude-code&quot;&gt;Claude Code&lt;/a&gt;,
and thus
&lt;a href=&quot;https://ietfminutes.org/&quot;&gt;ietfminutes.org&lt;/a&gt; was born.&lt;/p&gt;
&lt;h2 id=&quot;architectural-overview&quot;&gt;Architectural Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#architectural-overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The diagram below shows the overall architecture of
&lt;a href=&quot;https://ietfminutes.org/&quot;&gt;ietfminutes.org&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ietf-minutes-arch.png&quot; alt=&quot;Architecture of IETF Auto Minutes&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Overall architecture
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The basic concept here is unbelievably simple and obvious: take the transcripts
that we&#39;re already generating and ask an LLM to make minutes out of them.
Here&#39;s the &lt;a href=&quot;https://github.com/ekr/auto-minutes/blob/main/src/generator.js#L39&quot;&gt;prompt&lt;/a&gt;,
with some re-flowing.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;You are an expert technical writer for the IETF. Convert the following meeting transcript&lt;br /&gt;into well-structured meeting minutes in Markdown format. It should contain an account&lt;br /&gt;of the discussion including any decisions made.&lt;br /&gt;&lt;br /&gt;Session: ${sessionName}&lt;br /&gt;&lt;br /&gt;Requirements:&lt;br /&gt;- Start with a # header with the session name&lt;br /&gt;&lt;br /&gt;- Include a ## Key Discussion Points section with bullet points&lt;br /&gt;- Include a ## Decisions and Action Items section if applicable&lt;br /&gt;- Include a ## Next Steps section if applicable&lt;br /&gt;- Be concise but capture all important technical discussions&lt;br /&gt;- Use proper Markdown formatting&lt;br /&gt;- Focus on technical content and decisions&lt;br /&gt;- Remember that IETF participants are individuals, not representatives of&lt;br /&gt;  companies or other entities&lt;br /&gt;- Remember that consensus is not judged in IETF meetings; it is established separately. &lt;br /&gt;  It&#39;s OK to say things like &quot;a poll of the room was taken&quot; or&lt;br /&gt;  &quot;a sense of those present indicates...&quot;&lt;br /&gt;&lt;br /&gt;The transcript is in JSON format with timestamps and text. Here is the transcript:&lt;br /&gt;&lt;br /&gt;${transcript}&lt;br /&gt;&lt;br /&gt;Generate the meeting minutes:&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code to talk to the LLM is just a trivial use of the
service API, namely feeding it the prompt with the transcript
filled in and getting back the summary.&lt;/p&gt;
&lt;p&gt;The vast majority of the code is plumbing, specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Retrieving the list of sessions from the IETF &lt;a href=&quot;https://datatracker.ietf.org/&quot;&gt;&amp;quot;datatracker&amp;quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Retrieving the actual session transcripts from the
&lt;a href=&quot;https://www.meetecho.com/en/&quot;&gt;Meetecho&lt;/a&gt; conferencing system used by IETF&lt;/li&gt;
&lt;li&gt;Formatting the site and publishing it&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;retrieving-the-session-transcripts&quot;&gt;Retrieving the Session Transcripts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#retrieving-the-session-transcripts&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first step is just to find the relevant sessions. Most IETF
WG meetings happen at the thrice yearly in-person IETF plenary
meeting.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
The IETF uses a homegrown tool called &lt;a href=&quot;https://datatracker.ietf.org/&quot;&gt;datatracker&lt;/a&gt;
to manage IETF drafts, agendas, meeting materials, proceedings, and eventually the
minutes. Datatracker is a somewhat aged—but actively
maintained—Django app. Unfortunately it&#39;s really designed to
be used as a Web site and doesn&#39;t have a complete published API, but rather
just &lt;a href=&quot;https://datatracker.ietf.org/api/&quot;&gt;exposes its object model with tastypie&lt;/a&gt;, so I had to do a bit of reverse engineering.
At the end of the day I ended up using
the proceedings page (e.g., &lt;a href=&quot;https://datatracker.ietf.org/meeting/124/proceedings&quot;&gt;https://datatracker.ietf.org/meeting/124/proceedings&lt;/a&gt;).
Each session (WG or otherwise) has a link
labeled &amp;quot;Session Recording&amp;quot;:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ietf-proceedings.png&quot; alt=&quot;IETF proceedings page&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
IETF proceedings page
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The session recording link goes to a &lt;a href=&quot;https://meetecho-player.ietf.org/playout/?session=IETF124-PLENARY-20251105-2230&quot;&gt;custom media player&lt;/a&gt;, which shows the video
(actually an embedded YouTube player) alongside an embedded
transcript player. The player has a deterministic URL pattern, such
as &lt;code&gt;https://meetecho-player.ietf.org/playout/?session=IETF124-PLENARY-20251105-2230&lt;/code&gt;.
Each session is identified by a session ID, and the transcript itself is at a deterministic
location based on the session ID. This gives us a straightforward process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parse the proceedings page to find all links labeled &amp;quot;Session Recording&amp;quot;&lt;/li&gt;
&lt;li&gt;Extract the session ID from the link to the player&lt;/li&gt;
&lt;li&gt;Construct the link to the transcript and download the transcript from the
URL.&lt;/li&gt;
&lt;/ul&gt;
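&lt;p&gt;The steps above can be sketched roughly as follows, using only the Python standard library. This is not the actual auto-minutes code: the class and function names are made up, the parser assumes the link text is exactly &amp;quot;Session Recording&amp;quot;, and &lt;code&gt;TRANSCRIPT_URL_TEMPLATE&lt;/code&gt; is a placeholder, since the real transcript location isn&#39;t given here.&lt;/p&gt;

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# HYPOTHETICAL placeholder: substitute the real transcript URL pattern.
TRANSCRIPT_URL_TEMPLATE = "https://transcripts.example/{session_id}.json"


class RecordingLinkParser(HTMLParser):
    """Collects href values of anchors whose text is exactly LABEL."""

    LABEL = "Session Recording"

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor we are currently inside
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == self.LABEL:
                self.links.append(self._href)
            self._href = None


def session_id_from_player_url(player_url):
    """Return the value of the session query parameter, or None if absent."""
    values = parse_qs(urlparse(player_url).query).get("session")
    return values[0] if values else None


def transcript_urls(player_urls):
    """Map player links to transcript URLs, skipping links with no session ID."""
    urls = []
    for player_url in player_urls:
        session_id = session_id_from_player_url(player_url)
        if session_id is not None:
            urls.append(TRANSCRIPT_URL_TEMPLATE.format(session_id=session_id))
    return urls
```

&lt;p&gt;Feeding the proceedings HTML to &lt;code&gt;RecordingLinkParser&lt;/code&gt; and passing its &lt;code&gt;links&lt;/code&gt; to &lt;code&gt;transcript_urls&lt;/code&gt; gives the list of transcripts to download.&lt;/p&gt;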
&lt;p&gt;The transcript itself is a JSON file consisting of timestamped
fragments of transcribed speech:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;startTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;00:00:00&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;text&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;yesterday I was over-achieving High is bad but I joke you see that Yeah, that&#39;s the point You know overnight was good Overnight was good for  that please So this is the ASDF meeting meeting if you&#39;re here because you like trains, you&#39;ll have to go somewhere else Because the AASDF kid likes trains. I&#39;ll start in one minute, and we&#39;re with, we need a note  taker yet Somebody take notes because I&#39;m speaking Jan, are you taking a Yeah, thank you very much I&#39;m doing the talking and Lorenzo is doing the projecting and don&#39;t ask him questions because he can&#39;t talk talk. He&#39;s allowed to. He just is unable to um and uh so yeah so oh, so we&#39;ve actually changed the agenda already already. So note, well, you saw it yesterday you saw it everywhere else Please be nice to each other This is not the latest slide, but that&#39;s okay um and um yeah please be nice to each other um next item is we are going to continue with a version virtual interims. 
And And pardon me?&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;startTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;00:02:02&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;text&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Yes. Thank you And so we have been doing them at, what time was it, your time, 9 a.m 9 a.m which worked well in in Europe and in in Korea and even in California and I was the odd man out at 3 a.m and i did not make it uh uh there um so the question is is that still a good time for everyone? who wants to be involved? Would you like to propose other times? If not, do we want to have one in December? beginning of December? Yes, no, yes Okay we&#39;re going to pick the first Wednesday in December whatever date that is and we&#39;re going to go with that at that same 9 a.m Eastern European time, yeah, which is I guess 7 a.m .T.C. Is that right? yeah okay we&#39;ll post that up and then I think we&#39;ll post a second one for maybe the second week of January. The first week is usually a toast Any great objections? to that and then we&#39;ll discuss what we&#39;re, where we&#39;re doing from there Anyone remotely want to comment on that? 
No Okay let&#39;s move on to the first real agenda item which is non-affordance and  click either&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  ...&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;minutes-generation&quot;&gt;Minutes Generation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#minutes-generation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next step is to send the transcript to the LLM and ask it to
make the minutes. As I mentioned above, this is conceptually simple
but turns out to be somewhat slow (on the order of tens of seconds per session),
which requires some careful handling. At the end of the day,
I ended up doing two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Caching the generated minutes so that you can run the rest
of the code (generating the site, uploading it, etc.) without
waiting for the LLM. This also makes things easier when
running it in automation (see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Switching from Claude Sonnet to Gemini Flash, which is a lot faster,
as well as cheaper. It&#39;s still too slow to run in the inner loop
when you&#39;re testing, but made it much faster when I wanted to
backfill all the minutes for the past 5 years or so.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
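&lt;p&gt;The caching idea is simple enough to sketch in a few lines. This is an illustration rather than the real implementation: &lt;code&gt;cached_minutes&lt;/code&gt;, the one-file-per-session layout, and the &lt;code&gt;generate&lt;/code&gt; callback (standing in for the slow LLM call) are all assumptions.&lt;/p&gt;

```python
from pathlib import Path


def cached_minutes(session_id, cache_dir, generate):
    """Return minutes for session_id, calling generate() only on a cache miss.

    generate is any callable taking a session ID and returning Markdown text;
    in practice it would wrap the LLM request.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{session_id}.md"
    if path.exists():
        # Cache hit: no LLM call, so the rest of the pipeline runs quickly.
        return path.read_text(encoding="utf-8")
    minutes = generate(session_id)
    path.write_text(minutes, encoding="utf-8")
    return minutes
```

&lt;p&gt;Keying on the session ID means a rerun only pays for sessions it hasn&#39;t seen, at the cost of having to delete a file by hand if you change the prompt and want fresh output.&lt;/p&gt;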
&lt;p&gt;This phase is where all the smarts are, but TBH both models do
a pretty reasonable job (see below for more on this). I did run
into one class of error that was bad enough to be worth doing
something about. For some reason, the output would have Markdown fences,
such as:&lt;/p&gt;
&lt;pre&gt;```&lt;/pre&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;pre&gt;```markdown&lt;/pre&gt;
&lt;p&gt;in the output. This idiom is used to indicate to the Markdown
processor that you want the code rendered as literal code
rather than processed, which isn&#39;t what we want here. I ended
up just using a post-processor to remove anything like this.&lt;/p&gt;
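&lt;p&gt;A post-processor along these lines is only a few lines of code. The sketch below is one plausible version, not necessarily what the real code does: it drops any line that consists solely of a fence, with or without a language tag, and keeps everything else.&lt;/p&gt;

```python
import re

# Three backticks, built this way so the listing itself stays readable.
FENCE = "`" * 3

# A line consisting only of a fence, optionally followed by a language tag.
FENCE_LINE = re.compile(r"^[ \t]*" + re.escape(FENCE) + r"[A-Za-z]*[ \t]*$")


def strip_fence_lines(markdown):
    """Remove fence-only lines, keeping the text that was between them."""
    kept = [line for line in markdown.splitlines() if not FENCE_LINE.match(line)]
    return "\n".join(kept)
```

&lt;p&gt;Note that this would also strip fences around any genuine code block in the minutes, which seems like an acceptable trade for meeting minutes but wouldn&#39;t be for other kinds of content.&lt;/p&gt;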
&lt;h3 id=&quot;generating-the-site&quot;&gt;Generating the Site &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#generating-the-site&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I wanted to host this on GitHub Pages, which meant I needed a
completely static site. My first cut at this was just putting the
generated MD in the &lt;code&gt;gh-pages&lt;/code&gt; branch and letting GitHub render the
HTML.  This works OK, but you end up stuck with GitHub&#39;s styling
choices, and after fighting for a while with trying to configure
&lt;a href=&quot;https://jekyllrb.com/&quot;&gt;Jekyll&lt;/a&gt; I decided it would be easier to
generate the HTML locally. That way, I could use
&lt;a href=&quot;https://www.11ty.dev/&quot;&gt;11ty&lt;/a&gt;, which I was already familiar with, and I
could test out the HTML generation without having to
push it to GitHub and wait for it to generate the pages.&lt;/p&gt;
&lt;p&gt;I ended up with a three-step process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate the minutes in markdown using the LLM (see above).&lt;/li&gt;
&lt;li&gt;Starting with the generated minutes, generate the site
markdown, as well as the site index.&lt;/li&gt;
&lt;li&gt;Using 11ty, generate the site HTML.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The nice thing about this structure was that it made it
easy to work incrementally: once I had generated minutes
for a few WG sessions I was able to generate a basic site
and then iterate on the wrapping for each page (headers,
footers, etc.), the styling, etc. without having to
hit the LLM again, which makes things both faster and
cheaper. It also meant that once I had things the
way I wanted I could just generate the minutes for
the rest of the WG sessions of interest and regenerate
the whole site. It also made things easier when I went
to automate everything later.&lt;/p&gt;
&lt;p&gt;Importantly, none of this code runs in the critical path because the
site is entirely static, which is convenient for deployment reasons
because it means I can run on totally free infrastructure, but also
for security reasons because there&#39;s basically nothing to compromise.
This is good because despite being a security professional, I
don&#39;t really have that much experience building a secure
dynamic Web site using modern tools like Django or RoR, so I
prefer to design things in as fail-safe a fashion as I can.&lt;/p&gt;
&lt;h3 id=&quot;automation&quot;&gt;Automation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#automation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In my first cut of the system, I just ran everything on
my local machine, then committed the site to the GitHub
pages branch and pushed it to GitHub. This is fine for
generating minutes for meetings that happened months
ago, but what you really want is to have the minutes
generated in quasi real time, shortly after the
sessions themselves happen (the transcripts take a while
to generate, so it won&#39;t be in real time).&lt;/p&gt;
&lt;p&gt;There are lots of options here, but given that I was already
publishing with GitHub pages, the easiest approach seemed to be to use
&lt;a href=&quot;https://github.com/features/actions&quot;&gt;GitHub Actions&lt;/a&gt;. There&#39;s no Web
hook for when the minutes are published, but Actions supports
&lt;a href=&quot;https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#schedule&quot;&gt;scheduled&lt;/a&gt;
events, so I can just poll the site periodically. The tricky part
here is maintaining the cache of generated minutes files, as we
obviously don&#39;t want to have to regenerate all of them every
time a new session transcript is published.&lt;/p&gt;
&lt;p&gt;What I ended up doing was having two repos:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ietf-minutes/ietf-minutes-data&quot;&gt;ietf-minutes-data&lt;/a&gt;
for the GitHub pages site and the cache of generated minutes, which is
stored as a branch.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ekr/auto-minutes/&quot;&gt;auto-minutes&lt;/a&gt; for the code to generate the minutes.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The reason for the two repos is that I need the action to
have write access to the repo so it can re-commit the cache,
but I didn&#39;t want to give it write access to the code itself
in case I screwed something up or there was a security issue.
This way, even if there is a total compromise of the system,
the worst thing that can happen is damage to the data.
I&#39;m not a GitHub actions wizard, so there might be some
other way to do this, but I like to keep things simple.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ietf-minutes-action.png&quot; alt=&quot;GitHub action structure&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
GitHub automation architecture
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/ietf-minutes/ietf-minutes-data/blob/main/.github/workflows/sync.yaml&quot;&gt;action&lt;/a&gt; itself is attached to the &lt;code&gt;ietf-minutes-data&lt;/code&gt; repo.
GitHub doesn&#39;t do a very good job of running the job at
the scheduled time, but it seems to eventually run and
as I mentioned above, the transcripts take a while to
show up, so it&#39;s not that big a deal. Once it does fire,
here&#39;s what happens:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check out the &lt;code&gt;auto-minutes&lt;/code&gt; repo, which has the actual code,
and run &lt;code&gt;npm install&lt;/code&gt; to install the dependencies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check out the &lt;code&gt;cache&lt;/code&gt; branch of &lt;code&gt;ietf-minutes-data&lt;/code&gt;,
containing all previously AI-generated minutes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the minutes generator and generate minutes for all
sessions that haven&#39;t been generated yet. This is the
only stage that uses AI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If there are new minutes in the cache directory—because
there were new sessions—commit them to the cache
branch and push it back to GitHub. If no new minutes were
generated, the script aborts at this point.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If there are new minutes, then we need to regenerate the site
itself, and then deploy it to GitHub pages.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
This is fast because it just means regenerating the metadata pages
(indexes and the like) and running 11ty to generate the HTML.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
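&lt;p&gt;The control flow in the steps above can be sketched roughly as follows. This is a Python illustration only, not the workflow itself: &lt;code&gt;sessions_to_generate&lt;/code&gt;, &lt;code&gt;run_once&lt;/code&gt;, and the dictionary standing in for the cache branch are all hypothetical, and the git and 11ty steps are left out.&lt;/p&gt;

```python
# Hypothetical sketch of one polling pass; names are invented for illustration.

def sessions_to_generate(published, cached):
    """Return published session ids that have no cached minutes, in published order."""
    cached_ids = set(cached)
    return [session for session in published if session not in cached_ids]


def run_once(published, cached, generate):
    """Fill in missing minutes and report whether the site needs rebuilding."""
    missing = sessions_to_generate(published, cached)
    for session in missing:
        # The only step that would call the LLM.
        cached[session] = generate(session)
    # False means: stop here, there is nothing to commit or deploy.
    return bool(missing)
```

&lt;p&gt;Keeping the cache lookup separate from generation is what makes repeated scheduled runs cheap: a pass with no new sessions never touches the model.&lt;/p&gt;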
&lt;p&gt;It took some messing around to get this running because I was doing a
lot of this on a plane (see also &amp;quot;not a GitHub actions wizard&amp;quot;,
supra), but once I had it up and running it all worked smoothly
for the rest of the meeting.&lt;/p&gt;
&lt;h2 id=&quot;tuning-output-quality&quot;&gt;Tuning Output Quality &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#tuning-output-quality&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said above, the output quality is pretty good, but it&#39;s far
from perfect. With a lot of examples to work from, you can see some
consistent patterns.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mis-rendering people&#39;s names (e.g., &amp;quot;Martin Thompson&amp;quot; rather than &amp;quot;Martin Thomson&amp;quot;, &amp;quot;Eric Griswold&amp;quot; for &amp;quot;Eric Rescorla&amp;quot;).&lt;/li&gt;
&lt;li&gt;Mis-identifying speakers entirely (e.g., Rich Salz as me).&lt;/li&gt;
&lt;li&gt;Mis-rendering technical terms (e.g., &amp;quot;Quick&amp;quot; for &amp;quot;QUIC&amp;quot;; this is actually an interesting one because it gets it right the first time).&lt;/li&gt;
&lt;li&gt;Misstating the result of a discussion, e.g., saying there was consensus when there wasn&#39;t.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, here is a &lt;a href=&quot;https://ietfminutes.org/minutes/ietf124/tls.html&quot;&gt;session I was in&lt;/a&gt;, which has a number of examples of the above.&lt;/p&gt;
&lt;p&gt;The high level challenge here is that the model itself is kind
of a black box; we can of course tweak the prompt, but it&#39;s
difficult to predict if that&#39;s going to have the right effect.
You can obviously try it with a single problematic session
(I even added a mode specifically for that), but even if you
get the right result on that specific session, you don&#39;t know
if it will make things worse on some other session, so it&#39;s hard
to know how aggressive to be. On the one hand, the prompt is already
kind of ad hoc, but you also don&#39;t want to be doing a random walk
through the prompt space. I suspect the pro thing to do is to
actually fine tune a model, but I&#39;m not sure I have enough energy
for that for what is at the end of the day kind of a hack.&lt;/p&gt;
&lt;p&gt;With that said, there are a few obvious things to do that seem
likely to improve quality.&lt;/p&gt;
&lt;h3 id=&quot;generate-the-transcript-ourselves&quot;&gt;Generate the Transcript Ourselves &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#generate-the-transcript-ourselves&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said above, I&#39;m just using Meetecho&#39;s transcript generation
function. Some brief inspection of the transcripts it is emitting
suggests that there&#39;s a fair amount of room for improvement, and if
there are errors in the transcript, this has the potential to affect
the generated minutes (although in some cases I&#39;ve actually seen the
minutes generation fix errors in the transcript!).&lt;/p&gt;
&lt;p&gt;The obvious thing to do is to instead do the STT ourselves from the
audio recording using a more advanced STT model (e.g., &lt;a href=&quot;https://ai.google.dev/gemini-api/docs/audio&quot;&gt;Gemini Audio
Understanding&lt;/a&gt;). Unfortunately,
for reasons I don&#39;t quite understand, the IETF
&lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/tools-discuss/aVtvBq8woFLPNG3M4ntlrpXU0xk/&quot;&gt;doesn&#39;t&lt;/a&gt;
presently make the audio recordings available. It&#39;s probably possible
to use &lt;a href=&quot;https://ytdl-org.github.io/youtube-dl/index.html&quot;&gt;youtube-dl&lt;/a&gt;
or the like to get the video and then pull out the audio, but that&#39;s
not really the way I would prefer to do things. This is a TODO for
the future when the IETF cracks the code on how to host audio files.&lt;/p&gt;
&lt;h3 id=&quot;identify-the-speaker-explicitly&quot;&gt;Identify the Speaker Explicitly &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#identify-the-speaker-explicitly&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, a persistent problem is misidentifying speakers, either
by mis-rendering their name or getting the wrong person altogether.
This is kind of irritating because the information is actually in the
system somewhere. The IETF actually has a fairly fancy audio setup, with
at least four different audio inputs (sometimes with multiple microphones
in each).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chair&lt;/li&gt;
&lt;li&gt;Presenter&lt;/li&gt;
&lt;li&gt;Room (for comments and questions)&lt;/li&gt;
&lt;li&gt;Remote&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of this stuff feeds into Meetecho as well as into the room
speakers, but Meetecho&#39;s transcript generation removes which channel
the audio came in on. If the transcript were just annotated with the
input channel, then it would make it a lot easier to figure out
the person who was speaking. It&#39;s actually possible
to do better than that, although each case needs special handling.&lt;/p&gt;
&lt;h4 id=&quot;chair&quot;&gt;Chair &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#chair&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Your typical IETF WG has one or two chairs and the IETF datatracker
already records the chairs, so it&#39;s straightforward to reduce the
chair mic down to one or two people. In my experience, the chairs
typically don&#39;t identify themselves, so you&#39;d probably need some
speaker recognition (perhaps augmented by the video stream) to
determine which chair was talking. Still, just knowing that it
was a chair would be helpful.&lt;/p&gt;
&lt;h4 id=&quot;presenter&quot;&gt;Presenter &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#presenter&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Each WG session is likely to have a number of presenters, but
there&#39;s a fair amount of metadata available to determine which
one is which, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The agenda listing who is presenting&lt;/li&gt;
&lt;li&gt;The chairs announcing the next slot&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You&#39;d need to play around with things a bit to determine whether
it was best to try to do something smart or just feed all of the
information into the model and let it figure things out, but
again, just knowing that some audio came from the presenter
mic would help.&lt;/p&gt;
&lt;h4 id=&quot;room&quot;&gt;Room &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#room&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It used to be very difficult to know who was speaking at the room
microphones, but as part of the effort to make participation
remote friendly, IETF now has a unified queue management system
between remote and in-room mics, and so if you want to speak
at the room mic, you need to get in the queue first. This means
that even though Meetecho doesn&#39;t necessarily know who is actually
speaking, it knows who is at the head of the queue and if they
are local or remote, so you could probably just assume that
if it&#39;s the in-room mic, it&#39;s the person at the head of the
mic line. This won&#39;t be perfect, because sometimes people butt in
or reorder themselves, but it would be a lot better.&lt;/p&gt;
&lt;h4 id=&quot;remote&quot;&gt;Remote &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#remote&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This is actually the easiest: you can&#39;t remotely participate in
an IETF meeting without using the remote Meetecho tool, and that
requires registering, so Meetecho knows exactly which individual
is speaking over a given remote channel.&lt;/p&gt;
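&lt;p&gt;To make the four cases concrete, here is a small Python sketch of how a channel annotation could narrow down the speaker. It assumes information that the transcripts do not currently carry (the input channel, the queue head, the registered remote speaker), and &lt;code&gt;candidate_speakers&lt;/code&gt; and its parameters are hypothetical names, not part of Meetecho or the datatracker.&lt;/p&gt;

```python
# Hypothetical sketch: assumes the transcript were annotated with an input channel.

def candidate_speakers(channel, chairs, presenter, queue_head, remote_speaker):
    """Return the likely speakers for one utterance, given its input channel."""
    if channel == "chair":
        return list(chairs)        # one or two people; speaker recognition would pick one
    if channel == "presenter":
        return [presenter]         # from the agenda or the chair announcing the slot
    if channel == "room":
        return [queue_head]        # best guess only; people sometimes butt in
    if channel == "remote":
        return [remote_speaker]    # remote participants have to register
    return []                      # unknown channel: no hint available
```

&lt;p&gt;Even when the result is a short list rather than a single name, handing that list to the model is a much smaller guessing problem than inferring the speaker from the text alone.&lt;/p&gt;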
&lt;h3 id=&quot;ground-the-output-in-existing-ietf-information&quot;&gt;Ground the Output in Existing IETF Information &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#ground-the-output-in-existing-ietf-information&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Finally, there&#39;s a broader opportunity to ground the output in
existing IETF information, and in particular to give the LLM a hint
about some commonly used words and phrases. For example, we
could provide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The list of people actually in a session so that it
knows what the most likely names are.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The WG agenda so it knows the presentation order
(see above).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The presentations and documents associated with a given
WG session.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A list of common terms and acronyms, so that it knows
that if you talk about &amp;quot;QUIC&amp;quot; it&#39;s probably not &amp;quot;Quick&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
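&lt;p&gt;As a sketch of what that grounding could look like, the following Python builds a block of hints to prepend to the summarization prompt. The function name, the wording of the hints, and the choice to pass plain lists are all assumptions for illustration; the presentations and documents are omitted because they would need their own extraction step.&lt;/p&gt;

```python
# Hypothetical sketch: builds a text block of hints to put ahead of the transcript.

def grounding_preamble(attendees, agenda, glossary):
    """Format participant names, agenda order, and terminology as prompt hints."""
    lines = ["Known participants (prefer these spellings):"]
    lines += ["- " + name for name in attendees]
    lines.append("Agenda, in presentation order:")
    lines += ["{}. {}".format(i, item) for i, item in enumerate(agenda, start=1)]
    lines.append("Terms and acronyms (use exactly as written):")
    lines += ["- " + term for term in glossary]
    return "\n".join(lines)
```

&lt;p&gt;Whether hints like these actually reduce the name and terminology errors is something that would have to be checked against real sessions.&lt;/p&gt;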
&lt;p&gt;I&#39;ve been looking at this a bit; maybe for next time.&lt;/p&gt;
&lt;h2 id=&quot;take-homes&quot;&gt;Take Homes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#take-homes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is a pretty simple project, but it gave me the opportunity
to play with a bunch of AI tools, and while I&#39;m not any kind
of expert, I do think there are some useful lessons.&lt;/p&gt;
&lt;h3 id=&quot;ai-code-generation&quot;&gt;AI Code Generation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#ai-code-generation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The vast majority of this code was written with AI coding tools,
mostly &lt;a href=&quot;https://www.claude.com/product/claude-code&quot;&gt;Claude Code&lt;/a&gt;.
Mostly, I just pointed Claude Code at the APIs and told it
what I wanted and let it rip. I did some light review of the
code to see if it seemed to be doing what I wanted,
but if I&#39;m being honest, it was mostly this:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/lgtm.png&quot; alt=&quot;LGTM&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;I fear this is the future of AI coding assistance, as it&#39;s
very difficult to maintain attention when reviewing big PRs.&lt;/p&gt;
&lt;p&gt;This isn&#39;t ideal, but this particular site is pretty low
stakes and one of the advantages of the split architecture
I&#39;m using is that even if you make some pretty egregious
mistakes in the code, there&#39;s (hopefully!) not too much
that can go wrong beyond having the site itself end up
wrong.&lt;/p&gt;
&lt;p&gt;I know lots of other people have used AI coding tools, but I
wanted to form my own opinion, so FWIW, here are some thoughts.&lt;/p&gt;
&lt;p&gt;First, my impression here is that Claude is really good at
the routine stuff like knowing how to scrape a site, parsing
the HTML, figuring out which parts of the DOM to examine,
etc. It wasn&#39;t as good at figuring out the overall architecture,
but if you ask it to do the job in pieces and correct it when
you&#39;re unhappy, it works better. This matches my experience
using Gemini and herding the AI seems like it&#39;s a skill that
we&#39;re all going to have to learn.&lt;/p&gt;
&lt;p&gt;It&#39;s especially helpful to have a tool like this for doing
routine stuff you&#39;re bad at or don&#39;t want to learn. For example,
I suck at CSS, but I also don&#39;t want to do anything complicated,
so generally if you just tell Claude what you want and don&#39;t
care too much about the exact details of how things look it
does an OK job. There&#39;s still a bunch of stuff where you have
to be like &amp;quot;no, I really want that menu item 20% lower&amp;quot;,
and in some cases you eventually have to get in there and
fix things yourself, but again, having a computer do the busywork
is great.&lt;/p&gt;
&lt;p&gt;The time scale is oddly inhuman: Claude is incredibly fast
at generating a big pile of code, but if you want some
trivial change you still somehow end up with a lot of
think time latency, where you&#39;re just sitting and staring
at the screen waiting for Claude Code to get back to you.
I did a lot of this work sitting in meetings, so I could
just tune out and wait for the model, but if this is all
I was doing, then I&#39;d want to develop some different work
rhythms. I&#39;ve heard of people having a lot of different
outstanding tasks and waiting for them to complete and reviewing
them, but this isn&#39;t something I&#39;ve tried much yet.&lt;/p&gt;
&lt;h3 id=&quot;summarization&quot;&gt;Summarization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#summarization&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This is the first project I&#39;ve worked on where I didn&#39;t just
use AI to generate the code but where AI was actually a core
piece of the operations. It&#39;s a really different experience
from normal engineering because the AI is basically a
nondeterministic black box that you have to kind of
talk into doing what you want. This has a few implications
that take some getting used to.&lt;/p&gt;
&lt;p&gt;First, it&#39;s really hard to know what the impact of any
particular change to the prompt will be. For example,
we&#39;ve been having a sort of persistent problem where
the model would report that there was consensus on a certain
point, even though the chairs hadn&#39;t declared consensus.
&lt;a href=&quot;https://www.mnot.net/&quot;&gt;Mark Nottingham&lt;/a&gt; and I spent
&lt;a href=&quot;https://github.com/ekr/auto-minutes/commit/1d92d10aee45a48a047f522c88cab6989d3a2468&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://github.com/ekr/auto-minutes/commit/bf4945e9aa40fcca7fd2a57a5ae1bfab5f1135eb&quot;&gt;time&lt;/a&gt;
tweaking the prompt trying to get it not to declare consensus
incorrectly, but the results really weren&#39;t quite what
we were hoping.&lt;/p&gt;
&lt;p&gt;Second, because the system is nondeterministic you don&#39;t get exactly
the same results with the same inputs, which makes things hard to
test. It&#39;s of course trivial to regenerate any individual session as a
test but the problem is that just because it works once doesn&#39;t mean
it will work reliably. I&#39;m very curious what other people do in this kind
of situation, but just on first impression this feels a bit like
a statistical process control problem, where we&#39;d have to do
something like A/B tests with a given prompt (or, again, try to
fine tune the model).&lt;/p&gt;
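&lt;p&gt;One way to make the A/B idea concrete would be to run each prompt over the same set of sessions, count by hand how many outputs contain a false consensus claim, and compare the two rates with a standard two-proportion z-test. The Python below is only a sketch of that comparison; the function name and the example counts are made up.&lt;/p&gt;

```python
import math

# Hypothetical sketch: compares failure rates of two prompts on the same sessions.

def two_proportion_z(bad_a, n_a, bad_b, n_b):
    """z statistic for the difference between two failure rates (pooled standard error)."""
    rate_a = bad_a / n_a
    rate_b = bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # both prompts always fail or never fail: no measurable difference
    return (rate_a - rate_b) / std_err
```

&lt;p&gt;With invented counts of 30 bad outputs out of 100 for one prompt and 15 out of 100 for the other, this gives a z of about 2.54, which is above the conventional 1.96 threshold for a two-sided test at the 5% level. The expensive part is not the arithmetic but running and hand-grading enough sessions to get usable counts.&lt;/p&gt;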
&lt;p&gt;From a product delivery perspective, I also noticed that this is
a bit confusing for other people, who are used to software behaving
predictably. I&#39;ve gotten a number of bug reports that are basically
of the form &amp;quot;the generated minutes don&#39;t seem quite right&amp;quot;, and
my response is kind of the same: ¯&#92;_(ツ)_/¯. As noted
above, I have a few ideas for how to improve things, but
fundamentally, I don&#39;t know how to fix specific defects.&lt;/p&gt;
&lt;h2 id=&quot;integration-into-the-process&quot;&gt;Integration into the process &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#integration-into-the-process&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even as-is the minutes are reasonable quality—and the bar here is
pretty low—but they definitely can contain errors (as they say
&amp;quot;AI can make mistakes&amp;quot;). The idea here isn&#39;t to replace the minutes
on the IETF proceedings but rather to make the process of generating
them easier. I&#39;m not going to judge you if you just take what I&#39;ve
generated and submit it as the minutes, but a better practice is
to at least give it a once-over to fix any glitches before submitting
them.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The bigger picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This project wouldn&#39;t have happened without modern AI tooling. Even
ignoring that it depends on AI to do the summarization, I&#39;m not sure I
would have gotten around to doing it without being able to use AI to
write most of the code for me.&lt;/p&gt;
&lt;p&gt;There&#39;s nothing particularly complicated here, but there&#39;s a lot of
routine but fiddly work (scraping the site, finding the exact parts of
the DOM to extract, talking to the AI API, tweaking the site look and
feel, etc.) that takes time. In many cases, these tasks require you
to learn something that you don&#39;t already know, that isn&#39;t very conceptually
interesting (what are the arguments to this API call?), and that you
will quickly forget. This is all friction in the coding process that
the AI lets you just skip over, because the assistant can usually
figure it out. I know there are a bunch of debates about how much the
AI is really doing that versus just pattern matching from a lot of
other people&#39;s examples, but as a practical matter, it kind of doesn&#39;t
matter.&lt;/p&gt;
&lt;p&gt;On the other hand, while it&#39;s amazing to get so much done with so
little personal effort, there&#39;s also something a bit unsatisfying
about it, seeing as you didn&#39;t do much of the work yourself, but
instead supervised some AI doing it. This experience may be familiar
to more senior technical people who have moved from doing a lot of
actual software engineering to leading projects, as a tech lead,
architect, or CTO; you do a lot more architecture and steering and a
lot less actual hands-on work. The vague unease that you don&#39;t really
understand what&#39;s going on isn&#39;t new either; of course what&#39;s
different here is that—at least in theory—your co-workers
understood it and you trusted them, whereas here it&#39;s just you and the
machine, at best a &lt;a href=&quot;https://en.wikipedia.org/wiki/P-zombies&quot;&gt;p-zombie&lt;/a&gt;
and at worst a fancy random number generator, but nevertheless
I know a lot of more senior engineering people miss the feeling
that they did it themselves as opposed to leading other people
who did the actual work.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;A huge amount has been written about how to use these tools safely and
efficiently; this is understandable given the general
efficiency-maxxing ethos of technology. Less explored, however, is the
question of how to use them enjoyably. One of the great things about
being a professional software engineer is that programming is &lt;em&gt;fun&lt;/em&gt;,
not just the feeling of building something cool, but the experience of
&lt;a href=&quot;https://en.wikipedia.org/wiki/Flow_(psychology)&quot;&gt;flow&lt;/a&gt;, when the code
just seems to come effortlessly from your brain to the keyboard. At
least for me, the experience of using AI tools like Claude Code is
totally different: you ask the machine to do something, then sit and
wait for it to spit back a response; this is the opposite of flow.
I can&#39;t help but wonder whether some of the resistance to AI in
the software engineering community is about AI taking the fun out of
the experience of programming. I certainly feel some of this, at
the same time as I try to remind myself that what&#39;s really
important is accomplishing stuff, not whether you did it
personally or had fun, and that means using the best
tools you can and having the best people do the work, even if
those people are robots,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
but that doesn&#39;t mean I don&#39;t want to have fun at
the same time.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is actually a plenary meeting at the IETF plenary. I&#39;m
using the term &amp;quot;plenary&amp;quot; here to distinguish from interim
meetings. &lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Now that I&#39;m writing this, I see that it&#39;s actually a bug because
it means that if I change the templates but there are no new
sessions, things don&#39;t get regenerated. This is a side effect
of the fact that I was originally doing things by hand and
then wrote the automation during the IETF meeting, but after I
had the templates written, so there were always new sessions. &lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s not uncommon to see very senior people whose job is
really to set the direction for the organization instead
jumping in and trying to write code. Sometimes this is
what&#39;s needed (&amp;quot;all hands on deck&amp;quot;) but in my experience
it&#39;s far more often about prioritizing your feelings
that you&#39;re doing &amp;quot;real work&amp;quot; over the thing you should
be doing.
 &lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See also Thomas Ptacek&#39;s &lt;a href=&quot;https://fly.io/blog/youre-all-nuts/&quot;&gt;My AI Skeptic Friends Are All Nuts&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/ietf-minutes/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Using Government IDs for Age Assurance</title>
		<link href="https://educatedguesswork.org/posts/age-verification-id/"/>
		<updated>2025-10-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/age-verification-id/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/internet-dog.png&quot; alt=&quot;On the Internet, nobody knows you&#39;re a dog&quot; /&gt;&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;On the Internet, nobody knows you&#39;re a dog. By Gemini, riffing off
the New Yorker &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=On_the_Internet,_nobody_knows_you%27re_a_dog&amp;amp;oldid=1310801937&quot;&gt;original&lt;/a&gt;.&lt;/p&gt;
&lt;/figure&gt;
&lt;/figure&gt;
&lt;p&gt;Over the past few years, an increasing number of jurisdictions have
started to require that service providers of various
kinds (most frequently pornography but also social networking sites)
check the age of their users. Many of these laws and
regulations don&#39;t specify any particular form of age assurance, but
instead simply require it to be &amp;quot;effective&amp;quot;, or in the words of the UK&#39;s
&lt;a href=&quot;https://www.ofcom.org.uk/&quot;&gt;Ofcom&lt;/a&gt;, &amp;quot;highly effective&amp;quot;. One obvious
way to do this is to use some form of government ID to establish that
you fall within the appropriate age range.&lt;/p&gt;
&lt;h2 id=&quot;government-ids-in-person&quot;&gt;Government IDs in Person &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#government-ids-in-person&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we talk about the online context, let&#39;s talk about how
government IDs work in the physical context. Generally,
it&#39;s a piece of plastic
with your photo and some personal information
(name, date of birth, etc.) on the front, and a bar code
on the back, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dl-front-back.png&quot; alt=&quot;Drivers license front and back&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Driver&#39;s license front and back
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-us-situation&quot;&gt;The US Situation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#the-us-situation&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Unlike many other countries, the US does not have a national
ID card. Instead, the main form of ID is a driver&#39;s license,
which is issued by states. A US passport is issued by the federal government, but
many Americans &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=United_States_passport&amp;amp;oldid=1304471654&quot;&gt;do not have passports&lt;/a&gt;. Nearly
everyone has a social security card, but these aren&#39;t
usable as a form of authentication because they
don&#39;t have any kind of biometric (not even a picture)
to tie the card to the subject, nor do they have any
meaningful anti-tampering or anti-forgery features.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;National ID cards (in countries that issue them) are
constructed in a similar fashion. The basic way that authentication with an ID card works is
that the relying party (e.g., the bartender checking your age)
compares the photo on the front of the card to your face and then
reads the information printed on the card. If they want to know
whether you are over 21, they can just look at your birthday.&lt;/p&gt;
&lt;p&gt;This has a number of obvious security and privacy challenges, starting
with the integrity of the card itself. It&#39;s not at all difficult
to find someone to make you a piece of plastic with your picture
on it (think of all the employers who issue ID badges), so
plainly just having a plastic card isn&#39;t enough. Real ID cards
have a number of &lt;a href=&quot;https://www.nationalnotary.org/notary-bulletin/blog/2018/09/notary-tip-top-5-security-features-on-ids?srsltid=AfmBOoouAHqZqxT_yDSCj5FLhgHpoUI1wpAZNXF-ghUmfrpKZTwh6cql&quot;&gt;features&lt;/a&gt; designed to prevent forgery or tampering, such as holograms, images
that appear only under UV lights, raised parts of the card, etc., which is why you
see TSA agents shining a UV light on your ID.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The security logic goes like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The card is tamper-resistant and so you can trust
that the picture and information on the card are
what was intended by the issuer.&lt;/li&gt;
&lt;li&gt;The picture matches the person in front of you
and therefore the card is theirs.&lt;/li&gt;
&lt;li&gt;Because the card binds the picture and the
information on the card together, the information
on the card applies to the person in front of you.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Modern driver&#39;s licenses also have a bar code on the back
that replicates much of the information on the front.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
However, this usually doesn&#39;t have any extra digital security
features (e.g., a digital signature) so for our purposes it&#39;s
just a more robust way of reading the front of the card.&lt;/p&gt;
&lt;h2 id=&quot;remote-authentication-via-id-cards&quot;&gt;Remote authentication via ID cards &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#remote-authentication-via-id-cards&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;An ID card of the type discussed above is an inherently
physical object in that the security guarantees are tied to
the card itself. Despite this, there are many situations in
which people want to authenticate themselves remotely.&lt;/p&gt;
&lt;h3 id=&quot;send-a-scan-or-a-photo-of-the-id&quot;&gt;Send a scan or a photo of the ID &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#send-a-scan-or-a-photo-of-the-id&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The simplest thing to do is just to have the subject scan their ID or
take a photo of it and send the resulting image over e-mail. This is
inherently a very weak form of authentication for several
reasons. First, nothing ties the image to the person who originally
sent it. This means that anyone who can get an image of your license
can impersonate you, including anyone who you send the image to or
anyone who has momentary access to your ID. Think of the number of
times you have to show your ID in a year and realize that each of
those people has an opportunity to take an image of it.&lt;/p&gt;
&lt;p&gt;Second, the
process of photographing or scanning the ID nullifies nearly all of
the physical security features, which means that it&#39;s trivial to make
a fake image which looks sufficiently like a real ID to pass visual
inspection, either starting with a real ID or totally from
scratch. In fact, there are services which will do this for you,
though it&#39;s not that difficult if you have reasonable skills
with an image editing tool like Photoshop.
Despite all this, scanned copies of IDs are surprisingly common.&lt;/p&gt;
&lt;h3 id=&quot;live-presentation&quot;&gt;Live Presentation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#live-presentation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A better practice is to require the subject to do something
to show it&#39;s &lt;strong&gt;their&lt;/strong&gt; ID. There are a number of options here,
including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having them take a selfie with the ID&lt;/li&gt;
&lt;li&gt;Using their device camera to take a selfie or a self-video&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In general, a selfie
is weaker than a video call because the attacker can just
edit the card image into the selfie. On a video call, the relying
party can require you to turn your head, make different expressions, etc.
as a liveness check (though some of these systems are
&lt;a href=&quot;https://www.youtube.com/shorts/ScAbRQpROaQ&quot;&gt;circumventable&lt;/a&gt;)
as well as show the card from different angles, which facilitates
checking for some of the security features.
This kind of video authentication is in quite wide use, in both
commercial and government contexts. For example, it&#39;s one of
the permitted mechanisms for &lt;a href=&quot;https://www.uscis.gov/i-9-central/remote-examination-of-documents&quot;&gt;employment eligibility verification
in the US&lt;/a&gt;, and is also used for age &lt;a href=&quot;https://www.yoti.com/business/age-verification/&quot;&gt;verification&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The security of these systems varies a fair bit depending
on the precise technical design. In general, there seems to
be a moderate level of resistance to what&#39;s called a &amp;quot;presentation
attack&amp;quot; in which the user has a fake card, wears a mask, etc.
It&#39;s much harder to defend against what&#39;s called an &amp;quot;injection attack&amp;quot;
in which the attacker controls the camera and can send any
video content they want. While there are some techniques that
try to detect artifacts in the video feed, the main defense
is to have the device remotely attest to the integrity of the
video feed via some &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software&quot;&gt;attestation mechanism&lt;/a&gt;.
However, this does not work &lt;a href=&quot;https://educatedguesswork.org/posts/wei&quot;&gt;on the Web&lt;/a&gt;, which does
not have software attestation mechanisms.&lt;/p&gt;
&lt;p&gt;It&#39;s possible that in the future you might get some leverage
from &lt;a href=&quot;https://c2pa.org/&quot;&gt;C2PA&lt;/a&gt;, especially for still images,
but I think it&#39;s unlikely that it will work for video
because the browser reads in raw video from the camera and then compresses
it for transmission, thus removing any attestation.&lt;/p&gt;
&lt;h2 id=&quot;from-physical-ids-to-digital-ids&quot;&gt;From Physical IDs to Digital IDs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#from-physical-ids-to-digital-ids&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As illustrated by the discussion above, the use of physical ID cards
for remote authentication suffers from two main security problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The binding between the various pieces of information on the
card is weak because it depends on security features
which are designed for in-person verification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The binding between the subject being identified and the ID card
is weak.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In addition, when used for age assurance, physical IDs have
suboptimal privacy properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They reveal a lot more information than just whether you
are over the required age.&lt;/li&gt;
&lt;li&gt;They require you to show your face, which you might not
want to do.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are a number of age assurance settings where people
aren&#39;t going to be excited about disclosing their full
names (e.g., to watch porn). Not only do you have to worry
about the relying party disclosing your identity, there
is also the risk of data breaches, as recently happened
with age verification for &lt;a href=&quot;https://news.sky.com/story/discord-hack-shows-dangers-of-online-age-checks-as-internet-policing-hopes-put-to-the-test-13447618&quot;&gt;Discord&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Digital IDs attempt to address all of these problems using
(surprise!) cryptography.&lt;/p&gt;
&lt;h3 id=&quot;digital-signatures&quot;&gt;Digital Signatures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#digital-signatures&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We already have a way to address the problem of weak binding &lt;em&gt;between&lt;/em&gt;
the elements on the card: we digitally sign the data. Naively, we can
just do what we do all the time for WebPKI certificates: each
authority (i.e., an entity that issues IDs, such as the State of
California) has an asymmetric key pair. When they want to issue an ID,
they just take all the data that would go on the card (much of which
is already on the card &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#bar-codes&quot;&gt;in digital format&lt;/a&gt; anyway) and
digitally sign it with that key.&lt;/p&gt;
&lt;p&gt;A digitally signed credential of this type is essentially a complete
replacement for the physical card: you can encode it in any digital
medium, such as a QR code and show it to the verifier (relying party).
The verifier has some trusted device which can read the data and a
list of the public keys of the issuers it trusts to sign valid credentials. Once
the credential is verified, all the data is trustworthy, and so the
device can display it to the verifier, who then compares the picture
to the subject&#39;s face, just as with a physical ID.&lt;/p&gt;
&lt;p&gt;Unlike a physical ID, a digital credential of this type doesn&#39;t
depend on any kind of physical tamper resistance, because all
the security is cryptographic. This allows it to be encoded in
a QR code and just printed on ordinary paper, and of course
you can print as many copies as you want.
This is a convenient
property in some respects: if you&#39;ve ever lost your driver&#39;s license,
you&#39;ll know it can be a pain to replace—and don&#39;t even get
me started on what it&#39;s like to replace
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Green_card&amp;amp;oldid=1301427780&quot;&gt;green card&lt;/a&gt;—and
in the meantime you can&#39;t prove your identity at all. Think
how much easier it would be if you could just print out a new
ID on your home printer.&lt;/p&gt;
&lt;p&gt;Unfortunately, having a trivially copyable credential also
presents a security problem because of the weak binding between
the subject and the credential: there are many people who look
like you, so if you can get the ID of one of them, you can
probably use it. Physical credentials make this
attack somewhat hard to mount because they&#39;re hard to duplicate
(assuming the security features are working) so if you have
someone&#39;s ID that means they don&#39;t, meaning you have to
steal or borrow someone else&#39;s ID. By contrast, there can
be an arbitrary number of equally valid copies of a digital
credential, so it&#39;s much more vulnerable to attacks where
one person impersonates another.&lt;/p&gt;
&lt;h3 id=&quot;credential-binding&quot;&gt;Credential Binding &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#credential-binding&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to prevent this kind of attack (technical term: cloning),
real-world digital credentials systems are usually designed so
they can&#39;t be used as a standalone form of identification. Instead,
they are bound to a cryptographic key pair, just like a WebPKI
certificate. When you request a digital credential, you provide
a public key, which is then encoded as part of the credential.
In order to authenticate with the credential, you demonstrate
that you know the corresponding private key. The overall process
looks like this:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/credential-binding.png&quot; alt=&quot;Authentication with a cryptographic credential&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Authentication with a cryptographic credential
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Describing a real protocol is out of scope for this post, but
at a high level, the verifier supplies a random &lt;code&gt;Challenge&lt;/code&gt;
value which the subject&#39;s device signs with its private key,
thus proving that it knows the key. You need the challenge
to prevent replay attacks where the verifier just makes
a copy of the signature to show to some third party; because
each verifier provides its own challenge, the signature isn&#39;t
replayable.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
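&lt;p&gt;To make the challenge/response idea concrete, here is a minimal sketch in Python. It is a toy under loud assumptions: it uses textbook RSA over two small Mersenne primes purely so that it runs with nothing but the standard library, which is not secure and is not how any real credential protocol works. The names &lt;code&gt;digest&lt;/code&gt;, &lt;code&gt;sign&lt;/code&gt;, and &lt;code&gt;verify&lt;/code&gt; are mine, not from any specification.&lt;/p&gt;

```python
import hashlib
import secrets

# Toy textbook-RSA key pair for the subject's device (illustration only).
# Both primes are Mersenne primes; a modulus this small is NOT secure.
P = 2**127 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, never leaves the device


def digest(message):
    """Hash a byte string to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message):
    """Device side: sign with the private key."""
    return pow(digest(message), D, N)


def verify(message, signature):
    """Verifier side: check the signature with the public key (N, E)."""
    return pow(signature, E, N) == digest(message)


# The verifier picks a fresh random challenge for every authentication.
challenge = secrets.token_bytes(32)
response = sign(challenge)
print(verify(challenge, response))        # the live response checks out

# A second verifier issues its own challenge, so the old response is useless.
other_challenge = secrets.token_bytes(32)
print(verify(other_challenge, response))  # the replayed response is rejected
```

&lt;p&gt;The second check fails because the signature covers the first verifier&#39;s challenge, which is exactly the property that makes a captured response useless anywhere else.&lt;/p&gt;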
&lt;p&gt;Note that this challenge/response process just proves that
the person trying to authenticate has the right private
key, but not that it&#39;s the right person. For instance, if
I were to steal your phone with your credential on it,
I might be able to impersonate you. In order to ensure
that it&#39;s the right actual person, you &lt;em&gt;also&lt;/em&gt; need to
check the picture against the person&#39;s face.&lt;/p&gt;
&lt;p&gt;The result of this design is that even though the credential
is copyable, the copy isn&#39;t useful if you don&#39;t have the
corresponding private key, so it doesn&#39;t matter&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; if
the credential is public.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
However, it&#39;s still possible to clone credentials if the
subject cooperates.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Obviously, I&#39;m not likely to let arbitrary people impersonate
me, but what if an older sibling wants to let a younger
sibling &amp;quot;borrow their ID&amp;quot; so that they can drink? With
a physical card, this means that the older sibling can&#39;t
authenticate, but with a digital credential they just have
to give them a copy of their private key, which is much
easier.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
In order to prevent this kind of attack, some credentials systems
bind the credential to a specific device.&lt;/p&gt;
&lt;h3 id=&quot;device-binding&quot;&gt;Device Binding &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#device-binding&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Conceptually device binding works the same way as we&#39;ve just
seen, but instead of binding to any private key, the credential
is bound to a key which is stored in a &lt;a href=&quot;https://en.wikipedia.org/wiki/Special:RecentChangesLinked/Secure_element&quot;&gt;secure element&lt;/a&gt;, which is industry
jargon for a tamper-resistant processor that lives inside
your device. The key pair is generated inside the secure
element, which is designed so that it won&#39;t disclose the
private key, though it can be used to sign data. This means
that the user can still authenticate but can&#39;t make a copy
of the key.&lt;/p&gt;
&lt;p&gt;At this point, you might be asking what prevents the user
from &lt;em&gt;claiming&lt;/em&gt; that they generated the key pair inside
a secure element but actually generating it inside a regular
computer and keeping a copy of the private key. The answer is that
at credential issuance time the subject has to provide an &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#drm-and-attestation&quot;&gt;attestation&lt;/a&gt; which
shows that the key was generated inside a secure element.
The details of attestation mechanisms are complicated, but
at a high level the secure element will have its own device key
pair which is certified by the hardware manufacturer; it uses
the device key pair to sign the public key for the credential.
The issuer can then verify the signature chain and know
that the private key was generated inside the secure
element.&lt;/p&gt;
&lt;p&gt;It&#39;s important to realize that this attestation mechanism
relies on the issuer&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
trusting the hardware manufacturer (e.g., Apple),
both to manufacture the device so it&#39;s really tamper resistant
and not to certify device key pairs that aren&#39;t associated
with secure elements (as well as not to have their own certification
infrastructure compromised). However, this &lt;em&gt;also&lt;/em&gt; means
that this kind of device-bound credential is an inherently
closed system; you can&#39;t just buy any device you want,
but instead you have to buy one that is trusted, and at
the end of the day the secure element works for the manufacturer,
not for you.&lt;/p&gt;
&lt;h3 id=&quot;selective-disclosure&quot;&gt;Selective Disclosure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#selective-disclosure&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;An unfortunate property of physical credentials is that they
require disclosing all of the information on the ID. The standard
example here is that when you want to buy alcohol, the clerk
only needs to know you are over 21 (in the US), but when you
show them your driver&#39;s license, they also learn your name,
address, date of birth, etc. There&#39;s no real way around this
because the credential is just a dumb piece of plastic, but
with digital credentials you can do better.&lt;/p&gt;
&lt;p&gt;The standard mechanism here is what&#39;s called &amp;quot;selective disclosure&amp;quot;.
Instead of just signing all the information directly like with
a WebPKI certificate, the issuer instead signs a list of hashes,
with one hash for each attribute, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/selective-disclosure.png&quot; alt=&quot;Signed attributes for selective disclosure&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Signed attributes for selective disclosure
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In order to prove a specific attribute (e.g., date of birth), the subject sends the
verifier three values:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The list of hashes.&lt;/li&gt;
&lt;li&gt;The actual value of the attribute plus its corresponding random value.&lt;/li&gt;
&lt;li&gt;The signature over the hash list.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;commitments&quot;&gt;Commitments &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#commitments&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The reason you hash the attributes plus the random values
and not just the attributes themselves is that some
attributes are low entropy (i.e., they only have a small
number of valid values). For example, there are only about
40000 valid birthdates, so an attacker who has the hash of
your birthdate can easily just hash all of them and look
for a matching hash. If you also hash in a secret random
value, then the attacker also needs to try every possible
random value, which is prohibitive if you use a long enough
random value. The technical term for this in cryptography
is a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Commitment_scheme&amp;amp;oldid=1298640450&quot;&gt;commitment&lt;/a&gt;.&lt;/p&gt;
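&lt;p&gt;As a rough illustration of why the random value matters, the sketch below brute-forces a bare hash of a birthdate and then fails against a salted one. The date format, the 1915 starting year, and the function names are assumptions I made up for the example; real credential formats encode attributes differently.&lt;/p&gt;

```python
import hashlib
import secrets
from datetime import date, timedelta


def plain_hash(value):
    """Hash the attribute on its own, with no random value mixed in."""
    return hashlib.sha256(value.encode()).hexdigest()


def salted_hash(value, salt):
    """Commitment: hash the random value together with the attribute."""
    return hashlib.sha256(salt + value.encode()).hexdigest()


def guess_birthdate(target_hash, first_year=1915, days=40000):
    """Attacker: hash every plausible birthdate and look for a match."""
    start = date(first_year, 1, 1)
    for offset in range(days):
        candidate = (start + timedelta(days=offset)).isoformat()
        if plain_hash(candidate) == target_hash:
            return candidate
    return None


birthdate = "1990-07-04"  # made-up example value

# Without a random value, about 40000 hashes recover the birthdate.
print(guess_birthdate(plain_hash(birthdate)))

# With a 128-bit random value, the same search finds nothing.
salt = secrets.token_bytes(16)
print(guess_birthdate(salted_hash(birthdate, salt)))
```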
&lt;/div&gt;
&lt;p&gt;The verifier checks the signature over the list of hashes and
thus knows that they are valid. It then hashes the attribute value
and random value and ensures that the result matches the hash on the list,
thus showing that the attribute value is valid as well. However,
it doesn&#39;t learn the actual values for the other attributes, just
their hashes. This means that if the subject wants to buy alcohol or
access a pornography site, they can reveal just their birthdate and not their
name or address.&lt;/p&gt;
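&lt;p&gt;Here is a minimal sketch of the hash-list part of this flow. It deliberately leaves out the issuer&#39;s signature over the list and just shows the commitment and the verifier&#39;s check. The JSON encoding, the attribute names, and the helper names are my own assumptions for illustration, not the encoding used by any real credential format.&lt;/p&gt;

```python
import hashlib
import json
import secrets


def commit(name, value, salt):
    """Hash one attribute together with its random value."""
    encoded = json.dumps([salt.hex(), name, value]).encode()
    return hashlib.sha256(encoded).hexdigest()


# Issuer side: one random value and one hash per attribute.
attributes = {                      # made-up example data
    "name": "Alice Example",
    "address": "123 Main St",
    "birthdate": "1990-07-04",
}
salts = {name: secrets.token_bytes(16) for name in attributes}
hash_list = [commit(n, v, salts[n]) for n, v in attributes.items()]
# In a real system the issuer would now sign hash_list (omitted here).

# Subject side: reveal only the birthdate and its random value.
disclosure = {
    "name": "birthdate",
    "value": attributes["birthdate"],
    "salt": salts["birthdate"],
}


def check_disclosure(hash_list, disclosure):
    """Verifier side: recompute the hash and look for it in the list."""
    recomputed = commit(disclosure["name"], disclosure["value"],
                        disclosure["salt"])
    return recomputed in hash_list


print(check_disclosure(hash_list, disclosure))

# A subject who lies about the value fails the check.
forged = dict(disclosure, value="1970-01-01")
print(check_disclosure(hash_list, forged))
```

&lt;p&gt;The verifier ends up holding three hashes but only one opened value; the name and address stay hidden behind their random values.&lt;/p&gt;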
&lt;p&gt;We can actually do better here. In this case the subject is just
trying to prove they are over a certain age, which doesn&#39;t require
knowing their actual date of birth. The way you support this
use case in a selective disclosure scheme is by having a set
of attributes that say whether a user is over or under a certain
age. For example, if the subject is 18, we might have the following
attributes:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Attribute&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 16&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 17&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 18&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 19&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&amp;gt;= 21&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When the verifier asks the subject to prove that they are over a
certain age, the subject can present the appropriate attribute,
which doesn&#39;t tell the verifier anything else.
This technique doesn&#39;t allow you to prove arbitrary
predicates (e.g., &amp;quot;I was born on a Tuesday&amp;quot;) because the
issuer needs to encode each predicate as its own attribute.
However, as a practical matter there aren&#39;t that many predicates
you want to prove on a regular basis.&lt;/p&gt;
&lt;p&gt;Note that there are a few subtle points here. First, the subject
should show the assertion that is closest to the requested
threshold (in this case the smallest assertion) to prevent
leaking more information than needed (if you need to be 18
to use a pornography site, then you don&#39;t want to prove you
are over 21). Second, the verifier can&#39;t be allowed to make
repeated queries for different age thresholds; otherwise they can
determine your precise age to within the granularity of the
assertions. For example, the ISO &lt;a href=&quot;https://www.iso.org/standard/69084.html&quot;&gt;spec for mobile IDs&lt;/a&gt; restricts the verifier to asking for two values in order to
support querying for an age range.&lt;/p&gt;
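&lt;p&gt;The first of these points can be sketched in a few lines. This assumes the age attributes are available as a simple mapping from threshold to yes/no, which is my simplification and not how any particular wallet stores them; &lt;code&gt;pick_assertion&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
# Age attributes as issued to an 18-year-old, matching the table above.
age_over = {16: True, 17: True, 18: True, 19: False, 20: False, 21: False}


def pick_assertion(age_over, required_age):
    """Return the smallest issued threshold that proves required_age.

    Any true assertion at or above the requested age is sufficient, so
    walking upward from required_age and stopping at the first true one
    discloses the least. Returns None if the subject cannot prove it.
    """
    for threshold in range(required_age, max(age_over) + 1):
        if age_over.get(threshold) is True:
            return threshold
    return None


print(pick_assertion(age_over, 18))  # 18
print(pick_assertion(age_over, 17))  # 17
print(pick_assertion(age_over, 21))  # None
```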
&lt;h4 id=&quot;identity-binding-and-selective-disclosure&quot;&gt;Identity Binding and Selective Disclosure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#identity-binding-and-selective-disclosure&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As should be apparent at this point, we now have the makings
of a remote age verification system: we issue everyone a digital
credential and then they use it to remotely prove their age
using selective disclosure. Just as with a physical ID, we can remotely authenticate
by providing a photo to the verifier along with a video
or a selfie showing the subject&#39;s face. However, selective
disclosure means we need not do so, because we can &lt;em&gt;just&lt;/em&gt; disclose
the relevant attributes. Of course, in this case, the
only thing binding the credential to the actual user
is the signature from the device bound private key, which
means we&#39;re leaning much harder on the secure element;
if that is compromised and the key is disclosed then
anyone can remotely authenticate, not just someone who
looks like the subject.&lt;/p&gt;
&lt;h4 id=&quot;linkability&quot;&gt;Linkability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#linkability&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;A selective disclosure system improves privacy by preventing
the relying party from learning any information about the
user other than the specific attributes that are disclosed.
However, there are still privacy issues because the credentials
are &lt;em&gt;linkable&lt;/em&gt;. Consider the case where the user uses their
credentials twice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;With a porn site to prove they are over 18.&lt;/li&gt;
&lt;li&gt;At the airport to prove their name matches the ticket.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The table below shows the information disclosed in each case:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Scenario&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Information&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Porn site&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Signed block, age &amp;gt;= 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Airport&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Signed block, Name&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The problem here should be immediately obvious: the signed
block is the same in both cases, which means that it&#39;s
possible to &lt;em&gt;link&lt;/em&gt; the two transactions. Specifically, this
means that the airport and the porn site can collude to
allow the porn site to learn the user&#39;s name even though
it wasn&#39;t disclosed to them. More generally, relying
parties can collude to determine the union of all
disclosed attributes for a single credential.&lt;/p&gt;
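&lt;p&gt;The collusion is nothing more than a database join on the signed block. A minimal sketch, with made-up log formats and a placeholder byte string standing in for the signed block:&lt;/p&gt;

```python
import hashlib


def block_id(signed_block):
    """Identifier that any relying party can compute from the signed block."""
    return hashlib.sha256(signed_block).hexdigest()


def collude(log_a, log_b):
    """Join two relying-party logs on the signed block they both saw."""
    merged = {}
    for key in log_a:
        if key in log_b:
            merged[key] = {**log_a[key], **log_b[key]}
    return merged


# Made-up stand-in for the signed hash list inside one credential.
signed_block = b"signed hash list for one credential"

# Each relying party records the block it saw plus what was disclosed.
porn_site_log = {block_id(signed_block): {"age_over_18": True}}
airport_log = {block_id(signed_block): {"name": "Alice Example"}}

linked = collude(porn_site_log, airport_log)
print(list(linked.values()))
```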
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;blind-issuance-and-cut-and-choose&quot;&gt;Blind Issuance and Cut-and-Choose &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#blind-issuance-and-cut-and-choose&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;There&#39;s actually an old clever—though inefficient—trick
to prevent linkage by the issuer in selective disclosure systems, due to &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=David_Chaum&amp;amp;oldid=1279910274&quot;&gt;David Chaum&lt;/a&gt;.
It takes advantage of a technique called a &lt;em&gt;blind signature&lt;/em&gt;, which
allows you to digitally sign a message &lt;em&gt;M&lt;/em&gt; without seeing &lt;em&gt;M&lt;/em&gt;. In
the credential issuance setting, the subject would generate the valid
unsigned credential &lt;em&gt;C&lt;/em&gt; and then send the issuer a blinded version &lt;em&gt;Blind(C)&lt;/em&gt;.
The issuer signs this value and returns &lt;em&gt;Sign(Blind(C))&lt;/em&gt; and the
subject then removes the blinding to recover &lt;em&gt;Sign(C)&lt;/em&gt; (which also
has a different signature).&lt;/p&gt;
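&lt;p&gt;For the curious, the blinding step can be sketched with textbook RSA, which is the classic setting for Chaum&#39;s trick. Everything here is a toy: the modulus is built from two small Mersenne primes so the example runs with only the standard library, there is no padding, and the credential contents are made up. It is meant to show the algebra, not to be used.&lt;/p&gt;

```python
import hashlib
import math
import secrets

# Toy textbook-RSA issuer key (illustration only, NOT secure).
P = 2**127 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent


def digest(message):
    """Hash a byte string to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def random_blinding_factor():
    """Pick a random value that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


# Subject: blind the credential C before sending it to the issuer.
credential = b"age_over_18=yes"  # made-up credential contents
r = random_blinding_factor()
blinded = (digest(credential) * pow(r, E, N)) % N

# Issuer: sign the blinded value without learning what is inside.
blind_signature = pow(blinded, D, N)

# Subject: strip the blinding factor to recover an ordinary signature on C.
signature = (blind_signature * pow(r, -1, N)) % N

print(pow(signature, E, N) == digest(credential))  # verifies like any signature
print(signature != blind_signature)                # issuer never saw this value
```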
&lt;p&gt;This leaves us with the problem that the subject might generate
a bogus credential (i.e., with false information) and because
the issuer is signing a blinded object, it can&#39;t tell whether
it&#39;s bogus or not. The trick here is that the subject instead
generates more than one candidate credential, &lt;em&gt;C1, C2, C3... Cn&lt;/em&gt;,
blinds them all, and sends the blinded values to the issuer.
The issuer then picks one at random to sign and asks the
subject to unblind the others. The issuer then checks that the
unblinded credentials have valid values and if so, signs the
remaining blinded one. If any have invalid values, the issuer
refuses to sign, and potentially attempts to punish the
subject.&lt;/p&gt;
&lt;p&gt;By making the number of candidate credentials sufficiently large, the
chance of successfully getting an invalid but signed credential can be
made arbitrarily small. This technique is called &amp;quot;cut-and-choose&amp;quot;
after the famous &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Divide_and_choose&amp;amp;oldid=1294302363&quot;&gt;trick for fair division&lt;/a&gt;.&lt;/p&gt;
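&lt;p&gt;To put a number on &amp;quot;arbitrarily small&amp;quot;: assuming the issuer keeps one candidate uniformly at random and opens all the rest, a cheating subject only succeeds if exactly one candidate is bogus and it happens to be the one that is kept. A small sketch of that arithmetic (the function name is mine):&lt;/p&gt;

```python
from fractions import Fraction


def cheat_success_probability(candidates, bogus):
    """Chance that a dishonest subject ends up with a signed bogus credential.

    The issuer keeps one candidate at random and opens all the others, so
    the subject only wins if exactly one candidate is bogus and it happens
    to be the one the issuer keeps. With two or more bogus candidates at
    least one is always opened and the issuer refuses to sign.
    """
    if bogus != 1:
        return Fraction(0)
    return Fraction(1, candidates)


for n in (2, 10, 1000):
    print(n, cheat_success_probability(n, bogus=1))
```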
&lt;/div&gt;
&lt;p&gt;The obvious fix for this problem
is for the issuer to give the user multiple credentials
with the same information and the user uses a separate one
for each transaction; this prevents relying parties from
linking up individual transactions because the signature
blocks will be different.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
It does not, however, prevent the relying party from
colluding with the &lt;em&gt;issuer&lt;/em&gt; to track the user. There
are a number of plausible scenarios in which this could
happen, but perhaps the most concerning in the context
of age verification is that the issuer (in this case
the government) uses some legal process to require the
relying party (the age verification provider or the
porn site) to provide the credentials the user provided
and then links them up locally in order to determine
which specific users visited which sites or (depending
on the design) viewed which content. We&#39;ll see how to
address this issue &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#zero-knowledge-proofs&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;mobile-driver&#39;s-licenses&quot;&gt;Mobile Driver&#39;s Licenses &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#mobile-driver&#39;s-licenses&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, it&#39;s a giant pain to roll out a whole new digital
credential system for age assurance, but the good news is that
we don&#39;t have to. &lt;em&gt;[Corrected -- 2025-10-19]&lt;/em&gt;. This kind of digital credential system is &lt;em&gt;already&lt;/em&gt; being rolled out for
other purposes in a number of jurisdictions, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mobile_driver%27s_license&amp;amp;oldid=1300232064&quot;&gt;Mobile driver&#39;s licenses (mDLs)&lt;/a&gt;
in several countries and about &lt;a href=&quot;https://www.tsa.gov/digital-id/participating-states&quot;&gt;15 US states&lt;/a&gt;
(10 of which are supported by &lt;a href=&quot;https://learn.wallet.apple/id#states-list&quot;&gt;Apple Wallet&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The upcoming &lt;a href=&quot;https://ec.europa.eu/digital-building-blocks/sites/display/EUDIGITALIDENTITYWALLET/EU+Digital+Identity+Wallet+Home&quot;&gt;EU Digital Wallet&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both of these implement the &lt;a href=&quot;https://www.iso.org/standard/69084.html&quot;&gt;ISO/IEC
18013-5:2021&lt;/a&gt; specification,
which is conceptually similar to the system I&#39;ve described above.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/iso18013-5-model.png&quot; alt=&quot;ISO 18013-5 data model&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
ISO 18013-5 credential data model
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;To orient yourself to the terminology here, the entire
credential is called an &lt;em&gt;mdoc&lt;/em&gt; and the &lt;em&gt;mobile security object (MSO)&lt;/em&gt; is the
signed object that contains the list of hashes. The
&lt;em&gt;mdoc public key&lt;/em&gt; is the key tied to the device.
Once you have this kind of digital credential you can use it
to prove your age in the same way as we&#39;ve just shown above.&lt;/p&gt;
&lt;p&gt;Bootstrapping off of this kind of digital credential has two
attractive privacy properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;There are going to be many reasons
to get a digital credential (e.g., to prove your right
to drive, authenticate online, or identify yourself
at the airport). This means a lot of people will have
one anyway and unlike many age
verification systems, the act of getting a digital
credential doesn&#39;t inherently reveal that you want
to engage in some age-restricted activity (e.g.,
watching pornography).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You don&#39;t need to prove your identity at all (nor
reveal your appearance) in order
to prove that you are old enough to access age-restricted
content; you just need to prove that
you are over the threshold age.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unsurprisingly, both Apple and Google have proposed remote
authentication systems based on digital credentials.
These systems are generic and support arbitrary
types of authentication, including age verification.&lt;/p&gt;
&lt;h2 id=&quot;apple-digital-credentials&quot;&gt;Apple Digital Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#apple-digital-credentials&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Apple&#39;s &lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2025/232/&quot;&gt;proposed
system&lt;/a&gt; is a
fairly straightforward implementation of the selective disclosure
system described above, with the addition of a Web interface, based on the W3C &lt;a href=&quot;https://w3c-fedid.github.io/digital-credentials/&quot;&gt;digital
credentials API&lt;/a&gt;,
thus allowing the user to remotely authenticate to a Web site.
The overall workflow is shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/digital-credentials.png&quot; alt=&quot;Authentication with Digital Credentials&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Authentication with Digital Credentials
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The process starts with the user &lt;a href=&quot;https://support.apple.com/en-us/111803&quot;&gt;loading their mDL into the device&lt;/a&gt;. As part of this process, the user is asked to take
views of their face from multiple angles in order to ensure that they
are the person associated with the ID. This process only has to be
done once.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
The user also has to authenticate via FaceID or TouchID.&lt;/p&gt;
&lt;p&gt;Later, when the user goes to a Web site, that site can use the
Digital Credentials API to request the desired attributes.  The
browser then queries the device for authentication.  The device
prompts the user about whether they want to reveal the requested
attributes. When the user approves, they have to authenticate again in
order to ensure that it&#39;s the same person who enrolled the device.
Assuming the user consents, the device provides a verifiable response
back to the browser. The browser provides the response back to the
site, which then can verify the response and check the relevant
attributes.&lt;/p&gt;
&lt;h3 id=&quot;user-binding&quot;&gt;User Binding &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#user-binding&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because the &lt;em&gt;device&lt;/em&gt; requires the user to authenticate,
this system provides a measure of binding to the subject even if
the site doesn&#39;t request the user&#39;s photo; only the user who
enrolled the mDL is able to use it to authenticate. Note
that this does not actually ensure that it&#39;s the same
person that is associated with the credential, at least
if the user is authenticating with TouchID, because
nothing ensures that the same person provided their fingerprint
as provided the mDL, so, for instance, person A could
enroll their mDL on person B&#39;s phone. It seems like it ought to be technically
possible for the device to match FaceID against the
mDL, but based on Apple&#39;s description and the fact
that they allow TouchID, I suspect it does not do so.&lt;/p&gt;
&lt;p&gt;As with device binding, the security against user swapping depends
on the security of the device. If the attacker compromises
the device, they can bypass the local biometric check and
authenticate as the subject of the credential whether
they are the same person or not. Moreover, this assumes they aren&#39;t
able to use a pass code, and Apple also appears to allow you to bypass the biometric
checks entirely if you have &lt;a href=&quot;https://support.apple.com/en-us/111803&quot;&gt;accessibility enabled&lt;/a&gt;.
However, in either case the fact that the iPhone hardware is closed
and that Apple attests to its security is an essential feature of this design; if it weren&#39;t,
an attacker could extract the device key.&lt;/p&gt;
&lt;h4 id=&quot;privacy&quot;&gt;Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#privacy&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As discussed above, Apple&#39;s system attempts to preserve privacy
by retrieving batches of credentials, each with its own device
key, thus resisting linkage&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt; via credential reuse.
This still does not prevent the issuer from linking up
transactions, though it requires the relying party&#39;s
cooperation (willing or otherwise) to do so.&lt;/p&gt;
&lt;!-- Wallet vs. --&gt;
&lt;p&gt;In addition, Apple doesn&#39;t allow just anyone to request
remote authentication. Apple requires relying parties
to register with &lt;a href=&quot;https://businessconnect.apple.com/&quot;&gt;Apple Business Connect&lt;/a&gt; and
get a signing certificate that will be used to authenticate
the request for remote authentication.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
As part of this registration, the relying party needs to
document what attributes it will be requesting and why
it needs them. This list will be enforced at authentication
time, so that the relying party can&#39;t ask for extra attributes.&lt;/p&gt;
&lt;h2 id=&quot;zero-knowledge-proofs&quot;&gt;Zero-Knowledge Proofs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#zero-knowledge-proofs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It turns out to be possible to use some fancy cryptography to remove
the linkability problem, by way of something called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Zero-knowledge_proof&amp;amp;oldid=1298731730&quot;&gt;zero-knowledge
proof
(ZKP)&lt;/a&gt;. The
details of how ZKPs work are way outside of the scope of this post, but
the general idea is that you can use cryptography to prove that you
know values with arbitrary properties.&lt;/p&gt;
&lt;h3 id=&quot;proving-program-output&quot;&gt;Proving Program Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#proving-program-output&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For the purposes of this discussion, you should think of a ZKP like this:
The prover and the verifier agree on a program &lt;code&gt;F&lt;/code&gt; (it can even be written
in &lt;a href=&quot;https://risczero.com/&quot;&gt;a conventional programming language&lt;/a&gt;). &lt;code&gt;F&lt;/code&gt; is designed to run on two pieces of input:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &amp;quot;public&amp;quot; input &lt;code&gt;p&lt;/code&gt; known to both the prover and the verifier&lt;/li&gt;
&lt;li&gt;A &amp;quot;secret&amp;quot; input &lt;code&gt;w&lt;/code&gt; known only to the prover&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;F&lt;/code&gt; is designed so that given the inputs &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;w&lt;/code&gt; it outputs either
&lt;code&gt;1&lt;/code&gt;, indicating that &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;w&lt;/code&gt; are valid (&amp;quot;accepting&amp;quot;) or &lt;code&gt;0&lt;/code&gt; (&amp;quot;rejecting&amp;quot;)
indicating that they are not. For example, suppose the prover claims
that they know a message &lt;code&gt;m&lt;/code&gt; such that &lt;code&gt;SHA-256(m) = x&lt;/code&gt;. Then the
program would look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function F(p, w) {
  if (SHA256(w) == p) {
    return 1;
  }
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The public input (&lt;code&gt;p&lt;/code&gt;) to &lt;code&gt;F&lt;/code&gt; is the hash output &lt;code&gt;x&lt;/code&gt; and the private
input (&lt;code&gt;w&lt;/code&gt;) is the secret message &lt;code&gt;m&lt;/code&gt;. If &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;x&lt;/code&gt; correspond, then
&lt;code&gt;F&lt;/code&gt; returns &lt;code&gt;1&lt;/code&gt;, and otherwise &lt;code&gt;0&lt;/code&gt;. It&#39;s obviously the case that the
prover can run &lt;code&gt;F&lt;/code&gt; and check the output themselves, but the verifier
cannot because they don&#39;t know &lt;code&gt;w&lt;/code&gt;, which is supposed to stay
secret. The point of a zero-knowledge proof is for the prover to
convince the verifier that they ran &lt;code&gt;F&lt;/code&gt;—or at least that they
could have run &lt;code&gt;F&lt;/code&gt;—with the output &lt;code&gt;1&lt;/code&gt;. In this context, the
&lt;em&gt;proof&lt;/em&gt; &lt;code&gt;P&lt;/code&gt; is some value that the prover sends the verifier that does
that. The verifier then checks &lt;code&gt;P&lt;/code&gt; against &lt;code&gt;F&lt;/code&gt; and &lt;code&gt;p&lt;/code&gt; and if they all
match, then the verifier is convinced that the prover knows &lt;code&gt;w&lt;/code&gt; (in
this case, the message &lt;code&gt;m&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The way to think about this is that the prover wants to persuade
the verifier that if the verifier &lt;em&gt;were&lt;/em&gt; to run program &lt;code&gt;F&lt;/code&gt; on
&lt;code&gt;w&lt;/code&gt; and &lt;code&gt;p&lt;/code&gt;, they would get the right answer, even though the
verifier didn&#39;t actually run it. So in this case &lt;code&gt;F&lt;/code&gt; checks
that the input value &lt;code&gt;m&lt;/code&gt; matches the hash output &lt;code&gt;x&lt;/code&gt;, but
instead of letting the verifier run &lt;code&gt;F&lt;/code&gt;, we offload the
checking to the prover and the prover then convinces the
verifier that it did the checking correctly.
I know this all sounds like magic and
you&#39;re just going to have to take my word for it—or more to the
point the word of the cryptographers who really understand
it—that it works.&lt;/p&gt;
&lt;p&gt;In order to apply a ZKP system in practice, the prover and the
verifier need to agree to the program &lt;code&gt;F&lt;/code&gt; that the prover is going to
run.  That program can—in principle—do anything, but the
verifier needs to be able to see the program to verify that it
actually does what it is supposed to. Otherwise the prover could say
&amp;quot;I&#39;m running a program which checks the hash&amp;quot; but actually just run
one that always returns 1. Note that part of the proof
is that the prover actually ran &lt;code&gt;F&lt;/code&gt; so they can&#39;t say they are running
&lt;code&gt;F&lt;/code&gt; and actually run &lt;code&gt;F&#39;&lt;/code&gt;, but that doesn&#39;t help if the verifier can&#39;t
actually examine &lt;code&gt;F&lt;/code&gt; and be sure it does the right thing.
Once the program is agreed upon, it gets compiled down into what&#39;s
called an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Arithmetic_circuit_complexity&amp;amp;oldid=1308908132&quot;&gt;&amp;quot;arithmetic circuit&amp;quot;&lt;/a&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;
which is what the ZKP actually proves, so it&#39;s common to talk about
the &amp;quot;circuit&amp;quot; that the ZKP works on, rather than the program, but
they amount to the same thing.&lt;/p&gt;
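&lt;p&gt;To illustrate why the verifier has to be able to inspect the program, here is a minimal Python sketch that puts the hash-checking program next to a bogus one that accepts everything. This just runs the programs directly; it does not generate or verify any zero-knowledge proof, and &lt;code&gt;F_bogus&lt;/code&gt; is a name I made up for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

def F(p, w):
    # The agreed-upon program: accept only if w hashes to the public value p.
    return 1 if hashlib.sha256(w).hexdigest() == p else 0

def F_bogus(p, w):
    # A program that claims to check the hash but accepts everything.
    return 1

m = b'attack at dawn'
x = hashlib.sha256(m).hexdigest()

honest_good = F(x, m)
honest_bad = F(x, b'some other message')
bogus_bad = F_bogus(x, b'some other message')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A proof that &lt;code&gt;F_bogus&lt;/code&gt; output &lt;code&gt;1&lt;/code&gt; would be perfectly valid and perfectly useless, which is why agreeing on the program matters as much as the proof system.&lt;/p&gt;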
&lt;h3 id=&quot;applying-zkps-to-digital-credentials&quot;&gt;Applying ZKPs to Digital Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#applying-zkps-to-digital-credentials&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Assuming we have a ZKP system that allowed us to prove correct
execution of an arbitrary program, now we have the problem
of how to use that to verify a credential. As a reminder, let&#39;s
go back to the skeleton of the authentication system without
ZKPs, shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/digital-credentials-without-zkp.png&quot; alt=&quot;Digital credentials without ZKP&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Digital credentials without ZKP
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The key point to focus on is the last line, where
the site verifies the response. This is the source of the
privacy problem, because it requires the site to have the
credential. However, all the site really needs to know is
that if it &lt;em&gt;had&lt;/em&gt; verified the response, everything would
have been fine, so we&#39;re going to use the same
trick we just used above, which is
to offload the job of verifying the response to the device,
and instead have the device prove that it verified the
response and everything was fine. This gives us the flow below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/digital-credentials-with-zkp.png&quot; alt=&quot;Digital credentials with ZKP&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Digital credentials with ZKP
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Obviously, this has much better privacy properties because I don&#39;t
actually disclose either the credential &lt;em&gt;C&lt;/em&gt; or &lt;em&gt;K_pub&lt;/em&gt;, so the relying
parties can&#39;t link up multiple authentication transactions, even if
they use the same credential (this means there&#39;s no need to issue new
credentials for each transaction). Similarly, the issuer cannot link
transactions.&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that in order to actually deploy this
system, the site and the device need to agree on the program &lt;code&gt;F&lt;/code&gt;
which needs to do the job of verifying the credentials
and associated signature and checking the disclosed attribute. This is
a nontrivial piece of software and obviously needs to be correct. The
details of how this will work may vary some between designs, but
in general, there needs to be some deterministic way to go from
the set of attributes that the site is interested in into the
program (circuit) that the device is going to use for the proof.&lt;/p&gt;
&lt;h2 id=&quot;google-wallet-and-zkps&quot;&gt;Google Wallet and ZKPs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#google-wallet-and-zkps&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Google recently &lt;a href=&quot;https://blog.google/products/google-pay/google-wallet-age-identity-verifications/&quot;&gt;announced&lt;/a&gt; that they are going to be supporting
age verification via zero-knowledge proofs, starting with
a partnership with Bumble.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Given many sites and services require age verification, we wanted to develop a system that not only verifies age, but does it in a way that protects your privacy. That&#39;s why we are integrating Zero Knowledge Proof (ZKP) technology into Google Wallet, further ensuring there is no way to link the age back to your identity. This implementation allows us to provide speedy age verification across a wide range of mobile devices, apps and websites that use our Digital Credential API.&lt;/p&gt;
&lt;p&gt;We will use ZKP where appropriate in other Google products and partner with apps like Bumble, which will use digital IDs from Google Wallet to verify user identity and ZKP to verify age. To help foster a safer, more secure environment for everyone, we will also open source our ZKP technology to other wallets and online services.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, this is nearly all the public information we have
from Google on this topic. The only other thing they have
published besides this blog post is a &lt;a href=&quot;https://eprint.iacr.org/2024/2010&quot;&gt;technical paper&lt;/a&gt; and corresponding &lt;a href=&quot;https://github.com/google/longfellow-zk&quot;&gt;implementation&lt;/a&gt; for
a new zero-knowledge proof system called &amp;quot;Longfellow-ZK&amp;quot;.
This seems like interesting work, but it&#39;s only a small
piece of the puzzle. The context here is that we want
to leverage existing digital credential systems, but
unfortunately those existing credentials are often
signed with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Elliptic_Curve_Digital_Signature_Algorithm&amp;amp;oldid=1301948395&quot;&gt;ECDSA&lt;/a&gt;, and for
technical reasons, many existing ZKP systems struggle
with proving stuff about ECDSA signatures. By contrast,
the Longfellow-ZK system is able to efficiently cover
ECDSA-signed credentials, and the authors show how
to use it to compute proofs over those credentials.&lt;/p&gt;
&lt;p&gt;It&#39;s clear how this is useful, but it&#39;s only a piece of
the puzzle, and we don&#39;t seem to have either a complete
system design or an actual protocol.
What Google has not done—or at least I haven&#39;t seen—is
publish the precise details of how to bind this to the
Digital Credential API. In particular, we don&#39;t have:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The details of every message.&lt;/li&gt;
&lt;li&gt;The exact structure of the circuit or an algorithm to generate
the circuit.&lt;/li&gt;
&lt;li&gt;Any mechanisms for rate limiting (see below).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Without these details it&#39;s a bit hard to say too much about how well
this is going to work at scale.&lt;/p&gt;
&lt;h2 id=&quot;compromised-devices&quot;&gt;Compromised Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#compromised-devices&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As discussed above, much of the security of a digital credentials
system depends on the security of the device key. If the device
key is compromised, then the attacker can use that key to impersonate
the user. There are two main threat models here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The attacker gets temporary control of the user&#39;s device
and extracts the device key without their permission.&lt;/li&gt;
&lt;li&gt;The user and the attacker collude to extract the device
key.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both of these threats are real, though our major concern is
probably the second one, especially for age assurance. For
a simple impersonation attack, you may want to impersonate
someone in particular, but for age assurance, you just want
to impersonate anyone who is over 18, so it&#39;s probably
easier to use your own ID or the ID of some confederate,
especially because the privacy features don&#39;t require you
to disclose your own identity, just demonstrate that you&#39;re
over 18.&lt;/p&gt;
&lt;p&gt;These privacy features are also the challenge for detecting
this form of attack. Naively, a relying party (RP, which is to
say the verifier) could keep track of how
many times a given identity was used and then investigate
any identity which seemed to have excessive usage, but
if you have a system like selective disclosure or zero-knowledge
proofs, then things get more complicated.&lt;/p&gt;
&lt;p&gt;The public descriptions of these kinds of systems I have seen
are pretty vague about how they plan to defend against this
form of attack; they mostly just seem to assume the secure
element won&#39;t be broken, which isn&#39;t necessarily a &lt;a href=&quot;https://bits-please.blogspot.com/2016/06/extracting-qualcomms-keymaster-keys.html&quot;&gt;safe
assumption&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;selective-disclosure-2&quot;&gt;Selective Disclosure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#selective-disclosure-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, selective disclosure systems aren&#39;t truly unlinkable:
if you use the same mdoc twice, then the relying party (or parties)
can link up multiple presentations.  However, as noted above, a good
implementation will get a fresh mdoc for each presentation. In this
case, the issuing authority can still link up presentations but the
relying party cannot.&lt;/p&gt;
&lt;p&gt;However, you can take advantage of the fact that each new mdoc
requires an interaction with the issuing authority. This creates a
number of opportunities for detection. First, you can do some
traffic analysis on the devices that ask for new mdocs, which
they&#39;ll need to do fairly often, not just for privacy
reasons but also because they expire. For instance, if you
see repeated queries from different IP addresses, that is
potentially suspicious. Of course, whoever originally broke
the credential can proxy your requests, but this makes things
more complicated.&lt;/p&gt;
&lt;p&gt;Another alternative is to rate limit presentations.
The basic intuition here is that if there are &lt;code&gt;N&lt;/code&gt; issuers and the
user gets &lt;code&gt;M&lt;/code&gt; credentials, then the attacker can only do &lt;code&gt;N*M&lt;/code&gt; presentations before
they have to use the same credential twice on one RP. This leaves
two avenues for detection:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An RP noticing a lot of reuse&lt;/li&gt;
&lt;li&gt;The issuer noticing that the user makes an excessive number of
requests for mdocs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Importantly, you don&#39;t have to detect &lt;em&gt;every&lt;/em&gt; reuse: after all, you
can always lend your phone to someone else, which isn&#39;t really
detectable. Instead, the idea is to limit the number of
authentications you can get out of successfully attacking a single
device, thus forcing the attacker to expend the costs of enrolling
multiple real identities—or stealing legitimate
devices—and then breaking the devices to extract the device
key. If it costs $500 (made up numbers) to break a device and each key
can only be used for 5 users, this means that it needs to be worth
$100 for each user who wants to circumvent age assurance.&lt;/p&gt;
&lt;h3 id=&quot;zero-knowledge-proofs-2&quot;&gt;Zero-Knowledge Proofs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#zero-knowledge-proofs-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The situation with ZKPs is more complicated because the subject
can create as many ZKPs as they want without having to go back
to the issuer of the original credential. This means that
neither of the mechanisms I described above will work in
this context:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can&#39;t rate limit at the issuing authority because you
don&#39;t have to contact the issuing authority.&lt;/li&gt;
&lt;li&gt;The ZKP doesn&#39;t include the mdoc, so you can&#39;t trivially
compare multiple presentations by matching the mdocs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are, however, techniques for rate limiting in ZK authentication
systems. The basic idea is that you define what&#39;s called a
&amp;quot;nullifier&amp;quot;, which is a characteristic value for the pair of subject
and relying party (think &lt;code&gt;Hash(device-private-key, RP-identity)&lt;/code&gt;). When
a user authenticates to an RP (or in this case proves their age), they
include the RP-specific nullifier and the proof shows that it was
computed correctly. If the same credential is used to authenticate to
the same RP twice with the same user, the same nullifier will be used
and so the RP will be able to detect reuse.&lt;/p&gt;
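&lt;p&gt;As a rough sketch of the basic nullifier, assuming HMAC-SHA256 stands in for the abstract &lt;code&gt;Hash&lt;/code&gt; and leaving out the ZKP that would show the nullifier was computed correctly, the RP-side check might look like this in Python. The &lt;code&gt;RelyingParty&lt;/code&gt; class and the key and site names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib
import hmac

def nullifier(device_key, rp_identity):
    # Keyed hash of the RP identity under the device key (HMAC-SHA256
    # standing in for the abstract Hash in the text).
    return hmac.new(device_key, rp_identity.encode(), hashlib.sha256).hexdigest()

class RelyingParty:
    def __init__(self, identity):
        self.identity = identity
        self.seen = set()

    def present(self, claimed_nullifier):
        # In a real system a ZKP would show the nullifier is well formed;
        # this sketch simply trusts the caller.
        if claimed_nullifier in self.seen:
            return False
        self.seen.add(claimed_nullifier)
        return True

device_key = b'hypothetical-device-private-key'
site_a = RelyingParty('site-a.example')
site_b = RelyingParty('site-b.example')

first = site_a.present(nullifier(device_key, site_a.identity))
second = site_a.present(nullifier(device_key, site_a.identity))
other = site_b.present(nullifier(device_key, site_b.identity))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second presentation to the same site is caught, while the presentation to a different site produces an unrelated value, so the two sites can&#39;t compare notes by matching nullifiers.&lt;/p&gt;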
&lt;p&gt;Obviously, this trivial design allows for linkage of multiple
presentations, but we can set an arbitrary rate limit by including
more inputs in the nullifier. Specifically, we can have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An &amp;quot;epoch&amp;quot; value corresponding to some time window.&lt;/li&gt;
&lt;li&gt;A counter which must be between &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;N&lt;/code&gt; where
&lt;code&gt;N&lt;/code&gt; is some upper limit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put together, these constraints allow for &lt;code&gt;N&lt;/code&gt; authentications
per RP per epoch while remaining unlinkable. However, if someone
tries to authenticate &lt;code&gt;N+1&lt;/code&gt; times, then they have to reuse
the counter and the RP can detect that a nullifier has been
reused and reject the authentication attempt.&lt;/p&gt;
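&lt;p&gt;Here is a minimal Python sketch of the epoch-and-counter variant, under the same assumptions as before: HMAC-SHA256 is my stand-in for the hash, the range check on the counter is done in the clear rather than inside a proof, and the limit, key, and site name are made up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib
import hmac

LIMIT = 3  # N: presentations allowed per RP per epoch (made up number)

def nullifier(device_key, rp_identity, epoch, counter):
    # The proof would also show that counter is between 1 and LIMIT;
    # this sketch checks that directly instead.
    if counter not in range(1, LIMIT + 1):
        raise ValueError('counter out of range')
    message = '|'.join([rp_identity, str(epoch), str(counter)]).encode()
    return hmac.new(device_key, message, hashlib.sha256).hexdigest()

seen = set()

def rp_accepts(value):
    # The RP rejects any nullifier it has already seen.
    if value in seen:
        return False
    seen.add(value)
    return True

key = b'hypothetical-device-private-key'
results = [rp_accepts(nullifier(key, 'site.example', 2026, c)) for c in (1, 2, 3)]
reused = rp_accepts(nullifier(key, 'site.example', 2026, 1))
next_epoch = rp_accepts(nullifier(key, 'site.example', 2027, 1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first three presentations in an epoch each yield a distinct, unlinkable-looking value; a fourth has to reuse a counter and is rejected, and a new epoch resets the budget.&lt;/p&gt;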
&lt;p&gt;It&#39;s also possible to extend this technique to not just
reject the authentication attempt but determine which
device was broken. Along with the nullifier, the authentication
also includes a secret share for the user&#39;s identifier
(or the device ID) which is designed so that if the counter is reused, the
RP will be able to put together the shares and reconstruct
the device or user identifier. Once the compromised device is
detected, the issuing authority can revoke its ability
to authenticate (most likely by just refusing to issue
more mdocs and letting the old ones expire).&lt;/p&gt;
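&lt;p&gt;One way to get this &amp;quot;reuse reveals the identifier&amp;quot; property is with two-point secret sharing in the style of Shamir: each presentation reveals one point on a line whose intercept is the device ID, and reusing a counter means revealing a second point on the same line. The Python sketch below is an illustration under my own assumptions (a fresh RP-supplied challenge picks the x coordinate, and the proof that each share is well formed is omitted); it is not a description of any deployed system, and all the names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

P = 2**255 - 19  # a prime modulus for the arithmetic

def to_field(*parts):
    # Hash arbitrary values to an integer mod P.
    data = '|'.join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % P

def make_share(device_id, device_key, rp, epoch, counter, challenge):
    # Line y = device_id + slope * x. The slope is fixed by the
    # (key, RP, epoch, counter) tuple, so reusing a counter reuses the line.
    slope = to_field('slope', device_key, rp, epoch, counter)
    x = to_field('x', challenge)
    return x, (device_id + slope * x) % P

def recover(share_1, share_2):
    # Two distinct points on the same line determine its intercept.
    (x1, y1), (x2, y2) = share_1, share_2
    slope = (y1 - y2) * pow((x1 - x2) % P, -1, P) % P
    return (y1 - slope * x1) % P

device_id = 123456789
key = 'hypothetical-device-private-key'
a = make_share(device_id, key, 'site.example', 2026, 1, 'rp-nonce-1')
b = make_share(device_id, key, 'site.example', 2026, 1, 'rp-nonce-2')
c = make_share(device_id, key, 'site.example', 2026, 2, 'rp-nonce-3')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Shares &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; reuse counter 1, so &lt;code&gt;recover(a, b)&lt;/code&gt; returns the device ID; &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;c&lt;/code&gt; use different counters and so lie on different lines, and combining them yields nothing useful.&lt;/p&gt;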
&lt;p&gt;Note that the attacker can make this defense harder to mount
by retrieving multiple mdocs with different device keys and
providing them to the separate users, but each issuance is
visible to the issuing authority, which can impose rate limits.&lt;/p&gt;
&lt;p&gt;Again, it&#39;s not clear what Google is actually doing here; I&#39;m
just describing some avenues one could pursue.&lt;/p&gt;
&lt;h3 id=&quot;multiple-rps&quot;&gt;Multiple RPs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#multiple-rps&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This whole system works a lot better if there are a small number of
RPs. If there are a lot of RPs, then it becomes harder to detect
reuse. You need to set the per-RP rate limit high enough that a
legitimate user won&#39;t exhaust the limit during normal usage. It&#39;s
likely that, at least for porn sites, a legitimate user will only use
a small number of sites, but if there are a lot of porn sites, this
leaves plenty of room to spread a bunch of illegitimate users across
those sites. Of course, this still leaves the attacker with
the problem of coordinating users so they don&#39;t accidentally
overflow the limits, but it makes the detection problem harder.&lt;/p&gt;
&lt;p&gt;I think there&#39;s a real practical question about the distribution of
sites which require age assurance. Most content categories have really
top-heavy distributions where the vast majority of traffic goes to a
few sites (e.g., Facebook, Instagram, Twitter, TikTok, for social
networking), and they aren&#39;t interchangeable, in which case just
imposing rate limiting on the top sites is likely to be fairly
effective.&lt;/p&gt;
&lt;p&gt;Adult sites don&#39;t have the network effects that social networking
sites have, so it&#39;s possible they are more interchangeable and that
users can gravitate to long-tail sites, as Dennis
Jackson &lt;a href=&quot;https://www.ietf.org/slides/slides-agews-paper-who-bears-the-burden-technical-architectures-for-age-based-content-restriction-00.pdf&quot;&gt;argues&lt;/a&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;
It&#39;s hard to know in advance whether this is true, but what
we can do is look at existing traffic patterns, which are
similarly top-heavy:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/adult-sites.png&quot; alt=&quot;Traffic to top porn sites&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Traffic to the top porn sites
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note that some of the sites are owned by the same entities, so
one would imagine they could cross-check between those sites.&lt;/p&gt;
&lt;p&gt;This doesn&#39;t exclude the possibility that users will switch
to lower-popularity sites if they have to, but it&#39;s definitely
going to be a lot more work to find sites that are this unpopular
compared to the big sites, and given how many of these sites
consist of user-uploaded content, it seems likely there is
a pretty significant dropoff in how much content there is
on the smaller sites (I haven&#39;t checked!).&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/age-verification-id/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;ZKP-based authentication and age assurance systems are extremely
technically cool, but they&#39;re only a component in a larger
system.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;
When used properly, a ZKP system allows you to disclose/prove attribute &lt;strong&gt;A&lt;/strong&gt; while
not disclosing attributes &lt;strong&gt;B&lt;/strong&gt; and &lt;strong&gt;C&lt;/strong&gt;, but this doesn&#39;t mean that
the RP can&#39;t learn those attributes via some other mechanism. For instance, if
you connect to a server from your home without any form of &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/&quot;&gt;IP concealment&lt;/a&gt;, then the server may well be able to learn
who you are in any case.&lt;/p&gt;
&lt;p&gt;In addition, as pointed out by Hancock and Collins, the RP may &amp;quot;overask&amp;quot; for
attributes it doesn&#39;t really need, counting on the user not to notice that
they&#39;re disclosing their name or precise age. Your client software can
help defend against this by restricting the set of attributes an RP can request
(Apple requires RPs to register which ones they will request and hopefully
does some auditing of which ones they really need), but all of this is outside
the scope of the ZK system itself. Similarly, if it&#39;s simple and easy to prove
your age or other attributes, we may see a form of induced demand where you
have to do so more and more often.&lt;/p&gt;
&lt;p&gt;Finally, the proof systems themselves are very complicated and tricky to get
right, especially at the current level of technological development. There have been some &lt;a href=&quot;https://thehackernews.com/2019/02/zcash-cryptocurrency-hack.html#:~:text=Now%2C%20the%20Zcash%20team%20detailed,of%20the%20Catastrophic%20Zcash%20Vulnerability&quot;&gt;high profile&lt;/a&gt; &lt;a href=&quot;https://scispace.com/papers/revisiting-the-nova-proof-system-on-a-cycle-of-curves-6fb8atx4&quot;&gt;cases&lt;/a&gt; where ZKPs were deployed and
then found to not actually be secure in practice. This doesn&#39;t necessarily
lead to a privacy problem from the user&#39;s perspective as most of the
issues have instead allowed an attacker to prove something
false rather than leaking the user&#39;s information. However, it&#39;s obviously
still not great for deployment in practice.&lt;/p&gt;
&lt;p&gt;With all that said, it&#39;s important to remember that the reference point
here is the wide deployment of existing age assurance systems—whether
of the facial age estimation or the &amp;quot;selfie with ID&amp;quot; variety—that
don&#39;t conceal the user&#39;s identity from the verification service at all.
From a purely technical perspective, designs based on selective
disclosure or ZKPs are likely to have superior security and privacy
properties compared to these existing systems.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://www.aamva.org/getmedia/99ac7057-0f4d-4461-b0a2-3a5532e1b35c/AAMVA-2020-DLID-Card-Design-Standard.pdf&quot;&gt;AAMVA DL/ID Card Design Standard 2020&lt;/a&gt;, Appendix
B.4 for the security features. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Encoded in &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=PDF417&amp;amp;oldid=1290792531&quot;&gt;PDF417&lt;/a&gt;
and conforming to &lt;a href=&quot;https://www.aamva.org/getmedia/99ac7057-0f4d-4461-b0a2-3a5532e1b35c/AAMVA-2020-DLID-Card-Design-Standard.pdf&quot;&gt;AAMVA DL/ID Card Design Standard 2020&lt;/a&gt;, Appendix
D. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In some contexts, you might also want to sign
the verifier&#39;s identity, but we don&#39;t have to worry about that
right now. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At least for the purpose of authentication. You still
may not want everyone knowing your birthday. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is precisely the situation
with Web server authentication: the server will give its
certificate to anyone who asks, but you can&#39;t impersonate
the server unless you know its private key. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Or if their device is compromised. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though it&#39;s also worse in some ways, because once
they get their ID card back, the younger sibling
can&#39;t use it; with a digital credential they
can use it indefinitely. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The user doesn&#39;t really have to trust the manufacturer
in this case, at least not more than they do for
other purposes. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that the device also needs to generate a fresh
device key for each credential. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://support.apple.com/en-us/118260&quot;&gt;here&lt;/a&gt; for some discussion
of the privacy properties. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Apparently there is some case where it will reuse
the credentials if it runs out and cannot contact
the issuer in time. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2025/232/&quot;&gt;video&lt;/a&gt; at 12:20. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The technical term for &lt;code&gt;w&lt;/code&gt; is a &amp;quot;witness&amp;quot;, hence &lt;code&gt;w&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At least in many ZKP systems &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the context of users selecting sites that do weaker
or no age assurance. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See commentary by &lt;a href=&quot;https://www.eff.org/deeplinks/2025/07/zero-knowledge-proofs-alone-are-not-digital-id-solution-protecting-user-privacy&quot;&gt;Alexis Hancock and Paige Collins from EFF&lt;/a&gt;,
&lt;a href=&quot;https://datatracker.ietf.org/doc/slides-agews-limitations-and-pitfalls-of-integrating-pets-in-online-age-verification/&quot;&gt;Chatel et al.&lt;/a&gt;, and &lt;a href=&quot;https://datatracker.ietf.org/doc/slides-agews-paper-private-and-decentralized-age-verification-architecture/&quot;&gt;Celi et al.&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/age-verification-id/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Ultra Tour Monte Rosa (UTMR) Race Report</title>
		<link href="https://educatedguesswork.org/posts/utmr/"/>
		<updated>2025-09-21T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/utmr/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/780.jpg&quot; alt=&quot;Pre-race picture&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The pre-race picture. I look happier now than I will be later.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This year my occasional&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;  training partner &lt;a href=&quot;https://heapingbits.net/&quot;&gt;Chris
Wood&lt;/a&gt; was selected in the
&lt;a href=&quot;https://montblanc.utmb.world/&quot;&gt;UTMB&lt;/a&gt; lottery and asked me to come
over to Chamonix and crew him. Europe is a long way to go and not
race, so I looked around and finally settled on &lt;a href=&quot;https://www.ultratourmonterosa.com/&quot;&gt;Ultra Tour Monte Rosa
(UTMR)&lt;/a&gt; as my &amp;quot;A&amp;quot; race. UTMR is
conceptually similar to UTMB in that it&#39;s a 170K tour around a
mountain in the Alps but it&#39;s about 10% more climbing than UTMB and
considerably more technical, so times are a lot slower. UTMR is about a week after UTMB, so
after crewing Chris I took the train from Chamonix to Grächen
on Monday, giving me a few days before the race start at 4 AM
Thursday.&lt;/p&gt;
&lt;h2 id=&quot;course-overview&quot;&gt;Course Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#course-overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;UTMR is a serious mountain race with over 10000m of climbing.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmr-course.png&quot; alt=&quot;UTMR-course&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;UTMR course. From my actual &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt; track.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/utmr-profile.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmr-profile.png&quot; alt=&quot;UTMR-profile&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;figcaption&gt;
UTMR &quot;final&quot; race profile
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Conceptually, I broke it up into four main sections corresponding
to the locations where you could have drop bags and where the aid
stations were bigger.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start to Zermatt (34.2K)&lt;/li&gt;
&lt;li&gt;Zermatt to Gressoney-la-Trinite (77.4K)&lt;/li&gt;
&lt;li&gt;Gressoney-la-Trinite to Macugnaga (123.6K)&lt;/li&gt;
&lt;li&gt;Macugnaga to finish (167.9K)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The segment to Zermatt is the fastest—though still with over
2000 height meters of climbing. It&#39;s important to get through
this section fast because after you leave Zermatt there&#39;s a big
climb up to the glacier and then a 2K glacier crossing. For
obvious reasons you want to cross the glacier during the day,
and so there&#39;s a tight (9 hr) cutoff at Zermatt.&lt;/p&gt;
&lt;p&gt;After you get over the glacier you&#39;re looking at a long mostly
net downhill section into Gressoney-la-Trinite, but with a significant
climb partway through.&lt;/p&gt;
&lt;p&gt;After Gressoney-la-Trinite you have a series of three really big climbs.
The first two are before the Macugnaga aid station and then
after that there&#39;s only one last big climb and descent followed by
a smaller (only 700 hm!) climb, some rolling stuff, and then
a descent into the finish.&lt;/p&gt;
&lt;p&gt;I&#39;d managed to recon the first few kilometers (nice!) and the last few
kilometers (incredibly steep), so I had a bit of a sense of what to
expect here. I did the last two km on my first day into
Grächen, right at the time when a bunch of runners
from the (even longer!) &lt;a href=&quot;https://swisspeaks.ch/?lang=en&quot;&gt;Swiss Peaks&lt;/a&gt;
race were coming through; they looked tired and still had a long way to go!&lt;/p&gt;
&lt;h2 id=&quot;overall-logistics&quot;&gt;Overall Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#overall-logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;UTMR was by far the longest race I&#39;d ever done in terms of time,
so it presented some real logistical challenges, especially as I
was doing it without crew.&lt;/p&gt;
&lt;h3 id=&quot;food&quot;&gt;Food &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#food&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Food at an American race tends to be dominated by sports nutrition
such as energy bars, gels, sports drinks, etc.  (aka &amp;quot;space food&amp;quot; or
&amp;quot;engineered food&amp;quot;). On longer races like hundreds you&#39;ll often see
hot &amp;quot;real food&amp;quot; like soup, quesadillas, pancakes, bacon, or sometimes
even burgers as it gets later in the race. By contrast, European
races tend to be much heavier on some kind of real food from
the very beginning, but it&#39;s mostly snacks like bread, cheese, charcuterie
(seriously!), with maybe a small selection of sports food,
and then again some hot food as you get later into the day.&lt;/p&gt;
&lt;p&gt;I&#39;ve done nearly all my training with sports food, and I wasn&#39;t
sure how I&#39;d feel about bread and cheese&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; and I didn&#39;t have any experience with the sports drink
that UTMR was serving, so I planned to carry most of my nutrition
with me; UTMR had 3 drop bag locations so this meant I could
carry about 1/4 of my food for each segment, though in practice
I expected to try to eat some of the real food as well. This actually
wasn&#39;t so bad in terms of how much I had to carry between
aid stations but did mean I had an enormously heavy bag to
carry to Chamonix and then to Grächen.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmr-drop-bag-contents.jpeg&quot; alt=&quot;The contents of my drop bags&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The contents of my drop bags
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id=&quot;gear&quot;&gt;Gear &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#gear&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;European races tend to have more serious mandatory gear lists.
For example, here&#39;s what UTMR &lt;a href=&quot;https://www.ultratourmonterosa.com/useful-information/obligatory-equipment/&quot;&gt;requires&lt;/a&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Requirement&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;My gear&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Mobile phone&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.hmd.com/en_int/nokia-105/specs?sku=1GF019CPA2L05&quot;&gt;Nokia 105&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Fully functional head torch(s) with replacement batteries&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.petzl.com/US/en/Sport/Headlamps/NAO-RL&quot;&gt;Petzl Nao RL&lt;/a&gt; (main), &lt;a href=&quot;https://www.petzl.com/INT/en/Sport/Headlamps/ePLUSLITE&quot;&gt;Petzl e-lite&lt;/a&gt; (backup)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bottles or bladders with capacity to carry 1 litre&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Standard 500ml softflasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Emergency food rations in a sealed ziplock bag (400 calories)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten, SIS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Emergency bivvy bag&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.rei.com/product/199053/sol-emergency-bivvy-with-rescue-whistle-and-tinder-cord&quot;&gt;SOL Emergency Bivy&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Whistle&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;On pack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Elastic bandage / strapping&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Coban&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Drinking cup (cups will not be provided at refreshment points)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.hydrapak.com/collections/soft-flasks/products/speed-cup-200-ml&quot;&gt;Hydrapak Speedcup&lt;/a&gt; (gimme from Lake Sonoma 50)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Waterproof jacket with hood&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Inov-8 Raceshell (discontinued)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Waterproof trousers&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://raidlight.com/products/pantalon-de-trail-impermeable-mixte-ultralight-mp-20k-20k?srsltid=AfmBOooGCb7VHRaDqp8YkdqCHA1alu5eOc341qed37m_oF1X3_B6GkAn&quot;&gt;Raidlight Ultralight MP+&lt;/a&gt; (older model)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Warm long-sleeved thermal top layer&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.patagonia.com/product/mens-capilene-thermal-weight-baselayer-zip-neck-pullover/43657.html?dwvar_43657_color=CLMB&quot;&gt;Patagonia Capilene Thermal&lt;/a&gt; (borrowed from Chris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Long running trousers or trousers that cover over the knee&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.patagonia.com/product/mens-terrebonne-trail-joggers/24541.html?dwvar_24541_color=OTBR&quot;&gt;Patagonia Terrebonne trail joggers&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Warm hat&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Smartwool hat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Gloves&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;North Face Flashdry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;GPS tracker (provided at race registration)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Identity papers&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Microspikes&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Rented from race organizers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bowl and spork&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://www.rei.com/product/231323/sea-to-summit-frontier-ultralight-collapsible-cup&quot;&gt;Sea to Summit Frontier Ultralight Collapsible Cup&lt;/a&gt;, &lt;a href=&quot;https://www.rei.com/product/231310/sea-to-summit-frontier-ultralight-spork&quot;&gt;Sea to Summit Frontier Ultralight Spork&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Sheet sleeping bag&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;https://mountainlaureldesigns.com/product/mountain-quilt-bag-liner/&quot;&gt;Mountain Laurel Designs Sleeping Bag &amp;amp; Quilt Liner&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;UTMR checks that you have all this stuff at registration before they give
you your race number, but of course this is just the minimum, and when
I got back to my hotel I had the following email:&lt;/p&gt;
&lt;blockquote&gt;
&lt;span style=&quot;color: red;&quot;&gt;SEVERE WEATHER WARNING&lt;/span&gt;
&lt;p&gt;Tomorrow afternoon from around 4pm onwards conditions crossing Teodulo (the glacier) and into Italy are expected to become very cold and windy, with snowfall. The temperature will be -2 deg C, with wind chill factor down to -10 deg C. Winds could be up to 50-60 km per hour.&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: red;&quot;&gt;For you safety please carry extra clothing including warm pants, thick gloves, warm hat, warm (duvet) hooded jacket.&lt;/span&gt; We suggest putting warm gear in your dropbag for Zermatt so that you have protection through the bad conditions.&lt;/p&gt;
&lt;p&gt;The race will proceed unless our security team advises us that conditions have become unsafe.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This definitely freaked me out and it would obviously have all been a lot easier if I&#39;d known it in
Chamonix, which has about 5 outdoor stores per block. At this point I was
definitely regretting not bringing my tights or borrowing some from
Chris, but there were a few open stores and I ended up buying a pair
of hiking pants, a thick warm hat, and some thick windproof
gloves. I&#39;d already brought my Patagonia &lt;a href=&quot;https://www.patagonia.com/product/mens-micro-puff-insulated-hoody/84031.html&quot;&gt;Micro Puff
Hoody&lt;/a&gt;
so I was covered as far as a puffy (&amp;quot;duvet jacket&amp;quot;) goes.  All of this
extra stuff is bulky and heavy, but based on the weather reports I
wasn&#39;t going to need it till after the glacier, so I was able to store
it in my Zermatt drop bag. Also in the Zermatt bag: the microspikes
which are only needed on the glacier.&lt;/p&gt;
&lt;p&gt;I&#39;ve done previous races on headlamp only, but for this race I decided
to add a waist light (&lt;a href=&quot;https://ultraspire.com/products/lumen-600-5-0/&quot;&gt;UltrAspire
600&lt;/a&gt;); I have friends
who&#39;ve used them at races and said it was dramatically better and I
figured I&#39;d be out for two nights and this was the time to use it. I
expected to reach Gressoney-la-Trinite a bit before midnight and it
got dark around 8 or 9, so I decided I&#39;d be OK with just the headlamp
till Gressoney-la-Trinite, thus avoiding having to carry the waist light for the first half.
I also left a pair of extra shoes in my Gressoney-la-Trinite drop bag, which
turned out to be a really good idea (see below).&lt;/p&gt;
&lt;p&gt;Pre-race I spent some
time dithering about what shoes to use for UTMR; I did most
of this season in a pair of &lt;a href=&quot;https://www.salomon.com/en-us/product/s-lab-genesis-lg9299#color=87291&quot;&gt;Salomon S/LAB Genesis&lt;/a&gt;,
but on my &lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop.md&quot;&gt;last outing&lt;/a&gt;, I started to have
some discomfort in my feet about half-way through and so I
decided to try out the &lt;a href=&quot;https://www.salomon.com/en-us/product/s-lab-ultra-glide-1-5-li1245/L49283600&quot;&gt;Salomon S/LAB Ultra Glide&lt;/a&gt;,
which is much higher stack and bouncier. I ordered a pair of
the Ultra Glides when I was in Flagstaff, but I only had
about 30 miles on them and wasn&#39;t sure how they would
feel over 100 miles. On the one hand, the Ultra Glides seemed to have
a somewhat tight toe box and I was worried they would
be too tight if my feet swelled during the race, but on
the other hand I thought it might be nice to switch
to something bouncier half-way. Eventually I decided to be
optimistic and start in the Ultra Glides and then have
the Genesis in my Gressoney-la-Trinite bag.&lt;/p&gt;
&lt;h2 id=&quot;pre-race&quot;&gt;Pre-Race &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#pre-race&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I spent Tuesday and Wednesday kind of bumming around Grächen and
trying to eat well and sleep as much as I could. It&#39;s a tiny town
and everything is within walking distance, so I walked over to the
local market and scored a bunch of food, including some gnocchi
and pesto for the night before (one of the benefits of being in
an AirBNB is that you can cook). I slept pretty well Monday and
Tuesday night but had a really hard time Wednesday night. Usually
I&#39;ll be able to fall asleep pretty well but will keep waking up,
but this time I spent a lot of time just lying in bed doing relaxation
exercises and trying to fall asleep. Eventually I did get a few good
hours right before my wakeup time, but it wasn&#39;t amazing.&lt;/p&gt;
&lt;p&gt;I timed the start pretty well and got to the start at about 3:35 and
then realized that the volunteers wanted us to put pre-printed
labels on our drop bags—so that&#39;s why I had four wristband
type things in my packet—and I ended up having to run back
to my AirBNB, grab them, and then come back. Fortunately my AirBNB
was really close, so I still made it in time. In retrospect, I doubt
it would have mattered, but I was in pre-race rule-following
mode.&lt;/p&gt;
&lt;h2 id=&quot;start-to-zermatt&quot;&gt;Start to Zermatt &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#start-to-zermatt&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first part of the race went quite well. You start by running
through the town for a kilometer or so, followed by a relatively
short but steep climb and then transition to a longish rolling section.
The rolling section is fairly runnable but still slightly technical
and narrow, and people were still fairly packed in, so I just tried
to cruise through it without expending too much effort.&lt;/p&gt;
&lt;p&gt;Following a shortish descent, we began the first big climb, 5.8 km
and 1011 hm up to the first aid station at Europahutte. This part
went smoothly: it was early, everyone was relatively fresh, and at
this point in the race everyone wants to take it easy.
I took a short stopover at Europahutte to refill at the aid station and then
it&#39;s a short slightly technical downhill followed by what is
the longest foot suspension bridge in the Alps,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Charles_Kuonen_Suspension_Bridge&amp;amp;oldid=1307104145&quot;&gt;the Charles Kuonen Suspension Bridge&lt;/a&gt;,
at almost 500m long.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://dynamic-media-cdn.tripadvisor.com/media/photo-o/11/8c/42/59/fotos-by-valentin-flauraud.jpg?w=1000&amp;amp;h=-1&amp;amp;s=1&quot; alt=&quot;Charles Kuonen Bridge&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Charles Kuonen Bridge: from TripAdvisor
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There is a fantastic view from the bridge, but to be honest
I found it a fairly unpleasant experience because the bridge
sways quite a bit and you can hear the cables creaking. Intellectually
you know it&#39;s safe, but it just takes one look down to wonder
whether the engineers really know what they&#39;re doing. I wasn&#39;t
excited about the crossing, but it&#39;s not like you&#39;re going to
turn back, so I just gritted my teeth, made sure I had one hand on the rail,
and kept going. A few people had passed me on the descent
from Europahutte, but I found myself wishing more had, because
some of the people behind were crowding me, which didn&#39;t
help matters.&lt;/p&gt;
&lt;p&gt;The original UTMR course stays up high for a while but due to some
trail closures, from here there was a fairly rapid descent down to the
valley floor and then the aid stations in Attermenzen and then some
easy running on gravel road to Zermatt. This is a bit faster than the
original route and as a consequence I was way ahead of schedule and
the cutoff for leaving Zermatt, so things were looking good, at
least as far as not having to cross the glacier in the dark
went.&lt;/p&gt;
&lt;p&gt;I spent about 20 minutes in Zermatt overall, retrieving
all the stuff from my drop bags, cramming the extra clothes
into my dry bag, etc. This is obviously longer than
ideal, but in a long race like this, I don&#39;t mind spending
a little extra time at the aid stations, especially the
ones with my drop bags. Even so, I almost left my spikes
in the bag, which would have been disastrous, as they
are mandatory gear for the glacier crossing and it actually
would have been quite sketchy without them, even though
there wasn&#39;t really anyone enforcing it.&lt;/p&gt;
&lt;h2 id=&quot;zermatt-to-gressoney-la-trinite&quot;&gt;Zermatt to Gressoney-la-Trinite &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#zermatt-to-gressoney-la-trinite&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From here on in, things get real, starting with a 1389m climb
up to Trockenersteg. This is actually a pretty easy
grade as far as UTMR goes (14% average) but when added up
with the glacier crossing it&#39;s the longest more or less
continuous up on the course. I hit the top here feeling
pretty good—it&#39;s at high altitude, but I&#39;d spent
three weeks in Flagstaff getting acclimatized, with the
result that there was still some negative impact, but
I didn&#39;t feel that bad, unlike some of the people near
me, who were clearly feeling the altitude.&lt;/p&gt;
&lt;p&gt;Trockenersteg is a tiny aid station, basically just a table set up
in the doorway of the building at the top of the ski lift;
there were bathrooms and the like and probably some kind
of cafe (maybe closed?) but the station itself was just dudes
with fluids and energy bars. At least it was shaded from
the wind, though, which let us get our warm gear on in
preparation for the glacier crossing, which promised to be windy.
It&#39;s about 1km to the ice itself and from there it&#39;s about 2 miles
on ice and snow.&lt;/p&gt;
&lt;p&gt;We just sort of trotted over to the ice
transition and then everyone sort of collectively sat
down and put on their spikes and maybe some warmer clothes,
and then headed out onto the ice. For my money this
was the best part of the race. You&#39;re up above 3000 m (10000 ft)
and walking over a giant piece of ice—how can that not
feel epic? Now the truth of the matter is that this section
of the glacier is between two lodges and has, I&#39;m told,
been groomed, but nevertheless, it &lt;em&gt;feels&lt;/em&gt; wild, especially if
you live somewhere, like I do, where there is basically no
snow.
Once you get past the snowfield, there is a relatively
short section on rock up to Teodul and then it&#39;s 16.9 km
and 1600 m down to Teodul.&lt;/p&gt;
&lt;p&gt;This downhill went pretty well, except that
at some point I tripped and went down hard, hurting&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
one
of my ribs and tweaking my wrist. This wasn&#39;t great and led
to some discomfort throughout the rest of the race, but nothing
that was going to stop me. Thanks to the runner who was with me
at the time, whose name I&#39;ve forgotten, for helping retrieve my poles and making
sure I was OK.
Somewhere on the downhill
I ran into &lt;a href=&quot;https://ultrasignup.com/results_participant.aspx?fname=Stuart&amp;amp;lname=Secker&quot;&gt;Stuart Secker&lt;/a&gt;,
who I&#39;d met on Monday and had dinner with. He had a lot
of experience with 100+ mile races, especially difficult
stuff like Mogollon Monster and UTCT, plus he&#39;d reconned
some of the course, so I decided to stick with him for
a while, and we headed to Rifugio Ferraro together.&lt;/p&gt;
&lt;p&gt;By the time we hit the Rifugio it had been raining on and
off, and it was clear that the bad weather had started to
roll in. The guidance from race officials was a bit equivocal,
something like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It should get better in a few hours.&lt;/li&gt;
&lt;li&gt;We&#39;re not canceling the race.&lt;/li&gt;
&lt;li&gt;We advise you to stay here for a bit.&lt;/li&gt;
&lt;li&gt;We won&#39;t make you stay.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I&#39;m generally not a fan of waiting things out unless they&#39;re
really bad and neither was Stuart, so after a bit we decided to
head out.&lt;/p&gt;
&lt;p&gt;From Rifugio Ferraro to Gressoney-la-Trinite is a substantial
climb (~800m) followed by a longer downhill. Pretty soon it
started to rain fairly hard and then we were getting lightning and thunder,
though the lightning seemed modestly far away.
At this point I had on a warm top plus my rain pants and
my rain jacket, but not my rain gloves or (I think)
my second warm bottom layer. Unfortunately, once it&#39;s
raining this much, you really don&#39;t want to take &lt;em&gt;off&lt;/em&gt;
your rain gear to put anything on underneath it, so I was
mostly just stuck being a bit uncomfortable. Fortunately,
the climb itself was mostly rock and wasn&#39;t too slippery,
though it &lt;em&gt;wasn&#39;t&lt;/em&gt; just smooth path either.&lt;/p&gt;
&lt;p&gt;By the time I hit the top, it was dark and really windy, and I was
cold, especially anything that wasn&#39;t covered by waterproof gear (in
particular my hands). I tried to find a location out of the wind at
the summit to put on my warm gloves, but they hadn&#39;t been in the dry
bag and were so wet that I couldn&#39;t get them on with my stiff hands,
so I ended up just getting colder and watching people pass me before I
headed down. I&#39;d stopped partway up for a minute or so and had
lost Stuart, but managed to run some of this first section and
catch up with him. His take was that the most important thing was just
to lose altitude as fast as we could—thus getting to where it
was warmer—rather than try to get warm immediately, so we pushed
on.&lt;/p&gt;
&lt;p&gt;This descent was all dirt and would have been super runnable except
that with all the rain it was instead really muddy and slippery and my feet came
out from under me and I fell on my ass a number of times.
Nothing
was really hurt other than my pride, but I definitely ended up
covered in mud. These were mostly minor falls, but then later
Stuart fell a fair bit harder and tore his pack, so a hard
bit all around.&lt;/p&gt;
&lt;h2 id=&quot;gressoney-la-trinite-to-rifugio-pastore&quot;&gt;Gressoney-la-Trinite to Rifugio Pastore &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#gressoney-la-trinite-to-rifugio-pastore&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Gressoney was the first aid station with beds, and my original plan
had been to get some sleep there to avoid my traditional 3 AM low
spot, but when we arrived we were told
that the beds were all in use and there weren&#39;t even any spare blankets.
This wasn&#39;t an immediate disaster because I needed to do some
maintenance in the form of changing out of wet clothes, swapping
stuff out of my drop bags, etc., and I was hoping that by the time
I was done a bed would have opened up.&lt;/p&gt;
&lt;p&gt;Once I got into drier
clothes—including the warmer hiking pants I had bought—I
set about fixing my feet; after hours of being wet and muddy
they had started to wrinkle up and I knew from past races that
this can lead to irritation between the wrinkles. Pretty much
whenever I stepped I was starting to get discomfort and this
can be a race ender, so I knew I had to attend to it.
The main fix for this is to get them dry and keep them dry. I was able to get
some paper towels from the volunteers but medical didn&#39;t seem
prepared to do anything, and when I asked them for diaper
cream&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
they didn&#39;t have it, but fortunately Stuart actually had some,
so with the help of a surgical glove&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
borrowed from another racer, I was able to get my feet nicely
covered with cream and then get new socks on. I was somewhat
distressed to discover that one of these socks had a small hole in
the toe, which would come back to haunt me later, but at this
point I didn&#39;t have much in the way of choice.&lt;/p&gt;
&lt;p&gt;I also had to decide which shoes to wear: I was finding the Ultra
Glides quite comfortable, with no real discomfort, but at this
point they were totally waterlogged, and I decided that the best
thing to do was to swap them out for the S/LAB Genesis, which were
dry and have a Matryx upper which tends not to absorb so much moisture.
I think this was the right call, because at this point I was super
worried about my feet being too wet, but I wish I&#39;d tried the Ultra
Glides earlier; if I had I might have just had two pairs of these.
I also picked up my UltrAspire waist lamp, which meant I had
twice as much light going out of Gressoney as heading in.
From here on, there are three big climbs, all between 1200 and 1600 m,
and then the rolling bit to the finish, but this gave me a set of
milestones to work with.&lt;/p&gt;
&lt;p&gt;I lurked around Gressoney a little while longer, and tried to get
some sleep on the floor, but without much success. Stuart told
me he was thinking about dropping—he eventually did—and
Karl, another guy I had been running with, told me he was definitely
dropping. I didn&#39;t want to head out entirely on my own in the dark
so I ended up hooking up with three French guys for the next stretch.&lt;/p&gt;
&lt;p&gt;Apparently I&#39;d rested enough and warmed up, because I felt reasonably
good coming out of Gressoney, and quickly found myself dropping
my companions. I&#39;m not sure how eager they really were to have some
non-French speaker tagging along anyway, so I eventually just ended up
pushing through this section largely on my own. I don&#39;t actually
remember this bit that clearly, probably because it was in the
dark and I was undercaffeinated—I was still hoping to sleep—but eventually
I made it to the top of the climb at Passo dei Salati, which was basically a ski lodge.
This rifugio was like the house of the walking dead, full of race zombies.
There weren&#39;t any beds but there were some people stretched out
on benches trying to sleep or just asleep on tables, and after
grabbing some soup and tea (no coffee available!) I tried to do the
same, and I think got maybe 5-10 minutes. Not enough.&lt;/p&gt;
&lt;p&gt;From Passo dei Salati to Rifugio Pastore is a long downhill followed
by a climb of about 400 meters to the Rifugio. In theory that shouldn&#39;t
have been that hard but I managed to make it about 5 feet out the door
before realizing I needed more clothes, so went back inside, layered
up and then headed back out. This section was a bit tricky to navigate
in the dark and I ended up briefly teaming up with some other people
to find the way down, but eventually got separated.&lt;/p&gt;
&lt;p&gt;This segment was fairly runnable once it got light,
but I started to have a really low spot partway through, I think due
to a combination of not eating enough (see below) and not sleeping
(this is why I wanted to sleep at Gressoney!). A few times I just
sat at the side of the trail on a rock and tried to recover and eventually
Mick Caren stopped by, asked if I was OK, and offered to wait with me.
I sat for a few minutes and then we set out together, which was super
helpful, and we ran together for much of the rest of the race.
It was fairly uneventful down to the bottom and then the first part of
the climb to the Rifugio was on asphalt and gravel road so Mick and
I were able to hike that nice and fast. Eventually, we had to turn off
onto single track again, and things got steep, but it wasn&#39;t &lt;em&gt;that&lt;/em&gt;
far to the Rifugio.&lt;/p&gt;
&lt;p&gt;At this point I was super tired, but fortunately they had a sleeping
room and there were plenty of beds, so I settled down for
a nap, and asked one of the volunteers to
wake me up in 20 minutes. I&#39;m not saying it was great, but I did manage
to get some sleep, which was a huge relief and I felt much
more prepared for the next two climbs. When I woke up, Mick was still there
and he told me that he&#39;d just gotten a message that due to a rockfall
the race was being truncated at Saas Fe (147 km) and we&#39;d (somehow) be
shuttled back to the start. I&#39;m not going to say I was 100% sad about
this: I obviously wanted to finish the race, but I was also getting
pretty tired.&lt;/p&gt;
&lt;p&gt;Knowing that we only had to really make the next two climbs and then
it was over gave us all some new energy and we set out at a pretty
good pace. At this point, some of the stage racers were starting
to pass us&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt; which made it
a bit hard to judge your pace, but also was a bit energizing to
see people who were (1) fresh and (2) impressed by you.&lt;/p&gt;
&lt;h2 id=&quot;rifugio-pastore-to-macucnaga&quot;&gt;Rifugio Pastore to Macucnaga &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#rifugio-pastore-to-macucnaga&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next stretch is the longest without aid in the entire
race (21.6km) and requires you to go all the way to the top
of the pass and then back down again. I started out feeling
OK and things were looking good and then partway up the
weather started to turn, first into rain and then into
hail. I really didn&#39;t want to get soaked again so I stopped
to change my gear and lost contact with the people I was
with for the rest of the climb, about 1200 m.&lt;/p&gt;
&lt;p&gt;The climb itself had reasonable footing, consisting mostly
of moderate-sized rocks and scree, with a foot-wide or so
ledge of flat rocks set in the side of each switchback, so
you had the choice of hiking up the slightly unstable
scree or adapting yourself to the ledge. It seemed
like most people did a combination of the two of these,
and I did as well. I wouldn&#39;t say I was feeling great,
but I was managing to make OK time, albeit having to
stop several times to put on gear or take it off.&lt;/p&gt;
&lt;p&gt;Eventually I hit the top and started the long descent, which
is where things started to go really sideways. A common
phenomenon in most ultras is that as your legs get tired
it gets harder to run downhill and you actually want to
hike more and more, but this was something new: the downhill
was really difficult single track with big rocks and roots.
Someone who was better on the downhill than me or fresher
could run this—and a number of the stage people
came by—but I was reduced to mostly hiking and some
very slow jogging and certainly was never able to get a rhythm.
This went on forever and every time you started to think it
would open up it would be a false alarm and you&#39;d just
have to climb over some new rock. This was an incredibly
demoralizing section for me because I was expecting to be moving
moderately fast in this section but actually I was going about the
same speed down as up.&lt;/p&gt;
&lt;p&gt;I ran most of this section with Jamie Hardman, who had sort of been
trailing me on the climb but then caught me on the downhill
and we both kind of suffered through it together until we
&lt;em&gt;finally&lt;/em&gt; hit some easy fire road. We were about 3km from
the Macucnaga aid station when suddenly I started to
have some real GI problems and I had to let Jamie go while
I ducked into the woods to take care of business. At the end
of the day, though, I was moving a bit faster than he was and
so I got to the last AS only a few minutes later.&lt;/p&gt;
&lt;h2 id=&quot;macucnaga-to-saas-fe&quot;&gt;Macucnaga to Saas Fe &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#macucnaga-to-saas-fe&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Macucnaga is the finish of stage 3 of the stage race, and
seemed to be in some kind of restaurant/hostel kind of thing.
Everyone was kind of lurking around the restaurant feeling
half-dead and knowing they had to get up and keep going but
not really wanting to either.&lt;/p&gt;
&lt;p&gt;This was the last drop bag location and so I was able to
ditch some of my stuff—though not too much—and
change my socks again. As soon as I got my shoes off
I was able to see that I had a lot of wrinkling and
ended up having a long exchange with the volunteers and
the medic about whether they had any diaper cream. They didn&#39;t
and wanted me to go see the race doctor, which seemed like a lot
of overhead. Fortunately, after 10 minutes of hanging around
barefoot my feet seemed to have dried up enough that the
wrinkles were abating, so I just smeared as much &lt;a href=&quot;http://www.sportslick.com/&quot;&gt;Sportslick&lt;/a&gt;
on them as I could manage, put my socks on, and crossed my fingers.
Mick had gotten in a ways before Jamie and me, but he was
still hanging around and we all decided to head out together.&lt;/p&gt;
&lt;p&gt;The last climb to Monte Moro Pass is the steepest long
climb of the whole course, clocking in at 1500+ hm over
6.4 km, at an average grade of around 23%, so we knew we
were in for something special. The initial part of the
climb was actually pretty encouraging: well groomed
rock in a better version of the previous climb, but soon
enough it veered off into single track. This pattern
continued for some time, with a section of rough
fire road and then you&#39;d have to do some rocky
single track which would eventually come back to the fire
road, just to add insult to injury.
I had some more GI issues partway up but was able to find
a place to pull off while everyone waited. I came up to
find that they were chilling with some goats, so I guess
they had found a way to entertain themselves.&lt;/p&gt;
&lt;p&gt;Eventually, this all gave way to the real mountain
trails, which is to say big rock slabs without much of
a trail where you just kind of go flag to flag. By
this time, it was getting dark, windy and cold. I bundled
up early while everyone else waited but then about 15-20 minutes
later they were getting cold and we had to try to find something
slightly shaded from the wind so they could put their gear on.
This climb is really deceptive because you can&#39;t see the finishing
hut for much of the way, but you &lt;em&gt;can&lt;/em&gt; see a hut/ski lift terminus
cut into the side of the mountain and you keep thinking you&#39;re heading
towards that, but actually you just bypass it entirely.&lt;/p&gt;
&lt;p&gt;The last 300 height meters of climbing are almost certainly the
worst because you&#39;re at high altitude, and it&#39;s rocky and steep,
and you&#39;re constantly having to high step and then maybe not
make it and fall back onto the previous step. We could see
another runner maybe 2 minutes ahead of us and he kept stopping—to
catch his breath maybe?—but we never quite caught him.
Finally, we hit the ridge line and then it&#39;s a short few hundred
flattish meters to the aid station, with the promise of it
being all (mostly?) downhill from there.&lt;/p&gt;
&lt;p&gt;It turns out that &amp;quot;mostly&amp;quot; is doing a lot of work here. First,
the aid station isn&#39;t actually at the top of the peak. Instead you need
to climb up to the top to where the &lt;a href=&quot;https://www.komoot.com/highlight/72630&quot;&gt;Golden Madonna Statue&lt;/a&gt;
is.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://d2exd72xrrp1s7.cloudfront.net/www/000/1k6/1g/1gbbmfvjom25a9p1gsyf5xeka45xm6bc1-uhi49443315/0?width=1260&amp;amp;crop=false&amp;amp;q=80&quot; alt=&quot;Golden Madonna Statue&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The Golden Madonna Statue. From Komoot.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This isn&#39;t really a technical climb, but what it is is a bunch of
metal stairs bolted (cantilevered) into the rock face, so you&#39;re
going up this relatively exposed section—with, at least
in my case, a death grip on some cable—to the top. From
there, we&#39;d been told it was about 2-3km and 500m down on
technical rock and then it was runnable.
This turned out to be literally true, but with a big asterisk. First,
the rock wasn&#39;t just technical but wet and incredibly slippery and
fairly exposed. Fortunately, none of us actually fell off, though
we did manage to get way off course and have to be waved back on
by the photographer.&lt;/p&gt;
&lt;p&gt;Eventually we hit the runnable bit, which was initially grass and
then some very nice road. Looking at the profile, it looked like
we had to go up some and then it was a nice long descent to
the finish, where we had to lose another 500m or so. We power hiked
to the top pretty fast and I announced I wanted to run to the finish
and that I needed to stop and take some of my gear off. Jamie and
Mick politely waited and then waited some more after the long-suffering
zipper in my pack gave up, leaving me with a giant hole
where stuff could come out. After some maneuvering, I ended up
pulling most of the bulky stuff out and into my dry bag, and
then closing the remaining hole somewhat with safety pins from
my bib.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
Initially, I was just holding the dry bag, but eventually
I realized I could hang it from my pack, and so I could run.&lt;/p&gt;
&lt;p&gt;Jamie and Mick didn&#39;t seem super enthusiastic about the whole running
thing (Mick: &amp;quot;I could shuffle&amp;quot;) and I was starting to gap them a bit,
which I felt a little bad about seeing as they&#39;d waited while I
broke my pack. In the event, though, it didn&#39;t matter much because
soon enough we detoured off the perfectly good fire road into (you guessed it),
yet more super rocky single track, which slowed us down a lot.
I&#39;m not going to say I loved this, but I think my companions
found it a lot more demoralizing; at this point I was just resigned
to slogging through it. It also helped that I had put my poles
away which turns out to be easier on this kind of terrain, at least
for me.&lt;/p&gt;
&lt;p&gt;Eventually, we got dropped off at Saas Almagell, at which point things
promptly went wrong because there was something wrong with the
flagging. We spent 20-30 minutes messing around trying to figure out
where to go (props to Mick for insisting we were going the wrong way!)
and eventually had to message the race director to ask
what to do. The dude they sent out didn&#39;t really speak English, but he
managed to communicate that we should go back in the other direction
and led us to an arrow which we had missed the first time.  From there
it was about 5K and 200 meters of ascent into Saas Fe, but on nice
gravel road and then actual road, though with a short section of
single track. I was still feeling like I could run at this point, but
I didn&#39;t see a lot of value in splitting up so I just kind
of dawdled a bit and we all finished together.  There wasn&#39;t much in
the way of a finishing arch, just a sign and some volunteers who
scanned us and that was it.&lt;/p&gt;
&lt;p&gt;I won&#39;t say I wasn&#39;t tired at this point, but I was basically
feeling OK and I don&#39;t think I would have had any trouble
finishing the race if they hadn&#39;t cut it short.&lt;/p&gt;
&lt;h2 id=&quot;post-race&quot;&gt;Post-Race &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#post-race&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I mentioned above, the messaging was pretty vague on how
we were going to get from Saas Fe to the finish, and what
we were then told was as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;There is one shuttle.&lt;/li&gt;
&lt;li&gt;It takes 8 people.&lt;/li&gt;
&lt;li&gt;It takes about 2 hrs for the shuttle to round trip to Grächen.&lt;/li&gt;
&lt;li&gt;The 4:00 shuttle is full so you have to wait for the 6:00 shuttle.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Nobody was that enthusiastic about this, but there also wasn&#39;t much
to do about it; you can&#39;t really Uber at 3:30 AM in Saas Fe, and the
alternative was taking the train, which itself takes like 2 hrs, so
I just settled in to wait. Initially I was told there were no beds,
but then someone found me one, and I futilely tried to sleep for 20
min or so and then just resigned myself to sitting in a chair,
snacking, and trying to stay warm till 6:00. Eventually, they told
us to walk over to the shuttle, about 10 minutes,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
and then a quick 45 minutes or so back to the start.&lt;/p&gt;
&lt;p&gt;It&#39;s clear there were a bunch of last minute arrangements, but
this section really could have been handled better. There
was a lot of confusion about who would be on the next shuttle,
with the intent seeming to be in order of arrival, but actually
they started to take people in a different order until there
were some loud objections. Also, having one bus really isn&#39;t
enough.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This was by far the hardest race I&#39;ve ever done, much much harder
than UTMB. The comparison point I&#39;ve been using for people is that
there are parts of UTMB that are somewhat technical and that
you might be a little concerned about running. Once you take out
the relatively short road or fire road sections, that&#39;s what the
good parts of UTMR are like. The bad parts are, I guess, in principle
runnable in parts, but really technical. The best comparison points
I can give from my own experience are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The bad downhill parts are like the most rocky parts of
&lt;a href=&quot;https://www.nps.gov/rocr/index.htm&quot;&gt;Rock Creek Park&lt;/a&gt; in DC.&lt;/li&gt;
&lt;li&gt;A lot of the climbs are like the upper parts of climbs
in the Sierras (e.g., Glen Pass) with a lot of loose rock
and scree.&lt;/li&gt;
&lt;li&gt;The top part of the climb to Monte Moro is like Flagstaff&#39;s
&lt;a href=&quot;https://www.strava.com/activities/5449630899&quot;&gt;Blue Dot&lt;/a&gt;
but longer and colder.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&#39;d trained on all this stuff, but it&#39;s one thing to have a couple
miles of something and another to have it be endless.&lt;/p&gt;
&lt;h3 id=&quot;nutrition&quot;&gt;Nutrition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#nutrition&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I really didn&#39;t
keep to my nutrition plan. Based on previous practice, I had been
planning to get a lot of my nutrition from sports drink, drinking
high carb drinks in half my bottles and lower carb in the other
half and using solid food when drinking the lower carb. In
the event, I just didn&#39;t drink anywhere near as much as I expected;
my timer would go off and I didn&#39;t feel like drinking and just
kind of put it off, so even on long stretches I never really
ran out of water. I also had to force myself to eat
solid food. On the other hand, I&#39;d get to aid stations and
would feel a lot more interested in the bread and cheese.&lt;/p&gt;
&lt;p&gt;I&#39;m not sure how big a problem this was in practice, because
you don&#39;t really need to get in 300+ calories an hour at
these low levels of exertion. There were some moments
where I did really feel like I needed to eat more, but I think
those mostly coincided with low points for other reasons,
such as fatigue. Basically, I think I could have managed
this better, but I&#39;m not sure it was really impactful. I would
definitely plan differently for a future event, though, focusing
more on salty stuff and less on sweet foods. Deprioritizing
liquid calories would also make aid stations easier as I
wouldn&#39;t have to get sports drink into my bottles.&lt;/p&gt;
&lt;h3 id=&quot;time-and-pacing&quot;&gt;Time and Pacing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmr/#time-and-pacing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The UTMR site says that times are about 20% slower than UTMB, but
I was clearly much slower (and the winning time is about 30% slower
than a typical UTMB winner, though UTMB is of course a much better field).
On the basis of my 37:49 UTMB finish
time I had guesstimated 45:00 for UTMR, but I didn&#39;t even finish
the abbreviated version in that time, and I would have expected
something in the low 50s for the full course, so obviously I underestimated
things. That was just a guess, so I don&#39;t want to get fixated on
comparing to 45 hours, but I did spend some time trying to figure
out what parts were slower or faster than you would have expected.&lt;/p&gt;
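&lt;p&gt;As a sanity check on that guess, here&#39;s a minimal sketch of the
arithmetic in Python. The 20% and 30% figures are the ones quoted above;
the helper names (&lt;code&gt;to_minutes&lt;/code&gt;, &lt;code&gt;to_hhmm&lt;/code&gt;,
&lt;code&gt;scale_time&lt;/code&gt;) are made up for illustration, and this is just
back-of-the-envelope scaling, not how UltraPacer or the UTMR site compute anything.&lt;/p&gt;

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' finish time to total minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total_minutes):
    """Format total minutes as 'HH:MM', rounding to the nearest minute."""
    hours, minutes = divmod(round(total_minutes), 60)
    return f"{hours}:{minutes:02d}"


def scale_time(hhmm, slowdown):
    """Scale a finish time by a fractional slowdown (0.20 means 20% slower)."""
    return to_hhmm(to_minutes(hhmm) * (1 + slowdown))


utmb_finish = "37:49"                  # UTMB finish time from the text
print(scale_time(utmb_finish, 0.20))   # the site's rule of thumb: prints 45:23
print(scale_time(utmb_finish, 0.30))   # the winners' gap applied to me: prints 49:10
```

&lt;p&gt;So the 20% rule lands almost exactly on the 45:00 guess, while
applying the 30% gap seen in the winning times gives a bit over 49 hours, which is
closer to what actually happened.&lt;/p&gt;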
&lt;p&gt;The obvious thing to do here is to compare to my &lt;a href=&quot;https://ultrapacer.com/&quot;&gt;UltraPacer&lt;/a&gt;
forecast, but this ended up with several challenges. This is gonna get a bit
technical so feel free to &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#end-of-tech&quot;&gt;skip down a bit&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;My original forecast was for 45 hours and I hadn&#39;t put down
any real wait times at aid stations, just forecasting a ridiculous
5 minutes.&lt;/li&gt;
&lt;li&gt;UltraPacer is having some trouble determining aid station delays.
I think this is because the positions of the aid stations are a bit
off. UTMR&#39;s tracking had &lt;a href=&quot;https://live.opentracking.co.uk/UTMR25ultra170/?b=780&quot;&gt;the same problem to some degree&lt;/a&gt;.
I have splits from my watch for some of these but not others.&lt;/li&gt;
&lt;li&gt;I accidentally stopped my watch somewhere in the middle of
the course for 40 minutes, and when &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;
exported a GPX file from the FIT file it basically erased the
gap, shortening the elapsed time by the time my watch was stopped.
UltraPacer won&#39;t take a FIT file and Garmin choked trying to output
the GPX.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ultimately, what I ended up doing here is to use &lt;a href=&quot;https://fitdecode.readthedocs.io/en/latest/&quot;&gt;fitdecode&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
to translate the FIT file to a GPX file with unadjusted timestamps
and then upload it to UltraPacer. I created a new plan for 50 hours
and added some split times for the aid stations, using a combination
of my real splits and eyeballing a bit. This doesn&#39;t tell us everything,
but nevertheless there&#39;s a pretty clear pattern, shown in the following
figure.&lt;/p&gt;
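&lt;p&gt;For the curious, a conversion along these lines can be sketched as
follows. This is not the script used for this post, just a minimal
illustration. It assumes the FIT file has standard &lt;code&gt;record&lt;/code&gt;
messages with &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;position_lat&lt;/code&gt;, and
&lt;code&gt;position_long&lt;/code&gt; fields, and that elevation (if present) is in
&lt;code&gt;enhanced_altitude&lt;/code&gt;; the function names and the creator string
are made up. The point is that every trackpoint keeps its own timestamp,
so a stopped watch shows up as a gap rather than being squeezed out.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from datetime import timezone

# FIT stores latitude/longitude as 32-bit "semicircles"; 2**31 semicircles = 180 degrees.
SEMICIRCLES_TO_DEGREES = 180.0 / 2**31


def read_fit_points(path):
    """Yield (timestamp, lat_deg, lon_deg, elevation_m_or_None) per FIT record."""
    import fitdecode  # third-party (pip install fitdecode); only needed for reading

    with fitdecode.FitReader(path) as fit:
        for frame in fit:
            if frame.frame_type != fitdecode.FIT_FRAME_DATA or frame.name != "record":
                continue
            if not (frame.has_field("timestamp")
                    and frame.has_field("position_lat")
                    and frame.has_field("position_long")):
                continue
            lat = frame.get_value("position_lat")
            lon = frame.get_value("position_long")
            if lat is None or lon is None:
                continue  # record without a GPS fix
            # Assumption: elevation, if present, is in "enhanced_altitude".
            ele = None
            if frame.has_field("enhanced_altitude"):
                ele = frame.get_value("enhanced_altitude")
            yield (frame.get_value("timestamp"),
                   lat * SEMICIRCLES_TO_DEGREES,
                   lon * SEMICIRCLES_TO_DEGREES,
                   ele)


def build_gpx(points):
    """Build a GPX 1.1 track, keeping each point's own timestamp untouched."""
    gpx = ET.Element("gpx", {
        "version": "1.1",
        "creator": "fit-to-gpx-sketch",  # hypothetical name
        "xmlns": "http://www.topografix.com/GPX/1/1",
    })
    segment = ET.SubElement(ET.SubElement(gpx, "trk"), "trkseg")
    for timestamp, lat, lon, ele in points:
        if timestamp.tzinfo is None:
            timestamp = timestamp.replace(tzinfo=timezone.utc)  # assume UTC if naive
        point = ET.SubElement(segment, "trkpt", {"lat": f"{lat:.7f}", "lon": f"{lon:.7f}"})
        if ele is not None:
            ET.SubElement(point, "ele").text = f"{ele:.1f}"
        utc = timestamp.astimezone(timezone.utc)
        ET.SubElement(point, "time").text = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return gpx


def fit_to_gpx(fit_path, gpx_path):
    """Convert a FIT file to GPX without closing up pauses in the recording."""
    tree = ET.ElementTree(build_gpx(read_fit_points(fit_path)))
    tree.write(gpx_path, encoding="utf-8", xml_declaration=True)
```

&lt;p&gt;Keeping the GPX writer separate from the FIT reader means the
timestamp handling can be checked with a couple of hand-made points
before trusting it on a real file.&lt;/p&gt;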
&lt;p&gt;&lt;a id=&quot;end-of-tech&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmr-comparison.png&quot; alt=&quot;UTMR pace comparison&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
UTMR pace comparison
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Part of what&#39;s going on here is just UltraPacer underestimating
how much I&#39;m going to slow down throughout the race (you can
tune that parameter but I didn&#39;t), but I think the main
contributor here is how slow I was on the two technical
downhills, followed by spending a lot of time in aid
stations in the last half (I wasn&#39;t the only one!). You can
see that mostly when it gets uphill, I am flat against the
forecast or making progress, and then when it becomes
downhill I fall behind a lot.&lt;/p&gt;
&lt;p&gt;You can&#39;t blame UltraPacer for this: if you don&#39;t tell it a section is
technical it just works based on grade, so it doesn&#39;t know that those
sections will be especially bad, but it&#39;s also the case that I&#39;m
particularly slow on that kind of section; as I said, a lot
of people were going by me.&lt;/p&gt;
&lt;p&gt;I finished fairly far down (officially 119th but we all came in
togetherish so really 117th) out
of 135 finishers with 61 DNFs, so this is pretty squarely in the
middle. I usually finish a bit further up, but in talking to
people I got the impression that this was a really stacked field;
almost everyone either had done or was doing some serious race,
whether it was UTMB, Mogollon Monster, Arc of Attrition, or
whatever, and I know some of the less prepared people dropped out.&lt;/p&gt;
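&lt;p&gt;To put rough numbers on &amp;quot;in the middle&amp;quot;, here&#39;s a quick
sketch using the figures above. It assumes every starter either finished
or was recorded as a DNF, so starters are just the sum of the two; the
&lt;code&gt;percent_of&lt;/code&gt; helper is made up for illustration.&lt;/p&gt;

```python
finishers = 135   # from the text
dnfs = 61         # from the text
place = 117       # the "really 117th" figure; officially 119th

# Assumption: every starter either finished or was recorded as a DNF.
starters = finishers + dnfs


def percent_of(place, field_size):
    """How far down the field a placing is, as a percentage (lower is better)."""
    return round(100 * place / field_size, 1)


print(starters)                       # prints 196
print(percent_of(place, finishers))   # prints 86.7
print(percent_of(place, starters))    # prints 59.7
```

&lt;p&gt;So 117th is well down the list of finishers, but about 60% of the
way down the full starting field, which is where the &amp;quot;middle&amp;quot;
reading comes from.&lt;/p&gt;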
&lt;p&gt;This is actually right about where I finished in relation to
the fastest finisher at UTMB (a couple percentage points faster this time). On the one hand,
I felt like I trained harder for UTMR and was more prepared, but on
the other hand this race really pushed one of my weaknesses that I&#39;ve
had trouble preparing for because there&#39;s just not much of it here.
If those sections had been like the rest of the race, I think I
would have been about 2-3 hrs faster,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt; but still clearly not 45 hours
for the full race.&lt;/p&gt;
&lt;p&gt;All in all, this was an epic adventure and definitely worth doing.
With that said, I think I&#39;m going to take a break from this kind of
technical European race. Both here and at Grand Loop I found it
kind of frustrating to have these long sections where you were just
going super slow because you had to pick your way through
something. I definitely do like having
a lot of climbing and I don&#39;t mind being out on the trail so long.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; At least for next season I&#39;d
rather focus on races where the limiting factor is my fitness
rather than my agility.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; He moved to NYC! &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m vegetarian, so no
charcuterie. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I usually use an iPhone 12 mini but I wasn&#39;t sure the battery would last the full race and I didn&#39;t want to have to mess around with a battery pack so I just bought a cheapie feature phone and a local SIM. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
According to Wikipedia, the longest foot suspension bridge in the
world at the time it opened. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Bruising? Cracking? Who knows. It hurt some but not
as bad as when I&#39;ve definitely broken a rib. In any
case doctors don&#39;t bother with the difference because
they don&#39;t treat broken ribs unless it&#39;s really bad or
displaced or something, and you just have to wait it out.
Feels better now. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
A trick I learned from Roman Danyliw. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&amp;quot;Windproof and waterproof glove&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is a 4-day stage version of UTMR. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I had duct tape but it was too stuck together to use... &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Saas Fe is apparently
some kind of car-free zone. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Which I totally vibe coded. Hope it&#39;s right! &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Better not even to speak of the 30 minutes we
spent messing around Saas-Almagell. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Though
I still maintain that a 100K is a great distance because you
can sleep in bed after. &lt;a href=&quot;https://educatedguesswork.org/posts/utmr/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Olympic Grand Loop (Deer Park Loop)</title>
		<link href="https://educatedguesswork.org/posts/grand-loop/"/>
		<updated>2025-08-15T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/grand-loop/</id>
		<content type="html">&lt;p&gt;This year my
occasional&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
training partner &lt;a href=&quot;https://heapingbits.net/&quot;&gt;Chris Wood&lt;/a&gt; was
selected in the &lt;a href=&quot;https://montblanc.utmb.world/&quot;&gt;UTMB&lt;/a&gt; lottery
and asked me to come over to Chamonix and crew him. Europe is a
long way to go and not race, so I looked around and finally
settled on &lt;a href=&quot;https://www.ultratourmonterosa.com/&quot;&gt;Ultra Tour Monte Rosa (UTMR)&lt;/a&gt;.
UTMR is conceptually similar to UTMB in that it&#39;s a 170K tour around
a mountain in the Alps but it&#39;s about 10% more climbing than UTMB
and substantially more technical, so the finish times are around
20% slower. I ran UTMB back in 2022 and finished in 37:49, so I knew I had to
put in some serious training if I didn&#39;t want UTMR to be a miserable experience. I like to do some
adventure runs towards the end of the training cycle both
as a training tool and to test out my fitness, nutrition, etc.&lt;/p&gt;
&lt;p&gt;This time, Chris and I selected the &lt;a href=&quot;https://fastestknowntime.com/route/olympic-national-park-grand-loop-wa&quot;&gt;Grand
Loop&lt;/a&gt;
in Olympic National Park. At 43 miles and 13000 ft,
the Grand Loop is sort of
like a scaled-down version of UTMB/UTMR, so we figured it was
a good test of our fitness and a final shakeout event.
As well as having a lot of up and down, the
climbs and descents get bigger the further along
you go, culminating in a 3000 foot climb to
the finish, which is good practice but still
smaller than the biggest climbs at UTMR or UTMB.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-map.png&quot; width=&quot;75%&quot; /&gt;
&lt;figcaption&gt;
Map of the course. From Gaia GPS
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-profile.png&quot; alt=&quot;Grand Loop Profile&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Elevation profile of the course. From Runalyze
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The &lt;a href=&quot;https://fastestknowntime.com/route/olympic-national-park-grand-loop-wa&quot;&gt;fastest known time for this route&lt;/a&gt; is 8:33, but &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Max_King_(runner)&amp;amp;oldid=1193154149&quot;&gt;Max King&lt;/a&gt; did it in 10:40 back in 2020, so I was
kind of uncertain how long it would take us. I estimated
about 15 hrs, with 14 if things went really well, and put
together a food plan for a bit more than 15 but a pace chart
for 14. This turned out to be fairly close.&lt;/p&gt;
&lt;p&gt;Chris lives in New York and I live in California and so we
both flew into Seattle and then drove out to Port Angeles
together, staying at an AirBNB about an hour from the
trailhead. Dawn was at around 5:15 and dusk around 8:45, so we
aimed to start around 5:45.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-prep.jpg&quot; alt=&quot;Photo of my stuff&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
My stuff ready to go.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;start-obstruction-point-%5B8.12-mi%2C-%2B2251%2F-1499-ft%2C-2%3A15%3A05%2C-2%3A15%3A05%2C-16%3A38%2Fmi%5D&quot;&gt;Start to Obstruction Point [8.12 mi, +2251/-1499 ft, 2:15:05, 2:15:05, 16:38/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#start-obstruction-point-%5B8.12-mi%2C-%2B2251%2F-1499-ft%2C-2%3A15%3A05%2C-2%3A15%3A05%2C-16%3A38%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This section went really well. It was nice and cold at the start
and, despite there being (in retrospect, looking at the profile)
a surprising amount of climbing, almost everything was runnable,
though we decided to hike the bigger climbs to avoid going
out too hard.&lt;/p&gt;
&lt;p&gt;As we were traversing the ridge line, we were able to see
and smell quite a bit of smoke over the valley (potentially from the
&lt;a href=&quot;https://inciweb.wildfire.gov/incident-news/waolf-bear-gulch-fire?page=0&quot;&gt;Bear Gulch Fire&lt;/a&gt;). We hadn&#39;t checked fire conditions
going in but ran into some backpackers and asked them about
it and they said they were aware of it but the fire was far
away, so the only issue was the smoke. To be honest, we were
a bit worried about it, but after flying into Washington State,
we weren&#39;t about to bail out 8 miles in.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-smoke-obstruction.jpg&quot; alt=&quot;Smoke in the valley from the approach to Obstruction Point&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Smoke in the valley from the approach to Obstruction Point
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;obstruction-point-to-grand-pass-%5B6.33-mi%2C-%2B2106%2F-1867-ft%2C-2%3A05%3A53%2C-4%3A20%3A58%2C-19%3A53%2Fmi%5D&quot;&gt;Obstruction Point to Grand Pass [6.33 mi, +2106/-1867 ft, 2:05:53, 4:20:58, 19:53/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#obstruction-point-to-grand-pass-%5B6.33-mi%2C-%2B2106%2F-1867-ft%2C-2%3A05%3A53%2C-4%3A20%3A58%2C-19%3A53%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The next leg is from Obstruction Point down to Grand Lake and then up
to Grand Pass. The section down to Grand Lake was still quite runnable
so we took it at a good pace. Grand Lake is actually the first water
source on the trail, so even though it&#39;s a bit of a detour, we went
almost all the way down to the lake until we ran into a well-running
stream, which works well with our gear.
We are using &lt;a href=&quot;https://www.salomon.com/en-us/product/soft-flask-xa-filter-490ml-16oz-42-lc10471#queryid=5ff15c429a66802efa6538c3c7947f57#indexUsed=prod_sln_us_en_products&quot;&gt;Salomon filter bottles&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; which combine a soft flask
with a filter cap attached to the drinking nipple. You can
drink directly out of the flask or squeeze the bottle
into another bottle. The easiest way to fill up the flask is
if you have some running water which you can run right
into the flask. This is by contrast to old-style &lt;a href=&quot;https://www.katadyngroup.com/us/en/8018270-katadyn-hiker-microfilter-usa-dark-grey~p6722&quot;&gt;pump-based&lt;/a&gt;
water filters where you needed to have the pump intake completely
submerged and so running water wasn&#39;t that convenient.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Of course it turns out the detour to the lake was totally unnecessary
because from here on in there were a lot of stream crossings
so we could have just filled up without leaving the trail.&lt;/p&gt;
&lt;p&gt;After the lake, we had the first big climb up 1600 ft. to Grand Pass
over 2.65 miles. This was a pretty straightforward climb on
dirt, gravel, and scree to the top of the pass and we were still feeling
nice and strong.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-selfie.jpg&quot; alt=&quot;Selfie with Chris&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A selfie en route to Grand Lake. You can tell we&#39;re cool because we&#39;re wearing sunglasses.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;grand-pass-to-cameron-pass-%5B5.34-mi%2C-%2B2405%2F-2326-ft%2C-2%3A17%3A49%2C-6%3A38%3A57%2C-25%3A50%2Fmi%5D&quot;&gt;Grand Pass to Cameron Pass [5.34 mi, +2405/-2326 ft, 2:17:49, 6:38:57, 25:50/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#grand-pass-to-cameron-pass-%5B5.34-mi%2C-%2B2405%2F-2326-ft%2C-2%3A17%3A49%2C-6%3A38%3A57%2C-25%3A50%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next segment is where things started to get difficult. It&#39;s
listed on the map as &amp;quot;Cameron Pass Primitive Trail&amp;quot;, and lives
up to that reputation. The downhill from Grand Pass is quite
steep (about 2000 ft over less than two miles) and technical,
making it hard to run fast; then you turn around and start
the long climb to the top.&lt;/p&gt;
&lt;p&gt;Partway up this climb I started to have a bit of a low patch,
feeling a bit hungry and lightheaded. I&#39;d been sticking to my
nutrition schedule but it had mostly been liquid calories and
also I&#39;d substituted sports drink for water a few times when
I filled my bottles, so I think I just got a little behind and
my stomach was empty. I swapped in some solid food and loaded
up on salt and quickly started to feel better. This was the only
time I actually had any real trouble on the entire route, and
I felt pretty solid from here on in.&lt;/p&gt;
&lt;p&gt;At this point we&#39;d given back pretty much all of the time
we made up in the first leg and were right on the
projected schedule for 14 hrs, but of course the terrain
didn&#39;t get much better, so we just kept falling further behind.&lt;/p&gt;
&lt;p&gt;At this point my feet were starting to hurt some, especially in my
ankles. Nothing too bad, but a little concerning at 20 odd miles in.&lt;/p&gt;
&lt;h2 id=&quot;cameron-pass-to-gray-wolf-pass-%5B9.61-mi%2C-%2B2867%2F-3150-ft%2C-3%3A49%3A45%2C-10%3A28%3A42%2C-23%3A55%2Fmi%5D&quot;&gt;Cameron Pass to Gray Wolf Pass [9.61 mi, +2867/-3150 ft, 3:49:45, 10:28:42, 23:55/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#cameron-pass-to-gray-wolf-pass-%5B9.61-mi%2C-%2B2867%2F-3150-ft%2C-3%3A49%3A45%2C-10%3A28%3A42%2C-23%3A55%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section was really more of the same: a long descent down
to the valley floor followed by a climb to Gray Wolf Pass. The footing
didn&#39;t really get much better from here, so we were doing quite
a bit of walking even on the downhill.
This section didn&#39;t seem too bad in terms of smoke but there must have
been some because it seemed to trigger my asthma and I found myself
coughing a bit.&lt;/p&gt;
&lt;p&gt;We hit the bottom of the trail to Gray Wolf Pass and each took our first Maurten GEL CAF to
give us some energy on the way up. The climb to the top of Gray Wolf is surprisingly long, and you can
see the peak from a long way up. The top mile or two is just on exposed
rock and sand, so there was a bit of &amp;quot;can it really be another half
mile&amp;quot;, but eventually we hit the top. There were a few backpackers
sitting at the top eating. We chatted with them and got the usual &amp;quot;are you really
doing it in one day, wow&amp;quot; response, and then it was time to head down.&lt;/p&gt;
&lt;h2 id=&quot;gray-wolf-pass-to-three-forks-climb-%5B9.31-mi%2C-%2B66%2F-4068-ft%2C-2%3A58%3A40%2C-13%3A27%3A22%2C-19%3A11%2Fmi%5D&quot;&gt;Gray Wolf Pass to Three Forks Climb [9.31 mi, +66/-4068 ft, 2:58:40, 13:27:22, 19:11/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#gray-wolf-pass-to-three-forks-climb-%5B9.31-mi%2C-%2B66%2F-4068-ft%2C-2%3A58%3A40%2C-13%3A27%3A22%2C-19%3A11%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;By this point we were definitely starting to run behind and we didn&#39;t
think we&#39;d see 14 hrs, but we were expecting to pick up some time on
the 9 mile downhill and finish in the mid 14s. Unfortunately, this part
of the trail was really not that runnable at all. Coming off the ridge
it was the usual sand, gravel, and loose rock so we didn&#39;t go too fast and then
once we got to lower altitudes it was a lot of rocks and roots, as
well as stream crossings, mostly in the form of narrow log bridges. There were
also a lot of treefalls we had to climb over or under, though at least
the trail was clear so we didn&#39;t have trouble finding it once we got
past the treefall.&lt;/p&gt;
&lt;p&gt;This section had quite a few stream crossings, but unlike many
backcountry trails, they were really well maintained, with
actual bridges. Most of these were just a single log
that had been flattened on top, so you had to watch your
balance. I found myself thinking about my friend
Cullen, who used to say that it&#39;s easy to walk
a foot-wide path that&#39;s on the ground, but if you put
a foot-wide plank 50 feet in the air, very few people
can do it. A few actually had a railing, which made
life a lot easier. Either way, though, it&#39;s a lot better
than having to hop across a bunch of rocks.&lt;/p&gt;
&lt;p&gt;As a result of all this, we had a really hard time getting into
a rhythm; we&#39;d run a little and then have to walk, then run
some more, etc. This made for a really long 9 miles where you
had to be constantly paying attention to make sure you didn&#39;t
trip, and I found myself repeatedly looking at my watch to
see if we were finally close to the part where we could
start climbing.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-gray-wolf.jpg&quot; alt=&quot;The terrain at the top of Gray Wolf&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The terrain at the top of Gray Wolf
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-gray-wolf-view.jpg&quot; alt=&quot;The view from Gray Wolf&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The view from Gray Wolf
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;three-forks-climb-to-finish-%5B4.71-mi%2C-%2B3222%2F-13-ft%2C-1%3A47%3A30%2C-15%3A14%3A52%2C-22%3A50%2Fmi%5D&quot;&gt;Three Forks Climb to Finish [4.71 mi, +3222/-13 ft, 1:47:30, 15:14:52, 22:50/mi] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#three-forks-climb-to-finish-%5B4.71-mi%2C-%2B3222%2F-13-ft%2C-1%3A47%3A30%2C-15%3A14%3A52%2C-22%3A50%2Fmi%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, we made it to the bottom, popped another GEL CAF and
started up the Three Forks climb. This was the biggest climb of the day but also
one of the nicest parts, consisting of nice smooth shaded
single track. Of course it didn&#39;t hurt that we knew this was
the last thing we had to do. By this point my feet had started
to feel quite a bit better, probably from the easier trail.&lt;/p&gt;
&lt;p&gt;We made pretty good time up the climb: 22:50/mile isn&#39;t bad at all
for a 13% grade. Poles really help a lot under these conditions:
you don&#39;t need them for stabilization but they let you recruit
more of your body to drive you up the hill. It&#39;s just a matter
of putting your head down and keeping moving.&lt;/p&gt;
&lt;p&gt;About 1/3 of a mile from the top
we were rewarded with the trail opening up into a
flat runnable section that took us all the way into
the finish. Even though we had been pushing up the climb
we still had plenty of gas in our legs and were able to
finish nice and strong.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/grand-loop-foot.jpg&quot; alt=&quot;My foot, which has inexplicably turned blue&quot; width=&quot;378&quot; /&gt;
&lt;figcaption&gt;
My foot, which has inexplicably turned blue. I thought it might be
bruised but I think it&#39;s just the dye from my shoe bleeding from
when I got my foot wet.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grand-loop/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Overall this outing went fairly well. This is a really beautiful route
that is simultaneously tough but also doable in a single day without
feeling too wrecked. The climbs are definitely long, especially
towards the end, but unlike some other mountain routes I&#39;ve seen there&#39;s
nothing where you&#39;re just death marching in the heat. It really helps
that there&#39;s plenty of water all along the route except for the first
12 miles to Grand Lake, but that section goes really fast. After that,
there were a few places we got low and were a little worried about
water availability, but never so much that we got desperate; it was
more a matter of convenience and quality of the source, as in
&amp;quot;should we fill up at this marginal stream or wait for something
better?&amp;quot;&lt;/p&gt;
&lt;p&gt;We finished on the high side of my
forecasts but I was more or less guessing anyway and 40 odd percent
slower than Max King isn&#39;t too shabby. Reading their FKT report, it
seems like they also had a really fast leg to Obstruction Point and
then slowed down a lot as well.&lt;/p&gt;
&lt;p&gt;Nutrition went well. I&#39;ve been experimenting with liquid-only
nutrition (Maurten 320 and Tailwind High Carb) but on previous
outings I started not to feel that great, which is consistent with
how I felt here. This time I alternated Maurten 160, Maurten 320,
and Tailwind High Carb and ate solid food when I was drinking
Maurten 160 and water. This seemed to work pretty well, but I think
in the future I may try to do more like Maurten 160 half the time
rather than about a third.&lt;/p&gt;
&lt;p&gt;Except for a few brief periods I
already mentioned, I felt good essentially the whole way and not too
wiped out at the end. I was pleased to be able to run comfortably for
the last bit. Next up, Chamonix and Grächen.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall&lt;/strong&gt; 43.4 mi, 13015 ft, 15:14:52, 21:04/mi.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
He moved to NYC! &lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Hydrapak and Katadyn sell filter caps which are basically the same.
 &lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is actually the result of some cool new technology.
Older filters use either a paper or a ceramic filter
and require quite a bit of pressure to drive the water through.
Newer filters are based on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hollow_fiber_membrane&amp;amp;oldid=1280689048&quot;&gt;hollow fiber membranes&lt;/a&gt;,
which require a lot less water pressure. Instead of
a pump you can just have a flexible bag and squeeze
the water through the filter or even use a gravity feed.
 &lt;a href=&quot;https://educatedguesswork.org/posts/grand-loop/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 7: Advanced Garbage Collection</title>
		<link href="https://educatedguesswork.org/posts/memory-management-7/"/>
		<updated>2025-07-27T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-7/</id>
		<content type="html">&lt;script src=&quot;https://unpkg.com/ohm-js@17/dist/ohm.min.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;module&quot;&gt;
import { InterpreterWidget } from &quot;https://gcexplorer.net/js/interpreterwidget.mjs&quot;;

document.addEventListener(
&quot;DOMContentLoaded&quot;,
(async () =&gt; {
async function makeAw(args) {
   const container = document.querySelector(`#${args.elementId}`);
   const figure = document.createElement(&quot;figure&quot;);
   container.appendChild(figure);
   const interiorId = `${args.elementId}--internal`;
   const div = document.createElement(&quot;div&quot;);
   div.setAttribute(&quot;id&quot;, interiorId);
   figure.appendChild(div);
   args.elementId = interiorId;
   if (args.caption) {
     const caption = document.createElement(&quot;figcaption&quot;);
     caption.textContent = args.caption;
     figure.appendChild(caption);
   }
   await InterpreterWidget(args);
   
}

// Fix for scroll issue. Thanks claude!
// Store current scroll position
 const scrollTop = window.pageYOffset;
 const scrollLeft = window.pageXOffset;
 
 // Temporarily prevent scrolling
 const originalOverflow = document.body.style.overflow;
 document.body.style.overflow = &#39;hidden&#39;;
 
 // Also prevent focus-related scrolling
 const originalScrollIntoView = Element.prototype.scrollIntoView;
 Element.prototype.scrollIntoView = function() {};
 

await makeAw({elementId : &quot;generational-allocation1&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation.memo&quot;,
   setToLine: 3,
   caption: &quot;Initial allocation&quot;
});

await makeAw({elementId : &quot;generational-allocation2&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation.memo&quot;,
   setToLine: 7,
   caption: &quot;After a minor GC&quot;
});

await makeAw({elementId : &quot;generational-allocation3&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation.memo&quot;,
   setToLine: 8,
   caption: &quot;Allocation after minor GC&quot;
});

await makeAw({elementId : &quot;generational-allocation4&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation.memo&quot;,
   setToLine: 13,
   caption: &quot;Another minor GC&quot;
});

await makeAw({elementId : &quot;generational-allocation5&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation.memo&quot;,
   setToLine: 19,
   caption: &quot;After major GC&quot;
});

await makeAw({elementId : &quot;generational-allocation7&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation2.memo&quot;,
   setToLine: 8,
   caption: &quot;Downward (intergenerational) pointers&quot;
});

await makeAw({elementId : &quot;generational-allocation8&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;generational&quot;,
   programUrl: &quot;/examples/memory-management-7/allocation2.memo&quot;,
   setToLine: 12,
   caption: &quot;Major GC example&quot;
});

// Re-enable scrolling and get back to the top.
document.body.style.overflow = originalOverflow;
Element.prototype.scrollIntoView = originalScrollIntoView;
       
window.scrollTo(scrollLeft, scrollTop);
})(),);

&lt;/script&gt;
&lt;link rel=&quot;stylesheet&quot; href=&quot;https://gcexplorer.net/css/interpreterwidget.css&quot; /&gt;
&lt;div class=&quot;newsletter-only&quot;&gt;
&lt;h2 id=&quot;attention%3A-read-this-post-on-the-web&quot;&gt;Attention: Read this post on the Web &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#attention%3A-read-this-post-on-the-web&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This post uses extensive client-side JavaScript and so won&#39;t
render properly in your mail client. You should read this
post on the &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7&quot;&gt;Web site&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;/div&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/gc-latency.jpg&quot; alt=&quot;GC latency is too damn high&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This is the seventh and final (phew!) post in my multipart series on memory
management. You may want to go back and read Part
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;, which covers C, parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2&quot;&gt;II&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;III&lt;/a&gt;, which cover C++, and parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4&quot;&gt;IV&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5&quot;&gt;V&lt;/a&gt;, which cover Rust,
and, if you haven&#39;t read it, go read part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6&quot;&gt;VI&lt;/a&gt;,
which introduces the basic mechanisms of garbage collection.
In this post, I want to touch on some of what you need to
do to deploy garbage collection in a production system, as well
as some of the reasons that systems designers might decide to
avoid GC.&lt;/p&gt;
&lt;h2 id=&quot;garbage-collection-latency&quot;&gt;Garbage Collection Latency &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#garbage-collection-latency&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The biggest concern that engineers typically have with garbage
collected systems is the cost of the garbage collector itself.
Because these algorithms—other than reference counting, of
course—require scanning all allocated memory, often multiple
times, they can be quite expensive.  This isn&#39;t just a matter of total
program runtime, but also of latency.  The basic GC algorithms I showed
in part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6&quot;&gt;VI&lt;/a&gt; are what&#39;s called &amp;quot;stop-the-world&amp;quot; garbage collectors,
which means that the entire program has to wait for the GC to finish.
This might not be a big deal if you&#39;re processing some data in
Python—what with buffering, context switching, etc. you may not
even notice the latency—but the situation is totally different
in an interactive application: when the program is garbage collecting
it&#39;s not responding to your input; in this context even very small
amounts of GC lag—or any other lag, for that matter—can be quite noticeable.&lt;/p&gt;
&lt;h3 id=&quot;garbage-collection-timing&quot;&gt;Garbage Collection Timing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#garbage-collection-timing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first thing to do to control GC latency is to be
fairly careful about when you actually garbage collect.
Which strategy you follow isn&#39;t that big a deal in a non-interactive program, because
unless you do something severely wrong, you&#39;ll probably have to do
approximately the same total amount of GC anyway, but if you GC at the
wrong time in an interactive program then users will get annoyed
because suddenly their program just stops responding and they have to
sit and wait for it to come back. This is obviously quite annoying!
Back when I worked on Firefox, we used to call this kind of
stuff &amp;quot;jank&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;You might think that you want to put off GC as long as you can, for
instance until you actually are unable to allocate any more memory
without garbage collecting. This generally isn&#39;t the best idea: you
may run out of memory right at the time when the user is trying to do
something, which creates exactly the janky user experience that you
were trying to avoid by deferring GC.  Moreover, because GC algorithms
run more slowly when there is a lot of memory in use (counting garbage
here as &amp;quot;in use&amp;quot;), if you put things off too long, the latency may
actually be quite bad.  On the other hand, you don&#39;t want to GC too
frequently because GCing is expensive.&lt;/p&gt;
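&lt;p&gt;To make this tradeoff concrete, here&#39;s a minimal sketch of a toy heap model, not a real collector. It assumes that each collection costs time proportional to the memory in use when it runs and that a fixed fraction of that memory survives each collection; &lt;code&gt;simulate&lt;/code&gt;, its parameters, and all of the numbers are made up for illustration.&lt;/p&gt;

```javascript
// Toy model: a heap of fixed capacity where every collection costs time
// proportional to the memory in use (live data plus garbage) when it runs.
// All names and numbers here are illustrative assumptions.
function simulate({ capacity, threshold, allocations, liveFraction }) {
  let used = 0;        // units currently occupied, garbage included
  let collections = 0; // how many times the collector ran
  let maxPause = 0;    // largest single collection cost seen
  let totalPause = 0;  // sum of all collection costs

  for (let remaining = allocations; remaining > 0; remaining--) {
    // Collect once occupancy reaches the threshold fraction of capacity.
    if (used >= capacity * threshold) {
      const pause = used; // assumed cost model: proportional to memory in use
      collections += 1;
      totalPause += pause;
      maxPause = Math.max(maxPause, pause);
      used = Math.floor(used * liveFraction); // survivors stay, garbage is freed
    }
    used += 1; // one unit per allocation
  }
  return { collections, maxPause, totalPause };
}

// "lazy" waits until the heap is full; "eager" collects at 25% occupancy.
const lazy = simulate({ capacity: 1000, threshold: 1.0, allocations: 10000, liveFraction: 0.1 });
const eager = simulate({ capacity: 1000, threshold: 0.25, allocations: 10000, liveFraction: 0.1 });
console.log({ lazy, eager });
```

&lt;p&gt;Under these assumptions, waiting until the heap is full gives you fewer collections but each pause is four times as long, whereas collecting at 25% occupancy gives you shorter pauses at the cost of somewhat more total GC work.&lt;/p&gt;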
&lt;p&gt;There are a number of more sophisticated approaches for scheduling
the GC. For example, in an interactive program you can look for
when the program appears to have been idle (i.e., no computation
or user input) for a given period of time, as is done in
&lt;a href=&quot;https://elpa.gnu.org/packages/gcmh.html&quot;&gt;GCMH&lt;/a&gt;. V8&#39;s Orinoco
uses a &lt;a href=&quot;https://queue.acm.org/detail.cfm?id=2977741&quot;&gt;fancier version of this&lt;/a&gt; where it schedules GC during
the idle period between the time it has finished rendering a frame and the time
it needs to render the next one. Part of why this works
is that they do only part of the GC each time, as described
below. Modern GCs may also use multiple triggers for garbage collection.
For instance the Java ZGC collector uses memory pressure, periodic
collection, and allocation rate &lt;a href=&quot;https://dev.to/ryan_zhi/in-depth-study-of-zgc-z-garbage-collector-2lo&quot;&gt;as triggers&lt;/a&gt;.&lt;/p&gt;
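&lt;p&gt;Here&#39;s a rough sketch of the idle-time idea, under the assumption that GC work can be broken into small steps of known cost. This is not V8&#39;s actual API: &lt;code&gt;planIdleGc&lt;/code&gt;, the 60 fps frame budget, and the step cost are all hypothetical.&lt;/p&gt;

```javascript
// Sketch of idle-time GC scheduling: after each frame is rendered, spend
// whatever is left of the frame budget on small slices of GC work.
// The names and numbers are hypothetical; this is not V8's API.
const FRAME_BUDGET_MS = 1000 / 60; // about 16.7 ms per frame at 60 fps

function planIdleGc(renderTimesMs, pendingSteps, stepCostMs) {
  const schedule = []; // number of GC steps run after each frame
  let remaining = pendingSteps;
  for (const renderMs of renderTimesMs) {
    // Idle time is whatever the renderer left unused in this frame.
    const idleMs = Math.max(0, FRAME_BUDGET_MS - renderMs);
    // Only run as many whole steps as fit before the next frame is due.
    const steps = Math.min(remaining, Math.floor(idleMs / stepCostMs));
    remaining -= steps;
    schedule.push(steps);
  }
  return { schedule, remaining };
}

// Five frames with varying render times, 12 pending GC steps of 2 ms each.
const plan = planIdleGc([4, 15, 10, 16.5, 6], 12, 2);
console.log(plan); // schedule is [6, 0, 3, 0, 3], with nothing remaining
```

&lt;p&gt;With these made-up render times, the 12 pending steps get spread across the three frames that have slack, and the two frames that nearly fill their budget do no GC at all.&lt;/p&gt;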
&lt;p&gt;Good GC scheduling will only take you so far with a stop-the-world
GC; you&#39;re always going to take a pause and there&#39;s some chance
that it will be at an inconvenient time. If you really
want to minimize latency, you need to find a way to
avoid having invocation of the GC stall the entire program.
The remainder of this section describes a number of approaches.&lt;/p&gt;
&lt;h3 id=&quot;generational-garbage-collection&quot;&gt;Generational Garbage Collection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#generational-garbage-collection&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One of the most common mechanisms is to just garbage collect
&lt;em&gt;some&lt;/em&gt; of the objects you&#39;ve allocated, typically the most
recently allocated ones. To see why this makes sense, consider
the following simple &lt;strike&gt;Python&lt;/strike&gt; JS &lt;em&gt;[Corrected - 2025-07-27]&lt;/em&gt; program:&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#code&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; fs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;fs&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;LINE_NUMBER&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;MAX_WORDS&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; fileContent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;readFileSync&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;count-words.in&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;utf-8&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; fileContent&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// Remove trailing empty line if file ends with newline&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  lines&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; line &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; words &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token regex&quot;&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token regex-source language-regex&quot;&gt;&#92;s+&lt;/span&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; word&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token 
operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; count &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; words&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;count &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;LINE_NUMBER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token constant&quot;&gt;LINE_NUMBER&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token constant&quot;&gt;MAX_WORDS&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Math&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;MAX_WORDS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; count&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; count &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; count &lt;span class=&quot;token operator&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;MAX_WORDS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; count&lt;span class=&quot;token 
operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; countString &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;WORD_COUNTS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;count&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token 
interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;countString&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#output&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;0: 1 5 10 16 20
1: 13
2: 0 19
3: 2 3 4 6 8 9 12 14
4: 7 15 18
5: 11 17

&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;figcaption&gt;
JS word counter
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This program reads a file, splits it into lines, and then builds
a structure that maps each word count to the line numbers of the
lines containing that many words.&lt;/p&gt;
&lt;p&gt;The thing to notice is that there are two kinds of
object allocations here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The table of word counts (&lt;code&gt;WORD_COUNTS&lt;/code&gt; and the individual
lists inside it)&lt;/li&gt;
&lt;li&gt;The line read from the file&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; stored in &lt;code&gt;line&lt;/code&gt;
and the list of words in each line, created at the top of the loop
by &lt;code&gt;words = line.split()&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;real-programs-don&#39;t-call-free&quot;&gt;Real Programs Don&#39;t Call Free &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#real-programs-don&#39;t-call-free&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;When I worked with &lt;a href=&quot;https://www.precedia.com/AllanSchiffman.html&quot;&gt;Allan Schiffman&lt;/a&gt;
he used to say (IIRC, quoting Stanford Professor &lt;a href=&quot;https://profiles.stanford.edu/david-cheriton&quot;&gt;Dave Cheriton&lt;/a&gt;),
&amp;quot;real programs don&#39;t call free&amp;quot;. The idea was that there were two kinds
of programs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Short-running programs like compilers where returning memory
didn&#39;t help that much and it was easier to just allocate
and never free, and let the operating system clean up on
program exit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Long-running programs which had to be careful about their
memory consumption and which therefore couldn&#39;t really trust
the system &lt;code&gt;malloc()&lt;/code&gt; and instead had to implement their
own custom allocators.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, computers have gotten a lot bigger since then and
&lt;code&gt;malloc&lt;/code&gt; has gotten a lot better so I suspect people are a lot
more willing to trust the allocator rather than rolling their
own. On the other hand, the whole point of garbage collection
is that you don&#39;t have to call free.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The overall &lt;code&gt;WORD_COUNTS&lt;/code&gt; table lasts for the entire lifetime
of the program, but the &lt;code&gt;line&lt;/code&gt; and &lt;code&gt;words&lt;/code&gt; variables only
live for one turn of the loop. This turns out to be a common
access pattern, especially for long-lived programs: you have
a lot of both short-lived and long-lived allocations. But this
means that when you garbage collect, you spend a lot of time
examining (tracing) objects which will not be freed.
For example, imagine we modified the program to request
garbage collection on every turn of the loop (this is plainly
unnecessary in this program, but bear with me).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
With each turn we would free &lt;code&gt;line&lt;/code&gt; and &lt;code&gt;words&lt;/code&gt; but we would
also have to examine the (ever-increasing) &lt;code&gt;WORD_COUNTS&lt;/code&gt; structure,
which is just wasted effort.&lt;/p&gt;
&lt;p&gt;If memory allocation and freeing patterns were random
(technically: distributed according to a Poisson process),
then we would basically just have to live with this waste.
However, in practice allocation and freeing do display
patterns: specifically, it&#39;s common for objects which
were just allocated to be freed quickly.
For example, if the user clicks on some button, the
various click handlers for the button, dialog boxes, etc. may get
instantiated to handle the UI gesture and then torn down as soon as
the program has finished processing the gesture.
This
observation is often summed up in what&#39;s called the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tracing_garbage_collection&amp;amp;oldid=1283536426#Generational_GC_(ephemeral_GC)&quot;&gt;Generational Hypothesis&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;the most recently created objects are also those most likely to become unreachable quickly&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We can exploit this observation to improve GC performance
using what&#39;s called a &amp;quot;generational garbage collector&amp;quot;.&lt;/p&gt;
&lt;p&gt;The intuition behind a generational GC is that we segregate objects
into &amp;quot;generations&amp;quot; and then garbage collect the younger generations
more frequently. As objects age, we move them into older generations,
which are garbage collected less frequently.  As an example, I have
implemented a simple GC, with only two generations. It behaves
as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The heap is divided into two regions, the &amp;quot;nursery&amp;quot; used for
newly created objects, and the rest of the heap, used for
older objects. The nursery starts at address &lt;code&gt;0&lt;/code&gt; as usual,
and the rest of the heap starts at address &lt;code&gt;5000&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When an object is initially created, it is allocated out of
the nursery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We have two kinds of garbage collection: a &lt;em&gt;minor&lt;/em&gt; GC, which only examines objects in the nursery and doesn&#39;t
clean up objects in the rest of the heap, and a &lt;em&gt;major&lt;/em&gt; GC, which garbage collects the whole heap.
Generally, we would do a minor GC fairly frequently and a major
GC less often.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After we have done either kind of GC, any objects in the nursery
will be promoted to the rest of the heap (technical term: &lt;em&gt;tenured&lt;/em&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&#39;s walk through this piece-by-piece. First, we&#39;ll just allocate
two small objects.&lt;/p&gt;
&lt;div id=&quot;generational-allocation1&quot;&gt;&lt;/div&gt;
&lt;p&gt;The situation here is just the same as with the other GCs we&#39;ve
seen: the objects get created at the top of the heap, at address &lt;code&gt;16&lt;/code&gt;.
The only difference is that we now have labels for &lt;code&gt;Nursery&lt;/code&gt; and
&lt;code&gt;Tenured&lt;/code&gt;. For presentation purposes I&#39;ve drawn these as adjacent,
but of course these are both large regions; I&#39;m just eliding the
big empty space between the last of the allocations and the rest
of the region.&lt;/p&gt;
&lt;p&gt;Now let&#39;s make &lt;code&gt;b&lt;/code&gt; garbage and then do a GC on line 4. Note that
I&#39;ve asked for a minor GC with the new &lt;code&gt;#gc0&lt;/code&gt; pseudoinstruction,
but as there are no tenured objects the eventual result is much
the same either way.&lt;/p&gt;
&lt;div id=&quot;generational-allocation2&quot;&gt;&lt;/div&gt;
&lt;p&gt;After this GC pass, object &lt;code&gt;b&lt;/code&gt; has been collected and the object
pointed to by &lt;code&gt;a&lt;/code&gt; has been tenured and moved to address &lt;code&gt;5000&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If we now create a new object and assign it to &lt;code&gt;b&lt;/code&gt;, it will
be allocated in the nursery, as expected.&lt;/p&gt;
&lt;div id=&quot;generational-allocation3&quot;&gt;&lt;/div&gt;
&lt;p&gt;Now let&#39;s make &lt;code&gt;a&lt;/code&gt; garbage, and do a minor GC.&lt;/p&gt;
&lt;div id=&quot;generational-allocation4&quot;&gt;&lt;/div&gt;
&lt;p&gt;Now we&#39;ve tenured &lt;code&gt;b&lt;/code&gt; but the object at &lt;code&gt;5000&lt;/code&gt;
previously pointed to by &lt;code&gt;a&lt;/code&gt; is still there; it&#39;s just
garbage. This is what we expected, because a minor GC
doesn&#39;t try to clean up any of the tenured objects;
it just lets them sit there even if they&#39;re garbage.
If we now ask for a major GC, we&#39;ll clean up the whole
heap, and finally clean up that object.&lt;/p&gt;
&lt;div id=&quot;generational-allocation5&quot;&gt;&lt;/div&gt;
&lt;p&gt;The advantages of this design should be obvious,
assuming the generational hypothesis is true for
our program: we can clean up most of the garbage
by just examining the nursery without having to look
at any tenured objects. Generational style GCs
are very common, especially in systems designed for
interactive programs. Many modern GCs, such as those used by
&lt;a href=&quot;https://v8.dev/blog/trash-talk&quot;&gt;V8&lt;/a&gt;,
&lt;a href=&quot;https://firefox-source-docs.mozilla.org/js/gc.html&quot;&gt;SpiderMonkey&lt;/a&gt;, or
&lt;a href=&quot;https://wiki.openjdk.org/display/zgc/Main&quot;&gt;Java&lt;/a&gt; use generation
scavenging.&lt;/p&gt;
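&lt;p&gt;The walk-through above can be modeled in a few lines of JavaScript.
This is only a toy sketch and not the code behind the demos in this post:
the &lt;code&gt;ToyGenerationalHeap&lt;/code&gt; class and its method names are
hypothetical, objects are reduced to numeric ids, and the only references
are named root variables (there are no pointers between objects).&lt;/p&gt;

```javascript
// Toy model, not the post's implementation: objects are plain ids and
// the only references are named root variables.
class ToyGenerationalHeap {
  constructor() {
    this.nursery = new Set(); // ids of young objects
    this.tenured = new Set(); // ids of promoted objects
    this.roots = new Map();   // variable name -> object id
    this.nextId = 1;
  }

  // New objects always start out in the nursery.
  allocate(name) {
    const id = this.nextId;
    this.nextId += 1;
    this.nursery.add(id);
    this.roots.set(name, id);
    return id;
  }

  // Make whatever the named variable pointed to unreachable.
  drop(name) {
    this.roots.delete(name);
  }

  // Minor GC: look only at the nursery; every survivor is tenured.
  minorGC() {
    const live = new Set(this.roots.values());
    for (const id of this.nursery) {
      if (live.has(id)) {
        this.tenured.add(id);
      }
    }
    this.nursery.clear();
  }

  // Major GC: empty the nursery as above, then drop dead tenured objects too.
  majorGC() {
    this.minorGC();
    const live = new Set(this.roots.values());
    for (const id of [...this.tenured]) {
      if (!live.has(id)) {
        this.tenured.delete(id);
      }
    }
  }
}

const heap = new ToyGenerationalHeap();
heap.allocate("a"); // id 1
heap.allocate("b"); // id 2
heap.drop("b");
heap.minorGC();
console.log([...heap.tenured]); // [ 1 ]

heap.allocate("b"); // id 3
heap.drop("a");
heap.minorGC();
console.log([...heap.tenured]); // [ 1, 3 ]: id 1 is garbage but still tenured

heap.majorGC();
console.log([...heap.tenured]); // [ 3 ]
```

&lt;p&gt;The second minor GC never touches the tenured set, which is why the
dead object with id 1 survives until the major GC.&lt;/p&gt;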
&lt;h4 id=&quot;design-choices&quot;&gt;Design Choices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#design-choices&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;A generational GC combines elements of a number of the garbage
collection systems we&#39;ve seen already.&lt;/p&gt;
&lt;h5 id=&quot;allocation&quot;&gt;Allocation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#allocation&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;In our implementation, we just use a simple bump allocator,
always allocating out of the nursery. Because every GC
pass always cleans out the entire nursery, promoting live
objects and discarding dead ones, there are never any holes
in the nursery that previously contained live objects, so
there&#39;s no point in trying to reuse space--there&#39;s nothing
to reuse. In a fancier GC (see below), we might have a
different situation, however.&lt;/p&gt;
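&lt;p&gt;As a rough illustration of why bump allocation is enough here, the
following sketch allocates out of a single region and resets it after a GC
that has evacuated every survivor. The &lt;code&gt;BumpRegion&lt;/code&gt; class is
hypothetical; its &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, and
&lt;code&gt;top&lt;/code&gt; fields borrow the names used in the Regions section of
this post, and the sizes are made up.&lt;/p&gt;

```javascript
// Hypothetical sketch of a bump allocator over one region.
class BumpRegion {
  constructor(start, size) {
    this.start = start;      // first allocatable address
    this.end = start;        // next address to hand out
    this.top = start + size; // one past the last usable address
  }

  // Returns the address of the new object, or null if the region is full.
  allocate(size) {
    if (this.end + size > this.top) {
      return null; // a real allocator would trigger a GC at this point
    }
    const address = this.end;
    this.end += size;
    return address;
  }

  // After a GC that moves every survivor out, the whole region is free again.
  reset() {
    this.end = this.start;
  }
}

const nursery = new BumpRegion(0, 64);
const first = nursery.allocate(16);
const second = nursery.allocate(16);
console.log(first, second, nursery.end); // 0 16 32

nursery.reset();
console.log(nursery.allocate(16)); // 0
```

&lt;p&gt;Because there are never holes left behind, allocation is a bounds
check and an addition.&lt;/p&gt;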
&lt;h5 id=&quot;minor-gc&quot;&gt;Minor GC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#minor-gc&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;In the specific design
I&#39;ve implemented, the minor GC is effectively a copying style
collector but instead of copying between two equivalent semispaces,
we copy from the nursery right into the tenured region of the
heap. Importantly, unlike the copying collector we showed
in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#copying-garbage-collectors&quot;&gt;Part VI&lt;/a&gt;,
the destination region is probably not empty, because it will
contain whatever objects have already been tenured. Just as
with the copying GC, we can abandon all the objects in the
nursery after the GC.&lt;/p&gt;
&lt;p&gt;We&#39;re able to get away with this simple
strategy because we immediately promote objects on every GC
pass.
However, if we waited multiple passes, then the situation
would be different. For instance, if you promoted objects
after two GC passes—note that this requires bookkeeping—then
you might have objects which needed to be kept around in the
nursery after a GC pass. This also has implications for allocation:
if the minor GC uses mark-sweep, then we might have holes
that could then be filled rather than just bump allocating
(though it&#39;s still very attractive to bump allocate).&lt;/p&gt;
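&lt;p&gt;The bookkeeping for delayed promotion can be as small as a per-object
survival counter. The sketch below is a hypothetical model, not the
implementation described in this post: &lt;code&gt;PROMOTE_AFTER&lt;/code&gt;,
&lt;code&gt;minorGC&lt;/code&gt;, and the &lt;code&gt;age&lt;/code&gt; field are names
invented for illustration, and liveness is supplied by the caller rather than
computed by tracing.&lt;/p&gt;

```javascript
// Hypothetical model: each nursery object carries a survival counter.
const PROMOTE_AFTER = 2; // assumed threshold: tenure after two survived passes

// Returns the objects that stay in the nursery; promoted ones are pushed
// onto the tenured list and dead ones are simply dropped.
function minorGC(nursery, tenured, isLive) {
  const survivors = [];
  for (const obj of nursery) {
    if (!isLive(obj)) {
      continue; // dead: not copied anywhere
    }
    obj.age += 1;
    if (obj.age >= PROMOTE_AFTER) {
      tenured.push(obj);
    } else {
      survivors.push(obj);
    }
  }
  return survivors;
}

const tenured = [];
let nursery = [
  { name: "a", age: 0 },
  { name: "b", age: 0 },
];

// First pass: only "a" is still reachable, so it stays in the nursery.
nursery = minorGC(nursery, tenured, (obj) => obj.name === "a");
console.log(nursery.map((o) => o.name), tenured.map((o) => o.name));
// [ 'a' ] []

// Second pass: "a" has now survived twice and is tenured; "c" is new.
nursery.push({ name: "c", age: 0 });
nursery = minorGC(nursery, tenured, () => true);
console.log(nursery.map((o) => o.name), tenured.map((o) => o.name));
// [ 'c' ] [ 'a' ]
```

&lt;p&gt;Note that after each pass the nursery can still contain live objects,
which is exactly the complication described above.&lt;/p&gt;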
&lt;p&gt;V8&#39;s Orinoco uses an interesting design which has a copying minor GC
and then promotes objects after they have survived two passes:&lt;/p&gt;
&lt;figure&gt;
&lt;img alt=&quot;&quot; src=&quot;https://educatedguesswork.org/img/orinoco-minor-gc.svg&quot; width=&quot;800&quot; /&gt;
&lt;figcaption&gt;
&lt;p&gt;Orinoco&#39;s minor GC (generation scavenger). From &lt;a href=&quot;https://v8.dev/blog/trash-talk&quot;&gt;Google&lt;/a&gt;.&lt;/p&gt;

&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;There are a few subtle implementation details, which we&#39;ll
get to below.&lt;/p&gt;
&lt;h5 id=&quot;major-gc&quot;&gt;Major GC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#major-gc&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;The major GC is more or less the same as the stop-the-world
GCs we saw in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6&quot;&gt;Part VI&lt;/a&gt;, except that it
has to scan all the regions in sequence (in this case, just the
nursery and tenured regions). However, as a practical matter
you probably want the major GC to be either a mark-compact
or copying collector, because otherwise you end up with holes
in the tenured region which can&#39;t be filled in during the
normal allocation process. You could in principle fill them
in to some extent when you promote objects from the nursery,
but that makes GC much more expensive because you have to
try to fit each object being promoted somewhere rather
than just bump allocating. I use mark-compact because otherwise
you need to allocate a large block of memory for the other
semispace in order to optimize the infrequent major GC.&lt;/p&gt;
&lt;h4 id=&quot;implementation-notes&quot;&gt;Implementation Notes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#implementation-notes&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In this section we look a little more closely at the implementation
of our generational GC. Much of this will be familiar from Part VI,
but there are some details that I&#39;ve had to change.&lt;/p&gt;
&lt;h5 id=&quot;regions&quot;&gt;Regions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#regions&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;First, we need to keep track of which regions of the
heap correspond to each generation. This is done by keeping
a &lt;code&gt;_generations&lt;/code&gt; list, with each entry containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;start&lt;/code&gt;
: The first address that can be allocated at.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;end&lt;/code&gt;
: The end of the allocated region (one byte past the end of the
last object), and hence the next location we&#39;ll be allocating
at.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;top&lt;/code&gt;
: The end of the region itself (again, one byte past it).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For convenience, we also have the variables
&lt;code&gt;_gen0&lt;/code&gt; and &lt;code&gt;_gen1&lt;/code&gt;, which point directly to the generation objects
for the nursery and tenured objects respectively.&lt;/p&gt;
&lt;p&gt;We can then use the &lt;code&gt;_generations&lt;/code&gt; variable to see which generation
a given pointer points into, in the obvious way:&lt;/p&gt;
&lt;pre class=&quot;language-javascript&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;generationFromPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;index&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; generation&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_generations&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;entries&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;address &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; generation&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;top&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Pointer not in any generation&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At some level, this is all overly general: this design
in principle supports an arbitrary number of generations but in
practice we just use two generations of equal size. There are
different programming philosophies here and exponents of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=You_aren%27t_gonna_need_it&amp;amp;oldid=1281815292&quot;&gt;YAGNI&lt;/a&gt;
would probably say this is a bad set of design tradeoffs,
but I prefer to avoid baking in a bunch of temporary
design decisions. In a real system we would probably want
the nursery to be smaller and this lets us make that change
if we decide to.&lt;/p&gt;
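&lt;p&gt;To make the point about unequal regions concrete, here is a standalone
sketch of the same lookup run against a small nursery and a larger tenured
region. The region sizes are invented for illustration, and this version
differs from the method above in that it takes the generation list as a
parameter and also checks the lower bound of each region.&lt;/p&gt;

```javascript
// Hypothetical layout: a small nursery followed by a larger tenured region.
// The sizes are made up for illustration.
const generations = [
  { start: 0, end: 0, top: 1000 },        // generation 0: nursery
  { start: 1000, end: 1000, top: 10000 }, // generation 1: tenured
];

// Same idea as the lookup above, but as a standalone function that also
// checks the lower bound of each region.
function generationFromPointer(gens, address) {
  for (const [index, generation] of gens.entries()) {
    if (address >= generation.start) {
      if (generation.top > address) {
        return index;
      }
    }
  }
  throw new Error("Pointer not in any generation");
}

console.log(generationFromPointer(generations, 16));   // 0
console.log(generationFromPointer(generations, 1000)); // 1
console.log(generationFromPointer(generations, 9999)); // 1
```

&lt;p&gt;Changing the nursery size only means editing the table; the lookup
itself is unchanged.&lt;/p&gt;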
&lt;h5 id=&quot;minor-gc-2&quot;&gt;Minor GC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#minor-gc-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;Now let&#39;s take a closer look at the minor GC. As I said, this is
similar to the copying GC we showed in Part VI, but with
some subtle differences. Here&#39;s &lt;code&gt;process_ptr&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;process_ptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// The extra word is used for the forwarding address&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;readXWord&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token 
boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Not moved yet.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; new_address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_gen1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    Memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;memmove&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_gen1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;end &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Now store the forwarding address in the extra word.&lt;/span&gt;&lt;br /&gt;    ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;new_address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only real difference here is that when we move an object we store
the new address not in the first word but in the extra word which
we&#39;ve allocated for the mark-compact &lt;code&gt;moved&lt;/code&gt; pointer. We could do this
either way, but this is cleaner and lets us be consistent between the
passes.&lt;/p&gt;
&lt;p&gt;The actual copying phase is essentially identical to our copying
GC, except that we have slightly different bookkeeping to deal
with the fact that we&#39;re copying into the tenured region
rather than a freshly initialized semispace. However, we have
two subtle issues to address, dealing with what&#39;s called
&lt;em&gt;intergenerational pointers&lt;/em&gt;, which is to say pointers
which are stored in an object of one generation but point
to an object of another generation. There are two types of
such pointers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;upward&lt;/em&gt; pointers, which go from new to old objects&lt;/li&gt;
&lt;li&gt;&lt;em&gt;downward&lt;/em&gt; pointers, which go from old to new objects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Upward pointers are easy to deal with: we don&#39;t want to
examine tenured objects at all, so when we see them during
the minor GC, we can just skip past them. This is fine
because all they can do is keep tenured objects alive,
and we&#39;re not going to free those objects anyway.&lt;/p&gt;
&lt;p&gt;Downward pointers are more complicated: because we aren&#39;t
tracing tenured objects, we&#39;re not going to see them, but
they may be the only reference to some object in the nursery,
so without them we would incorrectly free that object,
leading to a dangling pointer from a tenured object and eventually
maybe a use-after-free (UAF). This means we need to do something when
we have a situation like this:&lt;/p&gt;
&lt;div id=&quot;generational-allocation7&quot;&gt;&lt;/div&gt;
&lt;p&gt;The standard procedure is to keep track of all downward
pointers using what&#39;s called a &amp;quot;write barrier&amp;quot;. Whenever
we do a write to a pointer slot in an object, we check to
see if it&#39;s a downward pointer and if so, we store it in
a lookaside list (conventionally called a &lt;em&gt;remembered set&lt;/em&gt;), like so:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;recordIntergenerationalWrite&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;sourceAddress&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fieldIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; targetAddress&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; key &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;sourceAddress&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt; &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;fieldIndex&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Check if the new target is an intergenerational pointer from old to young (gen0).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;targetAddress&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; targetAddress &lt;span class=&quot;token operator&quot;&gt;!==&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; sourceGenIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;generationFromPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sourceAddress&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; targetGenIndex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;generationFromPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;targetAddress&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sourceGenIndex &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; targetGenIndex &lt;span class=&quot;token 
operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// It&#39;s an intergenerational pointer, record it.&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_rememberedSet&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; targetAddress&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;key &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_rememberedSet&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_rememberedSet&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
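&lt;p&gt;To see how the barrier behaves when the same slot is written more than
once, here&#39;s a minimal self-contained sketch. It is not the simulator code:
&lt;code&gt;ToyHeap&lt;/code&gt;, &lt;code&gt;writePointer()&lt;/code&gt;, &lt;code&gt;generationOf()&lt;/code&gt;,
and the &lt;code&gt;TENURED_START&lt;/code&gt; boundary are hypothetical names chosen
for illustration, and the heap is reduced to a map of slots.&lt;/p&gt;

```javascript
// Hypothetical layout for this sketch: addresses below TENURED_START are
// in the nursery (generation 0); everything else is tenured (generation 1).
const TENURED_START = 5000;
const generationOf = (addr) => (addr >= TENURED_START ? 1 : 0);

class ToyHeap {
  constructor() {
    this.slots = new Map();         // "addr slot" -> stored pointer
    this.rememberedSet = new Map(); // "addr slot" -> nursery target
  }

  // Every pointer store goes through here; that is the write barrier.
  writePointer(sourceAddr, slot, targetAddr) {
    const key = `${sourceAddr} ${slot}`;
    this.slots.set(key, targetAddr);

    let downward = false;
    if (targetAddr !== null) {
      downward = generationOf(sourceAddr) > generationOf(targetAddr);
    }

    if (downward) {
      this.rememberedSet.set(key, targetAddr);
    } else {
      // The slot no longer holds a downward pointer, so forget it.
      this.rememberedSet.delete(key);
    }
  }
}

const heap = new ToyHeap();
heap.writePointer(5008, 0, 64);   // tenured -> nursery: recorded
heap.writePointer(64, 1, 5008);   // nursery -> tenured (upward): ignored
console.log([...heap.rememberedSet.keys()]);
heap.writePointer(5008, 0, 5040); // same slot now points at a tenured object
console.log(heap.rememberedSet.size);
```

&lt;p&gt;The important design point is that the set is keyed by the &lt;em&gt;slot&lt;/em&gt;
rather than by the target, so overwriting a slot automatically replaces or
removes the stale entry.&lt;/p&gt;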
&lt;p&gt;From the perspective of the GC, the downward pointers
are just a new set of roots, so we add them to the
root list before doing the minor GC:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;_gc_minor_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;roots&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; inner_roots &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;roots&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; ig_ptr_addrs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Append the intergenerational pointers so that&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// we can save them.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Object&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token 
function&quot;&gt;entries&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_rememberedSet&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      ig_ptr_addrs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      inner_roots&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; new_roots &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;_gc_minor_incremental_inner&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;inner_roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With some careful refactoring &lt;code&gt;._gc_minor_incremental_inner()&lt;/code&gt;
could be the &lt;code&gt;._gc_incremental()&lt;/code&gt; function from our copying
GC, but I&#39;m trying to keep things a bit simple.&lt;/p&gt;
&lt;p&gt;When the minor GC returns, it provides updated values
for all the roots we passed in, so now we need to
patch up the locations where the downward pointers
were stored so that they point to the new locations
in the tenured region. Note that at the end of this process
we don&#39;t have anything in the nursery and so there will
be no more downward pointers.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;// Now update the IG pointers.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; new_ig_ptrs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; new_roots&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;roots&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; i &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; new_ig_ptrs&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; new_addr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; new_ig_ptrs&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; key &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ig_ptr_addrs&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;addr&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; slot&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; key&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parseInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// Note that this will call recordIntergenerationalWrite()&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// but |new_addr| is now of the same generation as |addr|.&lt;/span&gt;&lt;br /&gt;      ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; addr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; slot&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_addr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
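&lt;p&gt;The append-then-slice bookkeeping is easy to get wrong, so here&#39;s the
same pattern in isolation. This is only a sketch: &lt;code&gt;fakeMinorGC()&lt;/code&gt;
is a made-up stand-in that pretends every object moves up by 5000, and the
slot table is a plain &lt;code&gt;Map&lt;/code&gt; rather than real heap memory.&lt;/p&gt;

```javascript
// Stand-in for the minor GC (an assumption of this sketch): pretend every
// surviving object moves to (old address + 5000) and return the new roots.
const fakeMinorGC = (roots) => roots.map((ptr) => ptr + 5000);

// Two tenured slots that currently point into the nursery.
const slots = new Map([["5008 0", 64], ["5016 2", 128]]);
const rememberedSet = new Map(slots); // both are downward pointers
const roots = [32];                   // the ordinary roots

// Append the downward pointers after the ordinary roots, remembering
// which slot each one came from. Map keys and values iterate in the
// same (insertion) order, so the two lists line up.
const keys = [...rememberedSet.keys()];
const innerRoots = [...roots, ...rememberedSet.values()];
const newRoots = fakeMinorGC(innerRoots);

// Everything past roots.length corresponds, in order, to |keys|.
newRoots.slice(roots.length).forEach((newAddr, i) => {
  slots.set(keys[i], newAddr);
});

// The nursery is now empty, so no downward pointers remain.
rememberedSet.clear();
console.log([...slots.entries()]);
```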
&lt;h4 id=&quot;major-gc-2&quot;&gt;Major GC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#major-gc-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The major GC really just is mark-compact, with the additional
complication that we need to iterate over all the regions
for each generation. However, the in-use portions of the
regions aren&#39;t contiguous. For instance, we might have only
allocated &lt;code&gt;16&lt;/code&gt;–&lt;code&gt;512&lt;/code&gt; in the nursery, which means that
&lt;code&gt;512&lt;/code&gt;–&lt;code&gt;4999&lt;/code&gt; is just unallocated blank space which
might have any values (e.g., if we previously had allocated
past &lt;code&gt;512&lt;/code&gt; and then did a GC), and we don&#39;t want to try to
scan it. This wasn&#39;t a problem in the minor GC because
a copying GC just follows pointers, but mark-compact actually
does a linear scan of the whole allocated region, so if
there are garbage values there, it will misbehave, quite
likely in a dangerous fashion. Fortunately, we already
have a list of the allocated subregions of each region,
so we can just iterate over it, as in the following:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_generations&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; generation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_generations&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; generation&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; scan &lt;span 
class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; generation&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; free_ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;setmoved&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;addr&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;newval&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; free_ptr &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          free_ptr &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Straightforward, right? Well, mostly. Why are we going
through the list of generation regions backwards
(from older to younger generations and from high to low memory)
rather than forwards (from younger to older generations and
from low to high memory)?&lt;/p&gt;
&lt;p&gt;Recall that mark-compact works by sliding every object
as far left (towards low memory) as possible. It does
this by keeping a single &lt;code&gt;end&lt;/code&gt; pointer which points to
the location where the next object will be allocated
(initially pointing to the start of the target region)
and then incrementing it by the size of each object
allocated. This is normally safe because we are also
&lt;em&gt;scanning&lt;/em&gt; left to right and so we never overwrite
any region of memory we are going to need later, even
if we leave it in a corrupt state, as shown in the figure
below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sliding-left-normal.png&quot; alt=&quot;Sliding left&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Sliding left
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;What&#39;s in &lt;code&gt;28&lt;/code&gt;–&lt;code&gt;36&lt;/code&gt; is probably the last two
words of the object previously at &lt;code&gt;24&lt;/code&gt; (though
the GC could have done anything it wanted with it).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
It doesn&#39;t matter, though, because we&#39;ve already advanced
the scan pointer past that region to &lt;code&gt;40&lt;/code&gt;, so it&#39;s just going to
copy whatever is at &lt;code&gt;40&lt;/code&gt; over it in a second anyway.&lt;/p&gt;
&lt;p&gt;Now consider what happens if we have two generations
and we process the nursery first, as shown below.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sliding-left-bad.png&quot; alt=&quot;Sliding left&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Sliding left badly
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this case, we copy
the first value in the nursery &lt;em&gt;over&lt;/em&gt; the first value
in the tenured region (remember, the tenured region
is always compacted). We&#39;ve now destroyed that object.
Even in the best case where the object was going to be
freed anyway, we&#39;ve quite likely left a piece of some
object—as shown here—which will mess
up the scanning process. And if we&#39;re not going to free
the object we just stomped on, the data is just gone and
now things are really bad.&lt;/p&gt;
&lt;p&gt;Fortunately, this problem is easily solved by
processing the generations in reverse order so that
we&#39;ve already processed everything in the tenured
region before we start trying to copy stuff from the
nursery into it.&lt;/p&gt;
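&lt;p&gt;Here&#39;s a small self-contained model of the ordering problem. It makes
some simplifying assumptions that the real collector doesn&#39;t: each heap cell
just stores the name of the object that owns it, the list of objects is kept
outside the heap instead of being discovered by scanning, and the region
boundaries (&lt;code&gt;NURSERY&lt;/code&gt;, &lt;code&gt;TENURED&lt;/code&gt;) and the
&lt;code&gt;compact()&lt;/code&gt; helper are invented for the example.&lt;/p&gt;

```javascript
// Toy model: nursery is cells 0..7, tenured is cells 8..15.
const NURSERY = { start: 0, objects: [{ at: 0, size: 3, live: true }] };
const TENURED = {
  start: 8,
  objects: [
    { at: 8, size: 2, live: true },   // t1
    { at: 10, size: 1, live: false }, // dead
    { at: 11, size: 3, live: true },  // t2
  ],
};

function makeHeap() {
  const heap = new Array(16).fill(null);
  heap.fill("n1", 0, 3);
  heap.fill("t1", 8, 10);
  heap.fill("dead", 10, 11);
  heap.fill("t2", 11, 14);
  return heap;
}

// Slide every live object into the tenured region, visiting the
// generations in the given order. Returns the tenured cells afterwards.
function compact(order) {
  const heap = makeHeap();
  let free_ptr = TENURED.start;
  for (const generation of order) {
    for (const obj of generation.objects) {
      if (!obj.live) continue;
      // copyWithin has memmove semantics, so overlapping copies are safe.
      heap.copyWithin(free_ptr, obj.at, obj.at + obj.size);
      free_ptr += obj.size;
    }
  }
  return heap.slice(TENURED.start);
}

const good = compact([TENURED, NURSERY]); // older generation first
const bad = compact([NURSERY, TENURED]);  // nursery first
console.log(good.join(" ")); // t1 t1 t2 t2 t2 n1 n1 n1
console.log(bad.join(" "));  // t1 has been overwritten by copies of n1
```

&lt;p&gt;In the nursery-first run the copy itself is perfectly well behaved; the
damage comes entirely from writing into the tenured region before its
contents have been read.&lt;/p&gt;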
&lt;p&gt;Below I&#39;ve provided a widget that lets you see the major
GC in action.&lt;/p&gt;
&lt;div id=&quot;generational-allocation8&quot;&gt;&lt;/div&gt;
&lt;h3 id=&quot;incremental-and-concurrent-gc&quot;&gt;Incremental and Concurrent GC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#incremental-and-concurrent-gc&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Another way to reduce the latency of garbage collection is to do
&lt;em&gt;incremental&lt;/em&gt; garbage collection, in which you only do part of the
garbage collection pass at each pause point.  The obvious challenge
here is that the program itself may change some pointers in between
the GC phases. For example, consider a topology where &lt;code&gt;A&lt;/code&gt; points to &lt;code&gt;B&lt;/code&gt;, which
points to &lt;code&gt;C&lt;/code&gt;, and the following sequence of operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start the GC and process &lt;code&gt;A&lt;/code&gt;, marking &lt;code&gt;B&lt;/code&gt; and adding it to the
work queue.&lt;/li&gt;
&lt;li&gt;The program then sets a pointer from &lt;code&gt;A&lt;/code&gt; → &lt;code&gt;C&lt;/code&gt; and
erases the pointer from &lt;code&gt;B&lt;/code&gt; → &lt;code&gt;C&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The next GC phase runs, processing &lt;code&gt;B&lt;/code&gt;, which completes the marking
phase. At this point &lt;code&gt;C&lt;/code&gt; has never been marked, even though it is still
reachable from &lt;code&gt;A&lt;/code&gt;, so it will be incorrectly freed during the sweep phase.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This problem can be solved with a &amp;quot;write barrier&amp;quot; approach similar to
the one we used for intergenerational pointers: the barrier detects
that you have changed something important out from underneath the GC.  For
instance, you could detect that you&#39;ve changed one of the pointers
from an object you&#39;ve already processed (&lt;code&gt;A&lt;/code&gt;) and re-add it to the
work queue.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;  With
the right set of barriers you can intersperse operation of the
program and the GC at relatively fine granularity by running
the program for a little while, doing a little bit of GC, then
running the program some more, etc., thus reducing
the apparent pause experienced by the user.&lt;/p&gt;
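&lt;p&gt;As a rough illustration of the scenario above, here is a minimal Python sketch
of incremental marking with a write barrier that re-queues an already-processed
object when one of its pointers changes. The names (&lt;code&gt;Obj&lt;/code&gt;,
&lt;code&gt;IncrementalMarker&lt;/code&gt;, &lt;code&gt;run_example&lt;/code&gt;) are made up for this
example and it is not how any particular production GC is implemented.&lt;/p&gt;

```python
from collections import deque


class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []      # outgoing pointers
        self.marked = False   # set once the GC has reached this object
        self.scanned = False  # set once the GC has processed its fields


class IncrementalMarker:
    """Hypothetical incremental marker; one step() call is one GC phase."""

    def __init__(self, roots, use_barrier=True):
        self.queue = deque()
        self.use_barrier = use_barrier
        for root in roots:
            self._mark(root)

    def _mark(self, obj):
        if not obj.marked:
            obj.marked = True
            self.queue.append(obj)

    def step(self):
        """Process one object from the work queue; return False when done."""
        if not self.queue:
            return False
        obj = self.queue.popleft()
        for child in obj.fields:
            self._mark(child)
        obj.scanned = True
        return True

    def write(self, obj, new_fields):
        """Every pointer store made by the program goes through here."""
        obj.fields = list(new_fields)
        if self.use_barrier and obj.scanned:
            # The GC already processed obj, so put it back on the work queue.
            obj.scanned = False
            self.queue.append(obj)


def run_example(use_barrier):
    a, b, c = Obj("A"), Obj("B"), Obj("C")
    a.fields = [b]
    b.fields = [c]
    gc = IncrementalMarker([a], use_barrier)
    gc.step()             # phase 1: process A, marking B
    gc.write(a, [b, c])   # program: set a pointer from A -> C
    gc.write(b, [])       # program: erase the pointer from B -> C
    while gc.step():      # remaining GC phases
        pass
    return c.marked


print(run_example(use_barrier=False))  # False: C would be swept
print(run_example(use_barrier=True))   # True: the barrier rescans A
```

&lt;p&gt;Without the barrier, &lt;code&gt;C&lt;/code&gt; is never marked even though it is
reachable from &lt;code&gt;A&lt;/code&gt;; with the barrier, the store into &lt;code&gt;A&lt;/code&gt;
puts it back on the queue and &lt;code&gt;C&lt;/code&gt; is found on the rescan.&lt;/p&gt;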
&lt;p&gt;Incremental GC alone lets us reduce apparent pauses, but it doesn&#39;t let
us use more than one core at once, and of course modern processors
have more than one core. It&#39;s possible to extend incremental GC to actually have the GC run in
parallel with the program, in what&#39;s called &lt;em&gt;concurrent&lt;/em&gt; GC. This
is obviously even trickier because we have to worry about the usual
thread safety concerns when two threads try to work on the same memory
at once, but the result is to further reduce the pauses
experienced by the user—though not necessarily to zero
because you still can have the program waiting on some contested
resource that the GC is using. For this reason concurrent GCs are very
common, including in the systems I named in the previous section.&lt;/p&gt;
&lt;p&gt;Separately, you can also have the GC run in multiple threads at
once. This is called &lt;em&gt;parallel&lt;/em&gt; GC. The obvious advantage here is
that you are using more than one core for your GC and thus making
more efficient use of your computer. You can use various barriers
here, but as an intuition pump consider
that you could have a non-concurrent mark-sweep GC that just ran the sweep phase
in parallel; this can be done without any thread locking at all
once you&#39;ve segmented the heap. You can have parallel
GC in both stop-the-world and concurrent modes, with the latter
obviously providing the lowest latency impact.&lt;/p&gt;
&lt;h2 id=&quot;allocation-strategies&quot;&gt;Allocation Strategies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#allocation-strategies&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far we&#39;ve looked at two basic allocation strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bump allocate at the end of the allocated region&lt;/li&gt;
&lt;li&gt;Allocate in the first available free region (what&#39;s called &amp;quot;first-fit&amp;quot;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we have a moving GC like mark-compact or copying, then quite
possibly all we need is bump allocation; there may be garbage in
the form of unreachable objects, but at least in a stop-the-world
GC, we discover the garbage and compact the heap at the same
time, so there&#39;s never any need to allocate in the holes.
By contrast, if we are using reference counting or mark-sweep,
then we do need to re-allocate out of the holes—otherwise
there&#39;s not much point in GCing in the first place—and
it&#39;s important to do so efficiently.&lt;/p&gt;
&lt;p&gt;Any allocation algorithm needs to balance a number of factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fragmentation&lt;/li&gt;
&lt;li&gt;Efficiently finding a hole we can fit into&lt;/li&gt;
&lt;li&gt;Space overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The design of allocators is a whole separate specialty from the design
of the garbage collection system, and I&#39;m not going to go into it here
in detail, but I wanted to give you a flavor of the kinds of issues
you face and some of the variety of implementation options.&lt;/p&gt;
&lt;p&gt;In the naive first-fit algorithm, we scan from the beginning of the
heap until we find a suitably large location. Depending on the
fragmentation pattern and the size of the new object, this can take
quite a long time and involve scanning through much if not all of the
heap. Moreover, this can make fragmentation worse; there may be a
better location—more closely matching in size—higher up in
the heap but instead we allocate in the first available location. An
alternate choice is to do what&#39;s called &amp;quot;best-fit&amp;quot; where we iterate
over all the available memory locations to find the ideal location,
but this requires scanning the entire heap, which is obviously
slower.&lt;/p&gt;
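&lt;p&gt;To make the difference concrete, here is a small Python sketch of the two
policies. It assumes, purely for illustration, that the free regions are already
available as a list of hole sizes in heap order; in the naive version described
above you would instead discover them by walking the heap.&lt;/p&gt;

```python
def first_fit(holes, size):
    """Return the index of the first hole that can hold size, or None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None


def best_fit(holes, size):
    """Return the index of the smallest hole that can hold size, or None."""
    best = None
    for i, hole in enumerate(holes):
        if hole >= size and (best is None or holes[best] > hole):
            best = i
    return best


holes = [64, 16, 24, 128]     # hypothetical hole sizes, in heap order
print(first_fit(holes, 20))   # 0: takes the 64-unit hole, wasting 44
print(best_fit(holes, 20))    # 2: takes the 24-unit hole, wasting 4
```

&lt;p&gt;Note that &lt;code&gt;first_fit&lt;/code&gt; can stop early while &lt;code&gt;best_fit&lt;/code&gt;
always looks at every hole.&lt;/p&gt;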
&lt;p&gt;There are a number of designs that allow us to have more efficient
allocation, but typically at the cost of memory overhead.&lt;/p&gt;
&lt;h3 id=&quot;bucketed-allocations&quot;&gt;Bucketed Allocations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#bucketed-allocations&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are of course a lot of options here, but one natural thing to do
is to group allocations into size buckets (for instance by powers of
two). When you allocate a new object of size &lt;code&gt;s&lt;/code&gt; you round the
allocation up to the next bucket size &lt;code&gt;B(s)&lt;/code&gt;, with the remainder of the
allocation just being left empty. Similarly, when you free an
allocation of size &lt;code&gt;s&lt;/code&gt; it creates a hole of size &lt;code&gt;B(s)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Bucketing allocations like this gives us a compromise between
first fit and best fit. For instance, if we first select the
first hole with the same bucket size, then actually we&#39;ll
be selecting any hole within the bucket&#39;s range, which are
presumably more common than exact matches. Moreover, if we
are using powers of two bucket sizes, we can &lt;em&gt;also&lt;/em&gt; select
any hole of the next bucket size up by splitting the original
bucket in two. For instance, if we are allocating a region
of size 512 but have a hole of 1024, then the result is
an allocated region and a remaining hole of size 512.&lt;/p&gt;
&lt;p&gt;Another advantage of bucketed allocations is that it allows
you to easily resize objects. For example, if you allocate an object
of size 300 in a bucket of size 512, you can increase the
size of the object (up to 512, at least) without moving the
object. This is more useful for some kinds of objects than
others; for instance if you have a vector/array of objects
then it&#39;s often quite useful to be able to make it bigger
so you can add more entries.&lt;/p&gt;
&lt;h3 id=&quot;metadata-structure&quot;&gt;Metadata Structure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#metadata-structure&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One of the reasons we&#39;ve had poor efficiency so far is that we&#39;ve been
forced to scan the heap linearly to find a suitable location,
so we end up with what&#39;s basically an &lt;code&gt;O(n)&lt;/code&gt; algorithm.
We need to do this because the only information we have
about the memory map is in the heap itself (which, recall,
is self-describing). We can do a lot better if we use
some extra memory to store a map of which regions are
in use and which are free (a &amp;quot;free list&amp;quot;).&lt;/p&gt;
&lt;p&gt;If we combine this idea with bucket allocation, then the
obvious approach is to have a list of holes indexed by
bucket size. When we free a region, we add the hole to
the list for that size. When we need to allocate a region, we look
to see if there are any holes of the right size. If so,
we allocate from one of those holes. If not, we bump
allocate in the usual fashion.&lt;/p&gt;
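&lt;p&gt;Here is a minimal Python sketch of that combination: power-of-two buckets,
one free list per bucket size, splitting a hole of the next size up, and a
bump-allocation fallback. The class and function names, the smallest bucket
size of 16, and the fact that &lt;code&gt;free()&lt;/code&gt; is told the object size are
all simplifying assumptions for the example.&lt;/p&gt;

```python
def bucket_size(s, smallest=16):
    """Round s up to the next power-of-two bucket size B(s)."""
    b = smallest
    while s > b:
        b *= 2
    return b


class BucketAllocator:
    """Hypothetical allocator that hands out addresses as plain integers."""

    def __init__(self):
        self.free_lists = {}   # bucket size -> list of free addresses
        self.top = 0           # bump pointer: end of the allocated region

    def alloc(self, s):
        b = bucket_size(s)
        holes = self.free_lists.get(b)
        if holes:
            return holes.pop()              # reuse a hole of the right size
        if self.free_lists.get(2 * b):
            # Split a hole of the next bucket size up into two halves:
            # allocate the first half and keep the second half as a hole.
            addr = self.free_lists[2 * b].pop()
            self.free_lists.setdefault(b, []).append(addr + b)
            return addr
        addr = self.top                     # otherwise bump allocate
        self.top += b
        return addr

    def free(self, addr, s):
        # Freeing an object of size s leaves a hole of size B(s).
        self.free_lists.setdefault(bucket_size(s), []).append(addr)


heap = BucketAllocator()
x = heap.alloc(300)      # bump allocated in a 512 bucket at address 0
y = heap.alloc(1000)     # bump allocated in a 1024 bucket at address 512
heap.free(x, 300)
print(heap.alloc(400))   # 0: reuses the 512 hole left by x
heap.free(y, 1000)
print(heap.alloc(512))   # 512: first half of the split 1024 hole
print(heap.alloc(512))   # 1024: the other half
```

&lt;p&gt;Finding a hole is now a dictionary lookup rather than a scan of the heap,
at the cost of the extra memory for the free lists and the unused space inside
each bucket.&lt;/p&gt;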
&lt;h3 id=&quot;multiple-arenas&quot;&gt;Multiple Arenas &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#multiple-arenas&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s also possible to have different heap regions (sometimes called
&amp;quot;arenas&amp;quot;) for different kinds of objects. For example, instead
of bucketing allocations but allocating out of the same region
you could have one region for each bucket size. This has the
advantage that holes are always the same size, so as long
as there is room in the region you can always allocate, but at
the cost of using up memory for the unused space for each
bucket size.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
It&#39;s also possible for different arenas to have different allocation
and GC strategies. For example, you might use a generation scavenging
GC for your small objects but a mark-sweep GC for large objects
or growable objects like arrays on the theory that they tend
to be long-lived and that copying them during the promotion phase
will be expensive.&lt;/p&gt;
&lt;h2 id=&quot;gc-in-the-real-world&quot;&gt;GC in the Real World &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#gc-in-the-real-world&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you should have gathered by now, real world GC systems are vastly
more complicated than I&#39;ve described here. However, pretty much
all of them use some variation or combination of these basic techniques.
For example, here&#39;s how Google describes Orinoco:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Over the past years the V8 garbage collector (GC) has changed a
lot. The Orinoco project has taken a sequential, stop-the-world
garbage collector and transformed it into a mostly parallel and
concurrent collector with incremental fallback.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And here is a description of Java&#39;s ZGC:&lt;/p&gt;
&lt;blockquote&gt;
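&lt;ul&gt;
&lt;li&gt;Concurrent&lt;/li&gt;
&lt;li&gt;Region-based&lt;/li&gt;
&lt;li&gt;Compacting&lt;/li&gt;
&lt;li&gt;NUMA-aware&lt;/li&gt;
&lt;li&gt;Using colored pointers&lt;/li&gt;
&lt;li&gt;Using load barriers&lt;/li&gt;
&lt;li&gt;Using store barriers (in the generational mode)&lt;/li&gt;
&lt;/ul&gt;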
&lt;/blockquote&gt;
&lt;p&gt;Most of these words should seem familiar to you, and you should have
enough background to figure out the other ones with a little searching.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The design of fast garbage collectors is a whole subspecialty of CS,
and often requires a good understanding of the specific dynamics
of the system that the GC will be deployed in. If you want to
go (much) deeper here, a good reference is the
&lt;a href=&quot;https://gchandbook.org/&quot;&gt;Garbage Collection Handbook&lt;/a&gt;,
which I used extensively in preparing this and the previous
post. It&#39;s fairly dense, but really digs into far more detail than you are
likely to need. In addition, many widely used GCs are open
source, so you can look at the code yourself.&lt;/p&gt;
&lt;h2 id=&quot;why-wouldn&#39;t-you-want-gc%3F&quot;&gt;Why wouldn&#39;t you want GC? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#why-wouldn&#39;t-you-want-gc%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be apparent at this point, working in a garbage-collected
system is vastly easier than working in a system where you have to
handle memory management yourself. In the latter case, you either end
up with a system that isn&#39;t safe by default (C++) or one where you
have to do a lot of gymnastics in order to write ordinary-seeming code
(Rust); in either case you spend a lot of time thinking about memory
management. By contrast, with a GC language, you rarely have to think
about what&#39;s going on with memory management at all, and when you do,
it&#39;s mostly around performance reasons or when you have to do deep
copies, rather than around &amp;quot;this bug is going to cause a
vulnerability&amp;quot; or &amp;quot;I can&#39;t make my program work&amp;quot;. This is true even
for a systems programming language like Go.&lt;/p&gt;
&lt;p&gt;Obviously, the people who designed Rust were aware of garbage
collection—all of the main techniques date back to the 80s and
90s and the basic concepts go back to the 1950s and Lisp—but
they very deliberately chose to build Rust without it. There
are a number of reasons why you might want to build your system
without a GC.&lt;/p&gt;
&lt;h3 id=&quot;deterministic-performance&quot;&gt;Deterministic Performance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#deterministic-performance&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, because the GC—at least tracing GC, but recall
that most GC-based systems have some element of tracing—can
introduce pauses, most GC-based languages do not provide completely
deterministic performance.  This is the kind of thing that makes
systems programmers sad—though perhaps more than really
necessary—as they like to think of themselves as close to the
metal and knowing exactly what their program will do. By contrast, a
language like Rust will have more deterministic performance (which
isn&#39;t to say better) because you don&#39;t have to worry about the GC
doing something when you want to use the CPU.&lt;/p&gt;
&lt;p&gt;This isn&#39;t to say that you can&#39;t have a systems programming language
that does GC. Java, Go, and C# are all garbage collected and people
seem to get along just fine, but even so there is a persistent
sense amongst systems programmers that there&#39;s something fishy about it.&lt;/p&gt;
&lt;p&gt;Of course, it&#39;s important to recognize that Rust &lt;em&gt;does&lt;/em&gt; have GC in the
form of reference counting for &lt;code&gt;Rc&lt;/code&gt; and &lt;code&gt;Arc&lt;/code&gt;, and many if not most
Rust programs use some form of reference counted pointer at least some
of the time. However, many Rust fans have managed to conveniently
forget that reference counting is a form of GC because one of the main
differences between Rust and Go is that Go is GCed (which is &lt;em&gt;very bad&lt;/em&gt;)
and Rust is not.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ref-counting-is-gc.jpg&quot; alt=&quot;Reference counting is a kind of garbage collection&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;deterministic-behavior&quot;&gt;Deterministic Behavior &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#deterministic-behavior&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As you&#39;ll recall from previous posts, many languages support some
mechanism (C++ destructors, Rust&#39;s &lt;code&gt;Drop&lt;/code&gt; trait, etc.) where an object
gets to do something before it&#39;s destroyed. We used this in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;III&lt;/a&gt;
to create C++ smart pointers, but you can also do other stuff,
such as flush file buffers. Some garbage collected languages have
similar features (often called &lt;a href=&quot;https://pkg.go.dev/runtime#SetFinalizer&quot;&gt;finalizers&lt;/a&gt;
or &lt;a href=&quot;https://pkg.go.dev/runtime#AddCleanup&quot;&gt;cleanup functions&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;However, because the destructor/finalizer/cleanup function runs when the object
is destroyed, and tracing garbage collectors destroy the object
at some indeterminate time, the finalizer also runs at an indeterminate
time, or potentially never. For instance, here&#39;s what Go&#39;s documentation
has to say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AddCleanup attaches a cleanup function to ptr. Some time after ptr is no longer reachable, the runtime will call cleanup(arg) in a separate goroutine.&lt;/p&gt;
&lt;p&gt;..&lt;/p&gt;
&lt;p&gt;The cleanup(arg) call is not always guaranteed to run; in particular it is not guaranteed to run before program exit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Reassuring, right?&lt;/p&gt;
&lt;p&gt;By contrast in a reference-counted system, the object is destroyed
at a deterministic time (when the reference count goes to zero) and
so you can have higher confidence about where it will run.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;what-about-gc-for-c%3F&quot;&gt;What about GC for C? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#what-about-gc-for-c%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve spent much of this series dumping on C and C++ for how hard they
make memory management, so it&#39;s natural to ask &amp;quot;why can&#39;t they be
garbage collected?&amp;quot; The answer is: they can be, at least sort of.
Back in 1988, Boehm, Demers, and Weiser designed a &lt;a href=&quot;https://www.hboehm.info/gc/&quot;&gt;GC system for C
and C++&lt;/a&gt;. The basic idea is you (the programmer)
use &lt;code&gt;GC_malloc()&lt;/code&gt; and &lt;code&gt;GC_realloc()&lt;/code&gt;, but never call free, because
the GC does it for you.&lt;/p&gt;
&lt;p&gt;The trick that makes this work is that it&#39;s a &lt;em&gt;conservative&lt;/em&gt; GC. Recall
that in all the examples above, we relied on knowing the structure
of objects so we can find all the pointers. The Boehm GC doesn&#39;t have
this information and so it has to assume that any pattern in memory
that points anywhere in an allocated object is actually a pointer to
that object.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
This means that (for instance) if you have an integer value which
just happens to have the same bit pattern as a pointer into an
object, then it will be treated as a pointer to that object,
which will then effectively leak.&lt;/p&gt;
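&lt;p&gt;The core of the conservative approach can be sketched in a few lines of
Python. This is only an illustration of the idea, not the Boehm GC&#39;s actual
algorithm or data structures: the function name and the toy addresses are
invented, and a real collector would also scan the contents of each marked
object for further candidate pointers.&lt;/p&gt;

```python
def conservative_mark(words, allocations):
    """Mark every allocation that some word appears to point into.

    words: integers found in the roots (stack, registers, globals)
    allocations: dict mapping start address to size in bytes
    """
    marked = set()
    for w in words:
        for start, size in allocations.items():
            # Any value that lands anywhere inside an allocated object is
            # treated as a pointer to it, whether or not it really is one.
            if w in range(start, start + size):
                marked.add(start)
    return marked


allocations = {0x1000: 64, 0x2000: 32, 0x3000: 16}
words = [
    0x1010,   # a real pointer into the middle of the first object
    42,       # an ordinary small integer
    0x2004,   # an integer that just happens to look like a pointer
]
live = conservative_mark(words, allocations)
print(sorted(hex(a) for a in live))   # ['0x1000', '0x2000']
```

&lt;p&gt;The object at &lt;code&gt;0x2000&lt;/code&gt; is kept alive purely because of the
integer that resembles a pointer, while the object at &lt;code&gt;0x3000&lt;/code&gt; is
unmarked and can be freed.&lt;/p&gt;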
&lt;p&gt;It&#39;s also possible under certain circumstances for the garbage
collector to fail to recognize a pointer. It&#39;s generally not
legal in standards compliant C code to generate a pointer
value outside of the allocated region,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
but the compiler is allowed to do anything it wants, and so with
the right compiler optimizations, it&#39;s possible that the right
pattern won&#39;t appear in memory. See Boehm&#39;s &lt;a href=&quot;https://www.hboehm.info/gc/issues.html&quot;&gt;issues&lt;/a&gt;
page for more on this, though this looks kind of old so I wonder if
compilers have gotten more aggressive in the time since this was
written.&lt;/p&gt;
&lt;p&gt;This is obviously clever stuff, but my experience is that very few
C or C++ programs use this kind of garbage collection; people just
suffer in the traditions of our ancestors.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-7/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the end of our series on memory management. In
closing, I&#39;d like to step back from the details and look at the big
picture.  As should be clear, a huge amount of the work performed by a
piece of software is in managing memory in one way or another: that
work can either be pushed onto the programmer or handled by the
software runtime automatically.&lt;/p&gt;
&lt;p&gt;Each approach has advantages and disadvantages: having the programmer
manage memory gives them tight control of the precise behavior of
the program, but at the cost of adding to the programmer&#39;s cognitive
burden, reducing the attention they have to pay to other pieces
of the programming task. By contrast, letting the language runtime
handle memory management frees up the programmer to think about
other things but at the cost of losing control of the precise
memory behavior of the program. This is a familiar tradeoff in
software engineering as you climb the ladder of tool sophistication:
you can get more done if you hand things off to the language—for
instance, using C rather than assembly, or using R or Python rather than
C—or to third party components, but at the cost of having
to mostly just live with whatever behavior other people have
decided upon.&lt;/p&gt;
&lt;p&gt;I started this post talking about Rust, which represents a new point
in the design space: programmer-managed memory but designed so it
prevents unsafe operations (unlike C and C++). In theory you might
think that this would remove cognitive burden from programmers because
your mistakes are less serious, but it also frontloads that burden
because it forces you to think about architectural issues upfront
rather than just having the program appear to work but fail in the
field (as with C and C++). I think comparing Go and Rust is
instructive here: they were designed at roughly the same time
but Go is vastly easier to learn than Rust, in large part due to
its memory model. On the other hand, while Go is clearly carving out
a serious niche, I think it&#39;s clear that Rust is the new language of
choice for hardcore systems programmers. This isn&#39;t only due to
memory model, of course, but the Rust memory model is part of a
general philosophy of programmer control and semantic transparency
in contrast to Go, which is designed specifically for ease of use.&lt;/p&gt;
&lt;p&gt;Despite the age of these ideas, as an industry, I don&#39;t think we&#39;re
done exploring the design space here. As an analogy, consider
high- versus low-level languages. As I mentioned above,
higher level languages often have worse performance than lower
level languages, so it&#39;s not uncommon for engineers to write
big parts of their program in one language and then drop down
to a lower-level language for performance critical code: we see
this when C programs use inline assembly and R or Python
programs write modules in C or C++. One possibility is
to try something similar with memory management, namely to
have GC most of the time but then (safely!) escape back to
non-GCed mode for certain chunks of code,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;
though hopefully in a more idiomatic way than inline assembler.
I&#39;ve seen some systems that seem to be exploring this space
(e.g., the Boehm GC above, or Objective C&#39;s garbage collection),
but nothing in really wide use. Whatever the approach, I don&#39;t
think we&#39;re done here!&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It was especially bad on earlier versions of Firefox
because (1) much of the UI was written in JavaScript and (2) the
same JS VM was used for Web content and the browser UI.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Probably because this could
be on the stack &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Most likely not on the stack, at least
in a naive implementation. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I actually originally wrote this program in Python,
but I forgot that Python has a reference counting
GC and so the objects were being freed at the end of
every loop anyway. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In my code, I actually make a dummy free object
in this space to make the memory display
system, which also scans linearly, work properly. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This particular approach is due to Leslie Lamport. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/&quot;&gt;Some allocators&lt;/a&gt;
for languages like C and C++ partition memory not just by size but by object type,
which helps prevent type confusion attacks due to use-after-free,
when an object of type &lt;code&gt;T&lt;/code&gt; is freed and reallocated as an
object of type &lt;code&gt;U&lt;/code&gt; but some dangling pointer of type &lt;code&gt;T*&lt;/code&gt; tries
to use it as an object of type &lt;code&gt;T&lt;/code&gt;. This isn&#39;t really an issue
for memory safe languages, though. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&amp;quot;Region-based&amp;quot; means that it uses multiple chunks of memory rather than
one contiguous heap. &amp;quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Non-uniform_memory_access&quot;&gt;NUMA-aware&lt;/a&gt;&amp;quot;
means it understands the specific memory architecture of the machine
and will try to use faster (processor-local) memory rather than
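slower memory. &amp;quot;Colored pointers&amp;quot; means that the GC stores some of its
metadata in otherwise unused bits of the pointers themselves. Load and store barriers refer to whether the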
kind of barriers we described above happen when you read
pointers or write them. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I do want to emphasize here that Rust&#39;s design also gives you
thread safety for free, whereas Go&#39;s does not. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This isn&#39;t a guarantee because the program might, for
instance, crash. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For instance, suppose that the programmer allocates an array.
They might traverse the array by incrementing a pointer
relying on the count of elements to be able to recover the
original pointer to the beginning. In this case, there might not
be a pointer to the start of the array during the GC phase. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It&#39;s technically
legal to have a pointer to just after the end of an array
to allow people to write for loops, but you can&#39;t dereference
it. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Rust
sort of does the opposite where it lets you jump
into unsafe mode, outside of Rust&#39;s guarantees. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-7/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 6: Basic Garbage Collection</title>
		<link href="https://educatedguesswork.org/posts/memory-management-6/"/>
		<updated>2025-05-26T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-6/</id>
		<content type="html">&lt;script src=&quot;https://unpkg.com/ohm-js@17/dist/ohm.min.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;module&quot;&gt;
import { InterpreterWidget } from &quot;https://gcexplorer.net/js/interpreterwidget.mjs&quot;;

document.addEventListener(
&quot;DOMContentLoaded&quot;,
(async () =&gt; {
async function makeAw(args) {
   const container = document.querySelector(`#${args.elementId}`);
   const figure = document.createElement(&quot;figure&quot;);
   container.appendChild(figure);
   const interiorId = `${args.elementId}--internal`;
   const div = document.createElement(&quot;div&quot;);
   div.setAttribute(&quot;id&quot;, interiorId);
   figure.appendChild(div);
   args.elementId = interiorId;
   if (args.caption) {
     const caption = document.createElement(&quot;figcaption&quot;);
     caption.textContent = args.caption;
     figure.appendChild(caption);
   }
   await InterpreterWidget(args);
   
}

// Fix for scroll issue. Thanks claude!
// Store current scroll position
 const scrollTop = window.pageYOffset;
 const scrollLeft = window.pageXOffset;
 
 // Temporarily prevent scrolling
 const originalOverflow = document.body.style.overflow;
 document.body.style.overflow = &#39;hidden&#39;;
 
 // Also prevent focus-related scrolling
 const originalScrollIntoView = Element.prototype.scrollIntoView;
 Element.prototype.scrollIntoView = function() {};
 
await makeAw({elementId : &quot;repl-demo&quot;, mode: &quot;justrepl&quot;, caption: &quot;A simple Memo REPL&quot;});

await makeAw({elementId : &quot;transcript-basic-layout&quot;, mode: &quot;transcript&quot;,
    programString:`a = (1 2 3)
a.0 = (3 4)
b = (5 6 7 (8 9))
c = ()`,
resetCurrent: false,
   caption: &quot;Memo memory layout&quot;,

});

await makeAw({elementId : &quot;transcript-refct&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;refct&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-no-gc.memo&quot;,
   caption: &quot;Freeing memory when refct goes to zero&quot;
});

await makeAw({elementId : &quot;transcript-refct-circular&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;refct&quot;,
   programString: `a = (1 (2 null))
a.1.1 = a
a = null`,
   caption: &quot;Circular references&quot;
});

await makeAw({elementId : &quot;transcript-marksweep&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   caption: &quot;Mark sweep&quot;
});

await makeAw({elementId : &quot;transcript-marksweep-pre-gc&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 5,
   caption: &quot;Mark sweep marking phase&quot;
});

await makeAw({elementId : &quot;transcript-marksweep-in-gc1&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 6,
   caption: &quot;Mark sweep sweeping phase&quot;
});

await makeAw({elementId : &quot;transcript-marksweep-in-gc2&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 8,
   caption: &quot;Mark sweep complete. Note the holes&quot;
});

await makeAw({elementId : &quot;transcript-marksweep-post-gc1&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 13,
});


/*await makeAw({elementId : &quot;transcript-marksweep-post-gc2&quot;, mode: &quot;static&quot;,
   allocatorType : &quot;marksweep&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc-allocate.memo&quot;,
   setToLine: 14,
   caption: &quot;A big hole&quot;
});*/

await makeAw({elementId : &quot;transcript-markcompact-pre-gc&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;markcompact&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 4,
   caption: &quot;Before GC with mark-compact&quot;
});

await makeAw({elementId : &quot;transcript-markcompact-post-gc&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;markcompact&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   resetCurrent : false,
   caption: &quot;After GC with mark-compact&quot;
});

await makeAw({elementId : &quot;transcript-markcompact-moved-ptr&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;markcompact&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   resetCurrent : false,
   caption: &quot;mark-compact with the moved value set&quot;,
   setToLine: 8
});

await makeAw({elementId : &quot;transcript-markcompact-gc&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;markcompact&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 4,
   caption: &quot;mark-compact widget&quot;
});

await makeAw({elementId : &quot;transcript-copying-gc-1&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;copying&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 5,
   caption: &quot;Copying the roots&quot;
});

await makeAw({elementId : &quot;transcript-copying-gc-2&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;copying&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 6,
   caption: &quot;Patching up pointers&quot;
});


await makeAw({elementId : &quot;transcript-copying-gc&quot;, mode: &quot;transcript&quot;,
   allocatorType : &quot;copying&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 4,
   caption: &quot;Copying GC widget&quot;
});

await makeAw({elementId : &quot;transcript-copying-post-gc&quot;, mode: &quot;layout&quot;,
   allocatorType : &quot;copying&quot;,
   programUrl: &quot;/examples/memory-management-6/allocation-and-gc.memo&quot;,
   setToLine: 9,
   caption: &quot;After GC&quot;
});

// Re-enable scrolling and get back to the top.
document.body.style.overflow = originalOverflow;
Element.prototype.scrollIntoView = originalScrollIntoView;
       
window.scrollTo(scrollLeft, scrollTop);
})(),);

&lt;/script&gt;
&lt;link rel=&quot;stylesheet&quot; href=&quot;https://gcexplorer.net/css/interpreterwidget.css&quot; /&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/spiderman-pointing-garbage.jpg&quot; alt=&quot;Who&#39;s going to clean up this garbage (Spiderman)&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This is the sixth post in my multipart series on memory
management. You will probably want to go back and read Part
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;, which covers C, parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2&quot;&gt;II&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;III&lt;/a&gt;, which cover C++, and parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4&quot;&gt;IV&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5&quot;&gt;V&lt;/a&gt;, which cover Rust.
C++ RAII and Rust
do a lot to simplify memory management but still
force you to constantly think about how you are using memory (this is
even more true with Rust). It&#39;s natural to ask why we have to do all
this work and why the computer can&#39;t just figure things out. The
answer is that it can—at a cost—which brings us to the
other major approach to handling memory: automatic memory
management, aka garbage collection.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
In this post, I want to introduce the basic ideas behind
garbage collection and describe the main algorithms which
provide the basis for modern GC systems.&lt;/p&gt;
&lt;h2 id=&quot;what-behavior-do-we-want%3F&quot;&gt;What behavior do we want? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#what-behavior-do-we-want%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In C and C++, the programmer is responsible for two major memory
management tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allocating new memory on the heap when it&#39;s needed&lt;/li&gt;
&lt;li&gt;Returning unused memory back to the heap so it can be re-allocated
later.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In C, these operations are both completely manual, using &lt;code&gt;malloc()&lt;/code&gt; and
&lt;code&gt;free()&lt;/code&gt;. C++ provides manual memory management with &lt;code&gt;new&lt;/code&gt; and
&lt;code&gt;delete&lt;/code&gt;, but also provides various mechanisms to automatically
allocate and deallocate memory, such as container classes, RAII, and smart pointers.
Even so, the programmer is still frequently responsible for explicitly
allocating objects and has to understand their lifetimes.
Similarly, Rust requires you to explicitly manage memory,
but protects you from errors when you fail to do so correctly.&lt;/p&gt;
&lt;p&gt;In a wide variety of languages, ranging from Go to JavaScript, the
system handles all of these operations completely automatically. The
programmer just creates variables and objects and the language takes
care of allocating memory as required and de-allocates the memory at
an appropriate time. To see this in action, consider the following
trivial C function and its JavaScript equivalent:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;c&quot;&gt;C &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#c&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; b_len &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  b&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;b_len&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;js&quot;&gt;JS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#js&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; b &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;  b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Both versions of this code do the same thing. First, they create two
variables.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;a&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;An integer set to one&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;b&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A list consisting of the single integer 1&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;We then push another element onto &lt;code&gt;b&lt;/code&gt; to make the list &lt;code&gt;[1, 2]&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The first line of the function is basically the same in both languages.
Take a look at the second line, however, where we create &lt;code&gt;b&lt;/code&gt;. This
C code actually makes two memory-related decisions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It puts the memory on the stack (because we didn&#39;t call &lt;code&gt;malloc()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;It allocates a fixed-size array of size 2.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that even though we only add one value (&lt;code&gt;1&lt;/code&gt;) to &lt;code&gt;b&lt;/code&gt;, the array is
still of size 2. This is why we need a separate length field &lt;code&gt;b_len&lt;/code&gt;
to keep track of how many elements are actually in &lt;code&gt;b&lt;/code&gt;. The syntax
&lt;code&gt;{1}&lt;/code&gt; tells C to initialize the array with &lt;code&gt;1&lt;/code&gt; and then as many zeros
as are required to fill the rest of the array. Except for the
initialization, this should all be pretty familiar from part
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now compare the corresponding JS code, which looks fairly
similar, but that&#39;s just because JS imitates C syntax. Just
like in C, we make a local variable &lt;code&gt;b&lt;/code&gt; that can hold
a list of things, but we never explicitly tell JS whether it
should be on the stack or the heap or how big it should be.
So, what are the answers to those questions? Who knows? Who cares?
None of your business! The JS engine will do whatever it thinks best,
and may not even do the same thing every time. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The JS engine will automatically grow &lt;code&gt;b&lt;/code&gt; whenever you
add new elements. In this respect it&#39;s like the
C++ &lt;code&gt;vector&lt;/code&gt; container.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Local variables can be stored on either the stack or the
heap. In fact, they can be stored in one place and then
moved to the other when conditions change; it&#39;s totally
up to the JS engine.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whatever the JS engine does, it&#39;s essentially transparent to
the programmer, who just writes code and lets the language
worry about it. The same thing applies to many other languages,
such as Python, Lisp, etc.&lt;/p&gt;
&lt;p&gt;In these languages, just as you don&#39;t have to worry about when you are
allocating memory, you also don&#39;t have to worry about de-allocating
it; the language automatically detects when you aren&#39;t using memory
and de-allocates it, in a process called &amp;quot;garbage collection&amp;quot; (commonly,
GC).
You just write code!&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;memo%3A-a-tiny-language&quot;&gt;Memo: A Tiny Language &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#memo%3A-a-tiny-language&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In this post, we&#39;ll be taking a slightly different approach
than with previous posts. Because the internals of garbage
collection are (mostly) invisible and real-world garbage
collectors are very complicated, working with a real language
isn&#39;t that useful for explanatory purposes. Instead,
I&#39;ve designed and implemented a tiny language called
&lt;a href=&quot;https://gcexplorer.net/doc/memo/index.html&quot;&gt;Memo&lt;/a&gt; (for &amp;quot;memory demo&amp;quot;) that lets us look at the
impact of various garbage collection approaches without
being distracted by a lot of language mechanics.&lt;/p&gt;
&lt;p&gt;Memo is deliberately not Turing complete—it has no
conditionals or loops—and only contains a small number of
operations for manipulating objects.&lt;/p&gt;
&lt;h3 id=&quot;data-types&quot;&gt;Data Types &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#data-types&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Memo comes with three native types. The first two of these
are familiar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Integers&lt;/strong&gt; representing bare numeric values.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pointers&lt;/strong&gt; to objects in memory (i.e., the address of those objects).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The only other data type in Memo is the &lt;em&gt;Tuple&lt;/em&gt;, which represents
an ordered sequence of values, each of which can be either an integer or
a pointer. Tuples are wrapped in parentheses, as in
&lt;code&gt;(1 2 3)&lt;/code&gt;. Unlike Python lists or JS arrays (but like Rust or Python tuples), you
can&#39;t extend tuples, so a tuple of length &lt;code&gt;X&lt;/code&gt; can&#39;t be turned
into a tuple of length &lt;code&gt;Y&lt;/code&gt;, though of course you can create
a new tuple of the right size.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This isn&#39;t a particularly rich type system, but it&#39;s actually
sufficient to construct a variety of data structures, as anyone who
has worked with Lisp will recognize. For example, you can make a list
of integers out of 2-valued tuple objects, with the first value of each
tuple being an integer and the second value being a pointer to the
next tuple.&lt;/p&gt;
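&lt;p&gt;To make the list construction concrete, here is a minimal JavaScript sketch of the same idea. It is not Memo&#39;s implementation: it assumes a tuple can be modeled as a fixed-length JS array and a pointer as a reference to another array, and the helper names &lt;code&gt;makeList&lt;/code&gt; and &lt;code&gt;listToArray&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical model: a Memo tuple is a fixed-length JS array, and a
// "pointer" is just a reference to another array (null ends the list).
function makeList(values) {
  let head = null;
  // Build back to front so each cell can point at the one after it.
  for (const value of [...values].reverse()) {
    // Object.seal keeps the length fixed but leaves elements writable,
    // which roughly matches "you can't extend tuples".
    head = Object.seal([value, head]);
  }
  return head;
}

// Walk the chain of 2-element cells and collect the integers.
function listToArray(head) {
  const out = [];
  for (let cell = head; cell !== null; cell = cell[1]) {
    out.push(cell[0]);
  }
  return out;
}

const list = makeList([1, 2, 3]);
console.log(JSON.stringify(list));  // [1,[2,[3,null]]]
console.log(listToArray(list));     // [ 1, 2, 3 ]
```

&lt;p&gt;The nested output has the same shape as a Memo value like &lt;code&gt;(1 (2 (3 null)))&lt;/code&gt;: each cell holds an integer and a pointer to the next cell.&lt;/p&gt;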
&lt;h3 id=&quot;variable-naming-and-addressing&quot;&gt;Variable Naming and Addressing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#variable-naming-and-addressing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Like any language, Memo has named variables. All variables are
global and variables are automatically created upon assignment
without the need for &lt;code&gt;let&lt;/code&gt; or &lt;code&gt;var&lt;/code&gt; (simple, remember?).
It&#39;s not legal to read variables which haven&#39;t
been assigned to yet.&lt;/p&gt;
&lt;p&gt;Variables in Memo are named in the usual fashion, as strings
of letters and numbers starting with a letter, e.g., &lt;code&gt;tmp1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So, for instance, the following code:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;a = 20&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Creates a variable &lt;code&gt;a&lt;/code&gt; and assigns the value 20.&lt;/p&gt;
&lt;p&gt;You create a tuple in the way you expect:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;a = (1 2 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code automatically allocates a tuple of size 3 on the heap and
assigns the address to &lt;code&gt;a&lt;/code&gt;. It&#39;s also legal to have tuples contain
other tuples, which really just means that one of the elements of the
tuple is the address of another tuple.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;a = (1 2 3 (4 5))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Variables themselves aren&#39;t typed,
so it&#39;s perfectly legal to do:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;a = (1 2)      # a is a pointer to the tuple (1 2)&lt;br /&gt;a = 20         # a is the integer 20&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to make this work, internally, Memo keeps track of whether a
given value is a pointer or an integer (see &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#reading-objects-in-memory&quot;&gt;below&lt;/a&gt;).
This should be familiar if
you&#39;ve used other dynamically typed languages like Python or JavaScript.
The inner values in a tuple are addressed with the notation &lt;code&gt;a.0&lt;/code&gt;,
&lt;code&gt;a.1&lt;/code&gt;, and so on.&lt;/p&gt;
&lt;p&gt;Variables never go out of scope, but you can assign them
to &lt;code&gt;null&lt;/code&gt;, which has most of the same effect, except that they&#39;re
still floating around in the namespace.&lt;/p&gt;
&lt;h3 id=&quot;demo&quot;&gt;Demo &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#demo&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The whole implementation of Memo is in JavaScript, so you can run
it directly in your browser (which is why I did it). I&#39;ve embedded
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Read%E2%80%93eval%E2%80%93print_loop&amp;amp;oldid=1283396277&quot;&gt;&amp;quot;read-eval-print-loop&amp;quot; (REPL)&lt;/a&gt; window so you can play with it:&lt;/p&gt;
&lt;div id=&quot;repl-demo&quot;&gt;&lt;/div&gt;
&lt;p&gt;Note that all of the operations we&#39;ve looked at so far are &amp;quot;expressions&amp;quot;,
which is to say that they return the result of the operation, which then
gets printed in the console (this is the &amp;quot;print&amp;quot; part of the REPL).
For instance,
if we set &lt;code&gt;a=20&lt;/code&gt; and then type &lt;code&gt;a&lt;/code&gt; we will get &lt;code&gt;Integer(20)&lt;/code&gt;. This is how
you get the value of an object, because there&#39;s no &lt;code&gt;print()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;It should be perfectly safe to type anything in this window: it doesn&#39;t
interact with anything in the rest of this post or on your computer.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
If you violate Memo&#39;s syntax, it will just complain to you and you
can enter a new instruction.&lt;/p&gt;
&lt;h2 id=&quot;the-allocator-interface&quot;&gt;The Allocator Interface &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#the-allocator-interface&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Next we need to look at how memory allocation works in Memo,
because we&#39;re going to need that to understand how the GC
system works.&lt;/p&gt;
&lt;h3 id=&quot;the-heap&quot;&gt;The Heap &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#the-heap&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Any memory allocator needs a pool of memory blocks (the heap) to
allocate from. JS doesn&#39;t really allow for raw memory access, so
instead we are going to emulate the heap as a single big JS
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer&quot;&gt;ArrayBuffer&lt;/a&gt;—which is
just JS&#39;s way of conveniently handling an array of bytes—encapsulated
in a &lt;code&gt;Memory&lt;/code&gt; object. Internally, we create the heap like so:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; heap &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Memory&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Make a heap of size 10000 bytes&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Addresses in the heap are just integers in the range &lt;code&gt;[0, heapSize)&lt;/code&gt;,
so we can read and write with obvious-looking interfaces:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; byte &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;readUInt8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;   &lt;span class=&quot;token comment&quot;&gt;// Read a byte from address 1.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; word &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;readUInt32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Read a 32-bit word from address 32.&lt;/span&gt;&lt;br /&gt;memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;writeUInt8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; byte &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Write byte + 1 to address 1.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And so on. There aren&#39;t any rules about aligned access in
this implementation, so you
can read a 32-bit word from address 3 or whatever. Similarly,
you can read the pieces of a word byte by byte and then put it
back together into a word. Many real systems actually do
have alignment requirements.&lt;/p&gt;
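&lt;p&gt;As a rough illustration of unaligned access and byte-by-byte reassembly, here is a sketch that uses a plain &lt;code&gt;ArrayBuffer&lt;/code&gt; with a standard &lt;code&gt;DataView&lt;/code&gt; as a stand-in for the &lt;code&gt;Memory&lt;/code&gt; object. The choice of address 3, the test value, and little-endian byte order are all assumptions made for the example.&lt;/p&gt;

```javascript
// Stand-in for the post's Memory object (an assumption): a raw
// ArrayBuffer accessed through a standard DataView.
const buffer = new ArrayBuffer(64);
const view = new DataView(buffer);

// DataView has no alignment rules, so address 3 is fine.
view.setUint32(3, 0xdeadbeef, true);   // true = little-endian
const word = view.getUint32(3, true);

// Read the same word one byte at a time...
const bytes = [0, 1, 2, 3].map((i) => view.getUint8(3 + i));

// ...and reassemble it, starting from the most significant byte.
// Multiplying by 256 (rather than shifting) keeps the result unsigned.
let rebuilt = 0;
for (let i = 3; i >= 0; i--) {
  rebuilt = rebuilt * 256 + bytes[i];
}

console.log(word.toString(16));  // deadbeef
console.log(rebuilt === word);   // true
```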
&lt;h3 id=&quot;allocation&quot;&gt;Allocation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#allocation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Objects are allocated using the &lt;code&gt;Allocator&lt;/code&gt; class, which has
a simple interface. For instance, this code allocates
an object of type &lt;code&gt;Simple&lt;/code&gt; and returns its address (i.e.,
the index of the first byte of the object on the heap).&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;addr1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; allocator&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Simple&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a practical matter, the only allocated type in Memo is the tuple, but
when I wrote this code originally I expected to have more than one
kind of type (e.g., structures with named members), so the allocator
supports objects of arbitrary types, which is more than Memo needs.
In Memo, each tuple size is its own type, automatically
named something like &lt;code&gt;(5)&lt;/code&gt; for a tuple of length 5.&lt;/p&gt;
&lt;p&gt;This is all hidden from the programmer, with the consequence
that when you write something like &lt;code&gt;a = (1 2 3)&lt;/code&gt; internally
the language runtime does:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;allocator&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;(3)&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then assigns the internal values.&lt;/p&gt;
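&lt;p&gt;Here is one way the mapping from tuple length to type name to allocation size could look. This is a sketch rather than Memo&#39;s real code: the helper names are hypothetical, and it assumes each object has a 4-byte header word followed by one 4-byte slot per element.&lt;/p&gt;

```javascript
// Hypothetical helpers, not Memo's real code.
const HEADER_BYTES = 4; // assumed: one metadata word per object
const SLOT_BYTES = 4;   // assumed: one word per tuple element

// Type name for a tuple of a given length, e.g. 3 -> "(3)".
function tupleTypeName(length) {
  return `(${length})`;
}

// Recover the element count from a type name like "(3)".
function tupleLength(typeName) {
  const match = /^\((\d+)\)$/.exec(typeName);
  if (match === null) {
    throw new Error(`Not a tuple type: ${typeName}`);
  }
  return Number(match[1]);
}

// Bytes an allocator would need to reserve for that type.
function tupleSize(typeName) {
  return HEADER_BYTES + SLOT_BYTES * tupleLength(typeName);
}

console.log(tupleTypeName(3));  // (3)
console.log(tupleSize("(3)"));  // 16
console.log(tupleSize("(0)"));  // 4
```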
&lt;h3 id=&quot;object-layout&quot;&gt;Object Layout &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#object-layout&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The diagram below shows the state of memory after running a trivial program,
which is displayed alongside it. This program allocates five tuples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The tuple &lt;code&gt;(1 2 3)&lt;/code&gt; which is assigned to the global variable &lt;code&gt;a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(3 4)&lt;/code&gt; which is assigned to the first element in &lt;code&gt;a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(8 9)&lt;/code&gt; which is at memory address 44 but is not
assigned to any global variable.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(5 6 7 Pointer(44))&lt;/code&gt; which points to the previous
tuple.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;()&lt;/code&gt; which is assigned to the global variable &lt;code&gt;c&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can use the &amp;quot;Previous&amp;quot; and &amp;quot;Next&amp;quot; buttons to step through this
program line by line and see how the various tuples get made.
The highlighted line of code is the one that is about to run
(just like with a normal debugger) rather than the one that
has just run.&lt;/p&gt;
&lt;div id=&quot;transcript-basic-layout&quot;&gt;&lt;/div&gt;
&lt;p&gt;The global variables are shown in the top row. These wouldn&#39;t
typically be on the heap, but of course they are somewhere in memory, so I&#39;m just
showing them here for convenience. Memory address 0
starts at the left of the gray box marked &amp;quot;Reserved&amp;quot;, so the first
allocatable address is &lt;code&gt;16&lt;/code&gt;. This is actually reasonably realistic
because the allocator typically needs to reserve some space for
its own bookkeeping operations (e.g., the last allocated block).
In my implementation, these values are just stored in JS variables
outside of the allocated memory region, and I decided to block
off the first 16 octets for more prosaic reasons: I wanted
the value of the null pointer (the one that doesn&#39;t point anywhere)
to be &lt;code&gt;0&lt;/code&gt;, and for that to work &lt;code&gt;0&lt;/code&gt; cannot be a valid address for
a real object. Note that there is actually nothing that requires
the null pointer to have a memory representation of all &lt;code&gt;0&lt;/code&gt; bits, even
in C, but it&#39;s convenient and common.&lt;/p&gt;
&lt;p&gt;Each pointer is drawn with an arrow that shows the object it
points to. You&#39;ll notice that each object starts out with a
single 4-byte word which is where I store the type of
the object (in this case, just the number of elements in the
tuple). This word is also used for some other metadata as we&#39;ll
see later. The result is that even an empty tuple like &lt;code&gt;()&lt;/code&gt; consumes
a minimum of 4 bytes of storage. We&#39;ll see some other reasons
why we need this later. What these words currently show is
the address of the object (e.g., &lt;code&gt;@16&lt;/code&gt;) and the length of the
tuple in parentheses (e.g., &lt;code&gt;(3)&lt;/code&gt; for a tuple of length &lt;code&gt;3&lt;/code&gt;).
Note: this is the representation with a specific type of
garbage collector (mark-sweep). Other garbage collectors
will have slightly different layouts, as seen below.&lt;/p&gt;
&lt;p&gt;This is a very simple allocator, in which each object is allocated
directly after the end of the previous object, with the result
that all the objects are contiguous. This is what&#39;s called
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Region-based_memory_management&amp;amp;oldid=1279599452&quot;&gt;&amp;quot;bump allocator&amp;quot;&lt;/a&gt; and is very easy to implement because we just need to
store one piece of extra data, the address of the next
object to be allocated, which is right after the end of the
last object that was allocated. When you allocate an object of
size &lt;code&gt;S&lt;/code&gt; you then just &amp;quot;bump&amp;quot; this value up by &lt;code&gt;S&lt;/code&gt;. Here is
the actual code for our bump allocator, which fits neatly
in your head.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;bump_allocate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;heapSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Out of memory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One interesting thing to note is that the tuple
&lt;code&gt;(8 9)&lt;/code&gt; appears in memory &lt;em&gt;before&lt;/em&gt; the tuple
&lt;code&gt;(5 6 7 Pointer(44))&lt;/code&gt; which points to it, despite
the tuples being introduced in the opposite order,
as in:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;b = (5 6 7 (8 9))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What&#39;s going on here is that Memo&#39;s interpreter works from the bottom
up, which means that it needs to first allocate the memory for &lt;code&gt;(8 9)&lt;/code&gt;
and then it can stuff the address in the other newly-created tuple. This
is a common design for this kind of simple parser.&lt;/p&gt;
&lt;h3 id=&quot;reading-objects-in-memory&quot;&gt;Reading Objects in Memory &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#reading-objects-in-memory&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Importantly, everything we have stored on the heap is self-describing,
which allows us to decode the contents of the heap without ancillary
storage, as long as we know the address of the first allocated
object in memory, in this case &lt;code&gt;16&lt;/code&gt;. This works
because each object has a common prefix in the first word, as
noted above:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt; 0                   1                   2                   3&lt;br /&gt; 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1&lt;br /&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;|M|F| RESERVED  |             TypeId (24 bits)                  |&lt;br /&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we have a 32-bit (1 word on 32-bit processors) prefix which
starts with a flags byte. The first two bits in this byte are
assigned to an &lt;code&gt;M&lt;/code&gt; (for marked) and &lt;code&gt;F&lt;/code&gt; (for free) flag; we&#39;ll
cover these later. The rest are reserved for future use.&lt;/p&gt;
&lt;p&gt;The rest of the prefix contains the &lt;code&gt;TypeId&lt;/code&gt;, which is an identifier
for the type of the object. This identifier can be implemented in
a number of ways, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A pointer to a type description.&lt;/li&gt;
&lt;li&gt;An index into a table of type descriptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In either case, the type description will store the number of
pointers in the object and their memory locations within the
object. Importantly, every instance of an object needs
to be laid out the same way, so that once you have the
object pointer and the type, you know everything else
about every element in the type.&lt;/p&gt;
&lt;p&gt;The object decoding process for an object at address &lt;code&gt;addr&lt;/code&gt; proceeds
as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read the first 32-bit word and use that information to extract the
type ID, which, as above, is stored in the low order 24 bits.&lt;/li&gt;
&lt;li&gt;Look up the type using the type ID and from there determine
the number of elements in the object.&lt;/li&gt;
&lt;li&gt;Iterate over the elements and decode them. As noted above,
elements are always of type &lt;code&gt;pointer&lt;/code&gt; or type &lt;code&gt;integer&lt;/code&gt;,
but any given element can be either and element types can
change. In order to address this we steal a bit from the
top of each element to use for the type (this is often
called &amp;quot;coloring&amp;quot; the pointer). For pointers,
the high bit is &lt;code&gt;0&lt;/code&gt; and for integers, the bit for
&lt;code&gt;0x80000000&lt;/code&gt; is set.
This allows you to immediately see whether a given element
is a pointer or an integer, but at the cost that you
can only express values smaller than 2&lt;sup&gt;31&lt;/sup&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Because all elements are the same size, the type also tells us the
size of the object and so we can skip over to the next object, which,
as noted above, starts immediately after the current object (we&#39;ll get
to holes created by fragmentation later).&lt;/p&gt;
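&lt;p&gt;As a rough sketch of steps 1 and 3 above (not Memo&#39;s actual code), here is one way to pull the fields out of a header word and a tagged element. The helper names &lt;code&gt;decodeHeader&lt;/code&gt; and &lt;code&gt;decodeElement&lt;/code&gt; are hypothetical, and the sketch assumes that &lt;code&gt;M&lt;/code&gt; and &lt;code&gt;F&lt;/code&gt; are the two highest-order bits of the word. It uses unsigned shifts and remainders rather than masks, which sidesteps the sign problems described in the aside below:&lt;/p&gt;

```javascript
// Hypothetical helpers; the bit positions for M and F are assumptions.
function decodeHeader(word) {
  const u = word >>> 0;            // force an unsigned 32-bit value
  return {
    marked: (u >>> 31) === 1,      // assumed: M is the top bit
    free: ((u >>> 30) % 2) === 1,  // assumed: F is the next bit down
    typeId: u % 0x1000000,         // low-order 24 bits
  };
}

function decodeElement(elem) {
  const u = elem >>> 0;
  const isInteger = (u >>> 31) === 1;  // high bit set means integer
  return {
    kind: isInteger ? "integer" : "pointer",
    value: u % 0x80000000,             // remaining 31 bits
  };
}
```

&lt;p&gt;With these definitions, &lt;code&gt;decodeElement(0x80000005)&lt;/code&gt; would report the integer 5, while &lt;code&gt;decodeElement(44)&lt;/code&gt; would report a pointer to address 44.&lt;/p&gt;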
&lt;h4 id=&quot;aside%3A-dealing-with-binary-flags-in-js&quot;&gt;Aside: Dealing with binary flags in JS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#aside%3A-dealing-with-binary-flags-in-js&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As an aside, it&#39;s a giant pain dealing with binary flags in JS because
there&#39;s really only one number type (float) and JS has decided that
&lt;code&gt;0x80000000&lt;/code&gt; is a positive number but the result of any bitwise
operation that sets the top bit, such as &lt;code&gt;0x80000000 | 0&lt;/code&gt;, is a negative
number. As a result you get this kind of thing.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; flag = 0x80000000
2147483648
&amp;gt; a = 3
3
&amp;gt; a | flag
-2147483645
&amp;gt; (a &amp;amp; flag) == flag
false
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;LOLWAT?&lt;/p&gt;
&lt;p&gt;I&#39;m not a JS wizard but according to Gemini the fix is to add a bunch
of &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Unsigned_right_shift&quot;&gt;0-sized unsigned right shifts&lt;/a&gt;,
like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; ((b &amp;amp; flag) &amp;gt;&amp;gt;&amp;gt; 0) &amp;amp; flag
-2147483648
&amp;gt; b = (a | flag) &amp;gt;&amp;gt;&amp;gt; 0
2147483651
&amp;gt; ((b &amp;amp; flag) &amp;gt;&amp;gt;&amp;gt; 0) == flag
true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So when you see these scattered all over the code you know why.&lt;/p&gt;
&lt;h2 id=&quot;what-is-garbage%3F&quot;&gt;What is garbage? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#what-is-garbage%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After that warmup, we&#39;re now ready to talk about garbage collection.
As already stated, there&#39;s no way to explicitly tell Memo that we&#39;re
not using a piece of memory, but we don&#39;t want to just have the amount
of memory we use grow monotonically, so we need some way for
Memo to reclaim that memory when it&#39;s no longer in use, hence
garbage collection.&lt;/p&gt;
&lt;p&gt;Conceptually, we have three kinds of memory:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Un-allocated (or freed/de-allocated) memory&lt;/li&gt;
&lt;li&gt;Memory which has been allocated and is in use&lt;/li&gt;
&lt;li&gt;Memory which is allocated but is not in use (&amp;quot;garbage&amp;quot;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In C and C++, the allocator (&lt;code&gt;malloc()/free()&lt;/code&gt;) knows what memory
has been allocated and what has not but it does not know which
allocated memory is in use and which is not; it leaves that
responsibility to the programmer, who must free memory when
it is no longer in use. Automatic memory management requires
a mechanism to identify which allocated memory is garbage and
collect it.&lt;/p&gt;
&lt;p&gt;Defining &amp;quot;in use&amp;quot; is a somewhat tricky proposition: if I
allocate some memory at time &lt;em&gt;T&lt;/em&gt; and just keep it around until
the program ends, is it in use?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
It&#39;s entirely possible that under other
circumstances I might have used it. For instance, suppose my
browser loads some math fonts and then I never go to a
page that renders math. But I might have, and obviously
both I and the programmer would be unhappy if the language
decided that the fonts would never be needed and just deallocated
them, with the program crashing when I went to a site that used math!&lt;/p&gt;
&lt;p&gt;Pretty much every system I am familiar with uses &lt;em&gt;unreachability&lt;/em&gt;
as the definition of garbage. Specifically, the assumption is that
there are a set of &amp;quot;root&amp;quot; pointers which aren&#39;t themselves on the
heap such as local variables (on the stack) or global variables.
A piece of memory is defined as in use if you can reach it by
following pointers from one of those root variables, e.g.,
&lt;em&gt;root → B → C → D&lt;/em&gt;. If it&#39;s not reachable from one
of the roots, then it&#39;s not in use. Because the language already
knows which data is allocated and which isn&#39;t, it can now
distinguish all three types of memory:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Un-allocated (or freed/de-allocated) memory&lt;/li&gt;
&lt;li&gt;In-use memory is allocated and reachable&lt;/li&gt;
&lt;li&gt;Garbage is memory that is allocated but unreachable&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that this excludes some data which is morally garbage in the
sense that the programmer knows they will never use it, but the
language has no way of knowing that. The reachability definition
defines garbage as memory which the language can prove the
program can&#39;t use because there&#39;s no way to reference it.
The figure below provides a simple example: allocations &lt;code&gt;A&lt;/code&gt; – &lt;code&gt;G&lt;/code&gt;
are all reachable by either &lt;code&gt;root1&lt;/code&gt; or &lt;code&gt;root2&lt;/code&gt;. Allocations &lt;code&gt;H&lt;/code&gt;–&lt;code&gt;K&lt;/code&gt;
are unreachable and are therefore garbage.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/garbage-graph.png&quot; alt=&quot;In-use and garbage memory&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Some in-use memory and some garbage
&lt;/figcaption&gt;
&lt;/figure&gt;
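&lt;p&gt;To make the definition concrete, here is a minimal sketch that classifies allocations by reachability. The graph is a made-up example, not the one in the figure, and &lt;code&gt;reachableFrom&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```javascript
// Hypothetical heap: each allocation lists the allocations it points to.
const heap = new Map([
  ["A", ["B", "C"]],
  ["B", ["D"]],
  ["C", ["D"]],
  ["D", []],
  ["H", ["I"]],   // H and I point at each other, but no root reaches them
  ["I", ["H"]],
]);

function reachableFrom(roots, heap) {
  const seen = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const name = pending.pop();
    if (seen.has(name)) continue;   // already visited via another path
    seen.add(name);
    pending.push(...heap.get(name));
  }
  return seen;
}

// A single root pointing at A; everything not reached from it is garbage.
const inUse = reachableFrom(["A"], heap);
const garbage = [...heap.keys()].filter((name) => !inUse.has(name));
console.log(garbage);
```

&lt;p&gt;Here &lt;code&gt;H&lt;/code&gt; and &lt;code&gt;I&lt;/code&gt; end up classified as garbage even though each is still pointed to by the other, because neither can be reached from a root.&lt;/p&gt;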
&lt;p&gt;Now that we understand what garbage is, let&#39;s take a look at how to collect it.&lt;/p&gt;
&lt;h2 id=&quot;reference-counting&quot;&gt;Reference Counting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#reference-counting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, the simplest form of garbage collection is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Reference_counting&amp;amp;oldid=1225073899&quot;&gt;reference
counting&lt;/a&gt;,
which we already saw in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;part
III&lt;/a&gt;. Reference counting works similarly
in a garbage collected language as it does in C++, except that all of
the machinery is hidden under the hood:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Every pointer is a reference counted pointer. This can be done with
an intrusive pointer style design because we&#39;re starting from
scratch and so can just insist that every object have an embedded
reference count.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s not possible to unbox pointers, so you never have to worry
about any aliasing issues such as a raw pointer and a shared
pointer pointing to the same object.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The language runtime automatically takes care of incrementing
and decrementing reference counts as appropriate, and freeing
objects when the reference count goes to zero.
After that, things just work without you having to think about it.&lt;/p&gt;
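&lt;p&gt;As a toy model of that bookkeeping (plain JS records standing in for heap objects; every name here is hypothetical and this is not how Memo&#39;s runtime is actually written), the runtime&#39;s job looks roughly like this:&lt;/p&gt;

```javascript
// Toy model: each object carries an embedded reference count.
const freed = [];

function makeObject(name, children) {
  // A new object starts with a count of 1, owned by whatever
  // variable or tuple slot is about to hold it.
  return { name, refCount: 1, children };
}

function retain(obj) {
  obj.refCount += 1;   // another variable or slot now points here
  return obj;
}

function release(obj) {
  obj.refCount -= 1;
  if (obj.refCount === 0) {
    // Dropping the last reference frees the object and releases
    // everything it pointed to.
    freed.push(obj.name);
    for (const child of obj.children) release(child);
  }
}

const inner = makeObject("(4 5 6)", []);
const outer = makeObject("(P1 2 3)", [inner]);  // inner is held by outer's first slot
const alias = retain(inner);                    // a second pointer to inner: count is 2
release(outer);                                 // frees outer; inner drops to 1
release(alias);                                 // last reference gone; inner is freed
console.log(freed);
```

&lt;p&gt;The point of a garbage collected language is that the programmer never writes the &lt;code&gt;retain&lt;/code&gt; and &lt;code&gt;release&lt;/code&gt; calls; the runtime inserts the equivalent whenever a pointer is copied or overwritten.&lt;/p&gt;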
&lt;p&gt;The following widget shows reference counting in action with
a simple memo program. You can use the previous and next
buttons to step through the program one line at a time.&lt;/p&gt;
&lt;div id=&quot;transcript-refct&quot;&gt;&lt;/div&gt;
&lt;p&gt;In the first three lines we build up a structure with four
tuples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The tuple &lt;code&gt;(P1 2 3)&lt;/code&gt; pointed to by &lt;code&gt;a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(4 5 6)&lt;/code&gt; pointed to by &lt;code&gt;a.0 (P1)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(7 8 P2)&lt;/code&gt; pointed to by &lt;code&gt;b&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The tuple &lt;code&gt;(9 10 11)&lt;/code&gt; pointed to by &lt;code&gt;b.2 (P2)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note
that in Memo it&#39;s not possible to create an object that isn&#39;t
pointed to by anything, so we can ignore that case. In line &lt;code&gt;4&lt;/code&gt;, we set &lt;code&gt;a&lt;/code&gt; to &lt;code&gt;null&lt;/code&gt;,
which turns both the &lt;code&gt;(P1 2 3)&lt;/code&gt; tuple it pointed to and the
&lt;code&gt;(4 5 6)&lt;/code&gt; tuple that &lt;code&gt;P1&lt;/code&gt; points to into garbage. Once you
execute line &lt;code&gt;4&lt;/code&gt;, both objects will be freed, leaving only
the &lt;code&gt;(7 8 P2)&lt;/code&gt; tuple pointed to by &lt;code&gt;b&lt;/code&gt; and the &lt;code&gt;(9 10 11)&lt;/code&gt; that
&lt;code&gt;P2&lt;/code&gt; points to.&lt;/p&gt;
&lt;p&gt;Note that the memory layout is slightly different than in the
previous example in that we have an extra &lt;code&gt;RefCt&lt;/code&gt; field
between the type word and the elements. As suggested by the
name, this field stores the reference count value. We have
32 bits for the reference count, which means that we can have
up to 2&lt;sup&gt;32&lt;/sup&gt; references, which should be enough given
that we can only have 2&lt;sup&gt;31&lt;/sup&gt; objects (because
our pointers are 31 bits long). The result,
however, is that when we use reference counting, we consume
more memory per object than with some other kinds of garbage
collection. This kind of efficiency concern can be a big
deal in some systems, but not in a toy implementation like
ours.&lt;/p&gt;
&lt;p&gt;The key thing to notice is that what makes automatic
memory management straightforward is that the system doesn&#39;t give you
a choice. C and C++ were originally built with manual memory
management and so every attempt to add automatic memory management
has to contend with the old semantics. If you just build a language
with automatic memory management from the ground up, things are
a lot simpler.&lt;/p&gt;
&lt;h3 id=&quot;freed-memory&quot;&gt;Freed Memory &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#freed-memory&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I said above that when the reference count goes to zero, we
free the object. What does that mean in practice? We&#39;ve got one big contiguous memory
region, so it&#39;s not like we can return the memory associated with a
single object. Instead, freeing an object is a &lt;em&gt;bookkeeping&lt;/em&gt; operation
in which we note that that region is no longer in use. We do this
by setting the &lt;code&gt;F&lt;/code&gt; (for free) bit in the first word.
Of course, even though the object is not in use, we still
need to know how big it is. We address this by using the
rest of the first word to store the object size.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Because even unused memory regions aren&#39;t unstructured, we
can understand the entire memory layout just by starting
at the bottom of the heap and working forward one object
at a time until we get to the top. For instance, here is
a simple function which counts all the live objects:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ct&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; counter &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;scan &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; flags &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getFlags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; len &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; len&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;flags &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;FREE_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        counter&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span 
class=&quot;token keyword&quot;&gt;return&lt;/span&gt; counter&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only extra information you need is the address of the
start of the heap (the start of the lowest allocated object) and the
end of the heap (the last allocated byte). From there, you can
just scan the entire heap, stopping when you get to the end.&lt;/p&gt;
&lt;!-- Reusing freed regions --&gt;
&lt;h3 id=&quot;circular-references&quot;&gt;Circular References &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#circular-references&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unfortunately, reference counting has a number of disadvantages
that prevent most languages from using it as the sole form
of garbage collection. The most important of these is that,
as discussed in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3#circular-references&quot;&gt;part III&lt;/a&gt;,
it deals badly with circular references, as shown in the
example below:&lt;/p&gt;
&lt;div id=&quot;transcript-refct-circular&quot;&gt;&lt;/div&gt;
&lt;p&gt;As expected, when we have two objects which point at each other,
neither will be freed even if we delete the reference from
global variable &lt;code&gt;a&lt;/code&gt;.&lt;/p&gt;
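&lt;p&gt;The arithmetic behind this is easy to check by hand. The following sketch tracks a hypothetical two-object cycle with bare counters (the names &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; are made up for illustration and are not taken from the widget above):&lt;/p&gt;

```javascript
// Hypothetical two-object cycle tracked with bare counters.
const x = { refCount: 0, next: null };
const y = { refCount: 0, next: null };

x.refCount += 1;   // global variable a points to x

x.next = y;        // x points to y
y.refCount += 1;

y.next = x;        // y points back to x
x.refCount += 1;

x.refCount -= 1;   // a = null: drop the global reference

// Neither count has reached zero, so a pure reference counter frees
// nothing, even though no root can reach x or y any more.
console.log(x.refCount, y.refCount);
```

&lt;p&gt;Each object is kept alive by the other&#39;s pointer, so both counts are stuck at 1 and the memory is never reclaimed.&lt;/p&gt;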
&lt;p&gt;In part III we showed how to break reference cycles using &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#weak-pointers&quot;&gt;weak
pointers&lt;/a&gt;, but weak
pointers require that the programmer explicitly tag some references as
weak and some as strong, which undercuts the &amp;quot;it just works&amp;quot; value
proposition of the garbage collector.  Some languages do have support
for &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakRef&quot;&gt;weak
pointers&lt;/a&gt;,
but you really don&#39;t want the programmer to have to pay attention to
this every time they create a reference cycle, which happens all the
time.  Less important, but still relevant, is that there is
performance overhead from constantly having to increment and decrement
the reference count of objects whenever you pass them around.&lt;/p&gt;
&lt;p&gt;For these reasons, most garbage collected languages use another
form of garbage collection, either on its own or in combination
with reference counting. This nearly always means one of a broad
class of algorithms called &amp;quot;tracing garbage collection&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;tracing-garbage-collection&quot;&gt;Tracing Garbage Collection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#tracing-garbage-collection&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind a tracing garbage collector is to start from the
root pointers and follow each pointer until you&#39;ve enumerated every
reachable object. Every other allocated object is garbage and can be
freed. This is a simple idea, but doing it well is hard.&lt;/p&gt;
&lt;h3 id=&quot;mark-sweep&quot;&gt;Mark-Sweep &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#mark-sweep&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s start with the most elementary&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
type of tracing garbage collector: mark-sweep.
A mark-sweep collector proceeds in two passes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Trace all the objects from the roots, recording which
objects are reachable.&lt;/li&gt;
&lt;li&gt;Scan over the entire heap, examining each object and
freeing those which were not recorded as reachable.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id=&quot;marking&quot;&gt;Marking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#marking&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The first problem is how to trace all the reachable
objects. Conceptually this is just a standard graph traversal
problem, where you want to start at the roots and touch
every node connected by an edge. You may be familiar
with algorithms for traversing trees, and this is a
similar problem with two additional complications:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You don&#39;t just start from the single root of the
tree but from multiple roots, which may point
to some of the same nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This isn&#39;t necessarily an acyclic graph, in that
you can have reference cycles; recall that this
is why we can&#39;t just use reference counting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Nevertheless, this isn&#39;t particularly complicated.
Here&#39;s a simplified version of the marking algorithm
from our code (I&#39;ve removed some of the JavaScript
generator machinery that we use to step through
the GC one piece at a time.)&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;roots&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; root &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; root &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span 
class=&quot;token comment&quot;&gt;// Pop first, then process&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; current &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; num_ptrs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getNumValues&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; current&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; num_ptrs&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; ptr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; current&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ptr &lt;span class=&quot;token operator&quot;&gt;!==&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; flags &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getFlags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;flags &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;MARK_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setFlags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; flags &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;MARK_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The basic logic is that every time we encounter a pointer
to a new object we add it to the &lt;code&gt;work_queue&lt;/code&gt;. Initially the
queue is populated by the pointers stored in the roots
(global variables), but then as we chase them we encounter
new pointers stored in objects on the heap, which are themselves
added to the work queue. We continue to pop objects off the work
queue until the work queue is empty, at which point the
marking process is done.&lt;/p&gt;
&lt;p&gt;This design needs some way to know which objects we have already seen
before. Otherwise if we have a loop where &lt;code&gt;A&lt;/code&gt; points to &lt;code&gt;B&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt;
points to &lt;code&gt;A&lt;/code&gt; we&#39;ll just go around that loop indefinitely. Unlike
the pointer/integer distinction, this
&amp;quot;seen&amp;quot; information cannot be stored in the pointers themselves because you might
have two pointers to the same object, and if you first reach
the object via pointer &lt;code&gt;A&lt;/code&gt; you want to know that it was marked
when you reach it again via pointer &lt;code&gt;B&lt;/code&gt;; instead, it has to be stored
along with the object, just as the reference count was. In this
case, we have plenty of space in the type word, so we use the
&lt;code&gt;M&lt;/code&gt; bit to store a &amp;quot;marked&amp;quot; value, which indicates
that the object has already been seen. When we encounter an object,
we only add it to the work queue if the &amp;quot;marked&amp;quot; bit is clear (i.e., 0).&lt;/p&gt;
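&lt;p&gt;To see why the mark bit matters, here is a minimal standalone sketch of the same work-queue idea on a toy object graph. This is not Memo&#39;s code: the &lt;code&gt;marked&lt;/code&gt; and &lt;code&gt;refs&lt;/code&gt; fields and the &lt;code&gt;markFrom&lt;/code&gt; function are made up for illustration, with plain JS objects standing in for heap objects.&lt;/p&gt;

```javascript
// Toy stand-in for heap objects: a mark flag plus a list of outgoing references.
const a = { name: "A", marked: false, refs: [] };
const b = { name: "B", marked: false, refs: [] };
const c = { name: "C", marked: false, refs: [] }; // never referenced: garbage

a.refs.push(b);
b.refs.push(a); // A and B form a cycle

// Mark everything reachable from the roots and return how many
// objects were taken off the work queue.
function markFrom(roots) {
  const workQueue = [];
  let processed = 0;

  // Queue an object only the first time we see it.
  const enqueue = (obj) => {
    if (obj.marked) return;
    obj.marked = true;
    workQueue.push(obj);
  };

  roots.forEach(enqueue);
  while (workQueue.length > 0) {
    const current = workQueue.pop();
    processed += 1;
    current.refs.forEach(enqueue);
  }
  return processed;
}

const processed = markFrom([a]);
console.log(processed, a.marked, b.marked, c.marked);
```

&lt;p&gt;Even though A and B point at each other, each one is processed exactly once, and the unreachable C is never marked. Without the check in &lt;code&gt;enqueue&lt;/code&gt;, the loop would never terminate.&lt;/p&gt;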
&lt;h4 id=&quot;sweeping&quot;&gt;Sweeping &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#sweeping&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Once we&#39;ve completed the marking phase, we move on to the sweeping
phase. As described &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#freed-memory&quot;&gt;above&lt;/a&gt;, we can just move through memory from
the bottom of the heap one object
at a time, using the object type field to know the size of an object
and thus where one object ends and another begins.&lt;/p&gt;
&lt;p&gt;When we encounter a new object, we first check the free bit. If
that is set, the object is free and we move on to the next object.
This can happen if the object was freed in a previous pass.
Otherwise, we check the mark bit.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the mark bit is set, we clear it so that it can be marked
in a future GC pass.&lt;/li&gt;
&lt;li&gt;If the mark bit is clear, we set the free bit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We then move on to the next object, continuing until we get to
the end of the heap. Here&#39;s a slightly cleaned up version of the
JS code Memo uses for mark-sweep.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;gc_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;roots&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Mark.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;scan &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; flags &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getFlags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; len &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// This is already free.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;flags &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;FREE_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; len&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span 
class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; nextscan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; len&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// This is not marked, so free it.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;flags &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;MARK_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setFlags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;FREE_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_free_bytes &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; 
len&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unmark&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// Skip to the next entry.&lt;/span&gt;&lt;br /&gt;      scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; nextscan&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can use the following widget to step through the entire mark-sweep
process. This is the same code as we saw above with reference counting,
with one small change that I&#39;ll get to shortly.  As before, you can
step through the code and watch how memory changes.&lt;/p&gt;
&lt;div id=&quot;transcript-marksweep&quot;&gt;&lt;/div&gt;
&lt;p&gt;The first thing to notice is that after line &lt;code&gt;4&lt;/code&gt; executes, we have
the same two pieces of garbage &lt;code&gt;(4 5 6)&lt;/code&gt; and &lt;code&gt;(Pointer(48) 2 3)&lt;/code&gt;,
but unlike with reference counting they haven&#39;t been freed;
instead they are just lurking around. This is because, unlike reference
counting, tracing garbage collectors don&#39;t free memory as soon
as it becomes garbage; instead you have to explicitly run the
garbage collection algorithm. The objects are still unreachable,
so there&#39;s no way for them to be accessed, but they&#39;re still
there taking up space:&lt;/p&gt;
&lt;div id=&quot;transcript-marksweep-pre-gc&quot;&gt;&lt;/div&gt;
&lt;p&gt;In order to actually garbage collect these objects, we need to run
the mark-sweep algorithm. Ordinarily this is something that the system
would do automatically, but to
make things simple I&#39;ve added a pseudo-instruction that invokes the
garbage collector in the form of &lt;code&gt;#gc&lt;/code&gt;. I say this is a
pseudo-instruction because from the perspective of Memo it&#39;s a
comment; instead the widget notices that you&#39;ve asked for GC and runs
the garbage collector externally.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Once we get to the GC instruction, you can then step through the
GC algorithm one step at a time. The orange &amp;quot;scan&amp;quot; pointer
shows which object we are examining now, and at appropriate
times the &amp;quot;work queue&amp;quot; will be shown. For instance, here
is the situation when the algorithm is examining the
object at &lt;code&gt;64&lt;/code&gt; and has just marked the object at &lt;code&gt;48&lt;/code&gt; (&lt;code&gt;(9 10 11)&lt;/code&gt;)
and added it to the work queue.&lt;/p&gt;
&lt;div id=&quot;transcript-marksweep-in-gc1&quot;&gt;&lt;/div&gt;
&lt;p&gt;Note that the marking phase doesn&#39;t proceed in any particular
order through memory, because it&#39;s just tracing out the graph
of object relationships. That&#39;s why we look at &lt;code&gt;64&lt;/code&gt; first
(because it&#39;s pointed to by &lt;code&gt;b&lt;/code&gt;) and then &lt;code&gt;48&lt;/code&gt; (because it&#39;s
pointed to by &lt;code&gt;64&lt;/code&gt;). By contrast, the sweeping phase proceeds
linearly through memory. Below, you can see partway through
the sweeping phase, after we have freed &lt;code&gt;16&lt;/code&gt; and right before
we free &lt;code&gt;32&lt;/code&gt;.&lt;/p&gt;
&lt;div id=&quot;transcript-marksweep-in-gc2&quot;&gt;&lt;/div&gt;
&lt;p&gt;At the end of the sweep process, all of the unreachable objects
will be freed, leaving only the reachable objects, just as
with reference counting.&lt;/p&gt;
&lt;div id=&quot;transcript-marksweep-post-gc1&quot;&gt;&lt;/div&gt; 
&lt;h4 id=&quot;reclaiming-memory&quot;&gt;Reclaiming Memory &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#reclaiming-memory&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Now consider what happens if we do a new allocation as in:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;c = (12 13 14)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, there are two things that can happen:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We can continue to bump allocate, and put the new object
above the last object in memory, in this case at
address &lt;code&gt;80&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We can reuse one of the regions we&#39;ve freed, in
this case probably at address &lt;code&gt;16&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There&#39;s a tradeoff here in that bump allocation is generally
faster—you just need to increment one pointer—but
eventually we have to start reusing free memory or there wasn&#39;t any
point in garbage collecting at all.  The basic challenge is
fragmentation: once the program has run for a while you end up with a
lot of &amp;quot;holes&amp;quot;, which is to say small free regions interspersed with
allocated regions, and you can have a situation where you have plenty
of total free memory but no region big enough for a new allocation. If
you reuse aggressively, this conserves the open region at the top of
the heap for big allocations, but at the cost of having to search for
a region that will fit each new allocation rather than just
incrementing the next pointer. Your allocation strategy needs
to try to compromise between these two.&lt;/p&gt;
&lt;p&gt;Memo&#39;s allocator uses a fairly simple compromise strategy where it
bump allocates up to the point where:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The top of the heap is about halfway through the heap size.&lt;/li&gt;
&lt;li&gt;About half of the region that has been used is holes.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After that it tries to reuse free space; this means that
you get to use the fast allocator when there is no memory pressure, but
you start trying to reuse while you still have plenty of room for
allocations that don&#39;t fit into any existing holes. Memo will
coalesce adjacent free regions as necessary, so multiple small
holes can be merged into a single bigger hole.&lt;/p&gt;
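&lt;p&gt;Coalescing itself is simple to sketch. The version below is an illustration rather than Memo&#39;s implementation: it assumes the free regions are kept as a hypothetical list of &lt;code&gt;{ addr, size }&lt;/code&gt; records, and it treats two holes as adjacent when one ends exactly where the next begins.&lt;/p&gt;

```javascript
// Merge free regions that touch each other. The input is a list of
// { addr, size } holes in any order; the input list is not modified.
function coalesce(holes) {
  const sorted = [...holes].sort((x, y) => x.addr - y.addr);
  const merged = [];

  for (const hole of sorted) {
    const last = merged[merged.length - 1];
    // End address of the previous merged hole, or -1 if there is none yet.
    const lastEnd = last === undefined ? -1 : last.addr + last.size;

    if (lastEnd === hole.addr) {
      // This hole starts where the previous one ends: grow the previous one.
      last.size += hole.size;
    } else {
      merged.push({ ...hole });
    }
  }
  return merged;
}

const holes = [
  { addr: 32, size: 16 },
  { addr: 64, size: 16 },
  { addr: 16, size: 16 },
];
const merged = coalesce(holes);
console.log(merged);
```

&lt;p&gt;Here the holes at 16 and 32 become a single 32-byte hole at 16, while the hole at 64 stays separate because the region at 48 is still in use.&lt;/p&gt;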
&lt;p&gt;There are a lot of fancier strategies one can use, especially for
figuring out which hole to put new allocations into. For instance,
you can have a table of the free regions of a given size or
&amp;quot;bucket&amp;quot; allocations into a small number of sizes so that it&#39;s
easier to find an appropriate location (at the cost of wasting
space when an allocation is just over the size of one bucket
and a lot smaller than the next biggest bucket). None of these
eliminate fragmentation, but they can reduce it. Whatever strategy you
use, this is just something you have to deal with in mark-sweep
or reference counting.&lt;/p&gt;
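&lt;p&gt;As a rough illustration of the bucketing idea, here is a sketch with one free list per size class. The class sizes and the &lt;code&gt;sizeClassFor&lt;/code&gt;, &lt;code&gt;release&lt;/code&gt;, and &lt;code&gt;reuse&lt;/code&gt; helpers are all hypothetical and are not taken from Memo; the point is just to show where the wasted space comes from.&lt;/p&gt;

```javascript
// Hypothetical size classes, in bytes; a real allocator would tune these.
const SIZE_CLASSES = [16, 32, 64, 128];

// One free list (a stack of addresses) per size class.
const freeLists = new Map(SIZE_CLASSES.map((size) => [size, []]));

// Round a request up to the smallest class that fits it.
function sizeClassFor(request) {
  const cls = SIZE_CLASSES.find((size) => size >= request);
  if (cls === undefined) {
    throw new RangeError(`no size class fits ${request} bytes`);
  }
  return cls;
}

// Record a freed region. This assumes every region was allocated at
// exactly a class size, so `size` is one of SIZE_CLASSES.
function release(addr, size) {
  freeLists.get(sizeClassFor(size)).push(addr);
}

// Reuse a freed region of the right class, or return undefined so the
// caller can fall back to bump allocation.
function reuse(request) {
  return freeLists.get(sizeClassFor(request)).pop();
}

release(48, 32);
const exact = sizeClassFor(16);   // fits its bucket exactly
const rounded = sizeClassFor(17); // spills into the next bucket
const wasted = rounded - 17;      // bytes lost to rounding
const reusedAddr = reuse(20);     // served from the 32-byte list
const nothingLeft = reuse(20);    // that list is now empty
console.log({ exact, rounded, wasted, reusedAddr, nothingLeft });
```

&lt;p&gt;Finding a candidate region is now a constant-time pop, but a 17-byte request occupies a 32-byte slot, which is the wasted space mentioned above.&lt;/p&gt;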
&lt;p&gt;The reason we have fragmentation is that we don&#39;t get to choose which
objects will be freed. Suppose we have three objects at &lt;code&gt;16&lt;/code&gt;, &lt;code&gt;32&lt;/code&gt;, and &lt;code&gt;48&lt;/code&gt;
of size &lt;code&gt;16&lt;/code&gt;. If we then free the objects at &lt;code&gt;16&lt;/code&gt; and &lt;code&gt;48&lt;/code&gt;, we now have
&lt;code&gt;32&lt;/code&gt; bytes worth of free memory, but we can&#39;t allocate a &lt;code&gt;32&lt;/code&gt; byte
object because that memory is not contiguous; instead we need to use
the bump allocator. Eventually this process results in lots of
fragmentation. But what if we could instead slide the object at &lt;code&gt;32&lt;/code&gt;
over to &lt;code&gt;16&lt;/code&gt;, leaving the whole &lt;code&gt;32--64&lt;/code&gt; region free? This isn&#39;t
possible in C-like languages where the pointers are directly
exposed to the programmer, but if you don&#39;t let the programmer
look at pointers, you have a lot more freedom to operate.&lt;/p&gt;
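&lt;p&gt;To make the 16/32/48 example concrete, here is a small sketch of an allocator that tries first-fit reuse and falls back to bump allocation. It is illustrative only: the &lt;code&gt;holes&lt;/code&gt; list, the &lt;code&gt;allocate&lt;/code&gt; function, and the assumption that the top of the heap is at 64 are mine, not Memo&#39;s.&lt;/p&gt;

```javascript
// Free regions left after freeing the 16-byte objects at 16 and 48.
const holes = [
  { addr: 16, size: 16 },
  { addr: 48, size: 16 },
];

// Assumed top of the heap: the first address above the object at 48.
let bumpPtr = 64;

function allocate(size) {
  // First fit: take the first hole that is large enough.
  const i = holes.findIndex((hole) => hole.size >= size);
  if (i !== -1) {
    const hole = holes[i];
    const addr = hole.addr;
    hole.addr += size;
    hole.size -= size;
    if (hole.size === 0) holes.splice(i, 1); // hole fully consumed
    return addr;
  }

  // No hole fits, so fall back to bump allocation.
  const addr = bumpPtr;
  bumpPtr += size;
  return addr;
}

const totalFree = holes.reduce((sum, hole) => sum + hole.size, 0);
const big = allocate(32);   // no single hole is big enough
const small = allocate(16); // fits in the hole at 16
console.log({ totalFree, big, small, bumpPtr });
```

&lt;p&gt;There are 32 free bytes in total, yet the 32-byte request still has to go to the top of the heap at 64, while the 16-byte request can reuse the hole at 16.&lt;/p&gt;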
&lt;div id=&quot;transcript-marksweep-post-gc2&quot;&gt;&lt;/div&gt; 
&lt;h3 id=&quot;mark-compact&quot;&gt;Mark-Compact &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#mark-compact&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next garbage collector we&#39;ll be looking at is what&#39;s called
mark-compact. As the name suggests, a mark-compact collector
starts with a marking phase just like mark-sweep, but after
the marking phase is complete, instead of just leaving holes
it slides every object as far towards the left (low memory)
as possible, eliminating all the holes.&lt;/p&gt;
&lt;p&gt;The following two diagrams show the situation before and
after the GC pass.&lt;/p&gt;
&lt;div id=&quot;transcript-markcompact-pre-gc&quot;&gt;&lt;/div&gt; 
&lt;div id=&quot;transcript-markcompact-post-gc&quot;&gt;&lt;/div&gt; 
&lt;p&gt;As you can see, the tuples &lt;code&gt;(7 8 Pointer)&lt;/code&gt; and &lt;code&gt;(9 10 11)&lt;/code&gt; have moved
from their original positions at &lt;code&gt;56&lt;/code&gt; and &lt;code&gt;76&lt;/code&gt; to &lt;code&gt;16&lt;/code&gt; and &lt;code&gt;36&lt;/code&gt;
respectively. As a result, all the allocated memory is now contiguous
and so you can just bump allocate all the time.&lt;/p&gt;
&lt;p&gt;This seems great because allocation is now super fast, but the cost is
complexity in the GC phase. Specifically, we need to rewrite every
pointer—or at least every pointer which points to an object we
are keeping—to point to the location where the object will
eventually end up. There are a number of algorithms for making
this work, but Memo uses the relatively simple &amp;quot;Lisp 2&amp;quot; algorithm,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
which works in three passes.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Scan through memory one object at a time computing the eventual
location of each live object. This effectively simulates bump
allocation because each live object will just be right after
the previous one. This information is stored in a new &amp;quot;Moved&amp;quot;
field in each object, which is now one word larger than
with mark-sweep (but the same size as in reference counting,
in our implementation). Note that we need a separate word
here because the object has to remain intact until we copy it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scan through memory one object at a time. For each pointer
in each object, go to the object it points to and find the
&amp;quot;Moved&amp;quot; pointer and rewrite the pointer with the &amp;quot;Moved&amp;quot;
value (see the diagram below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scan through memory one object at a time, copying each live
object to its new location.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
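&lt;p&gt;The first pass can be checked by hand against the addresses in the diagrams above. The sketch below is only an illustration: it assumes the heap starts at 16 and holds four 20-byte objects, with the two garbage objects at 16 and 36 (inferred from the addresses, not stated explicitly), and it uses a plain &lt;code&gt;moved&lt;/code&gt; field and an &lt;code&gt;assignMoved&lt;/code&gt; helper that are not part of Memo.&lt;/p&gt;

```javascript
const HEAP_START = 16;

// Assumed layout: two dead objects followed by the two live tuples.
const objects = [
  { addr: 16, size: 20, marked: false },
  { addr: 36, size: 20, marked: false },
  { addr: 56, size: 20, marked: true }, // (7 8 Pointer)
  { addr: 76, size: 20, marked: true }, // (9 10 11)
];

// Pass 1: walk the heap in address order and give each live object the
// address it would get from a fresh bump allocator.
function assignMoved(objs, start) {
  let freePtr = start;
  for (const obj of objs) {
    if (obj.marked) {
      obj.moved = freePtr;
      freePtr += obj.size;
    }
  }
  return freePtr; // new top of the heap after compaction
}

const newTop = assignMoved(objects, HEAP_START);

// Pass 2, for a single pointer: the tuple at 56 points at the object at 76,
// so its pointer is rewritten to that object's "moved" value.
const target = objects.find((obj) => obj.addr === 76);
const rewrittenPointer = target.moved;

console.log(objects[2].moved, objects[3].moved, newTop, rewrittenPointer);
```

&lt;p&gt;Under these assumptions the live objects are assigned 16 and 36, the pointer inside the first tuple becomes 36, and bump allocation can resume from 56.&lt;/p&gt;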
&lt;p&gt;The figure below shows an early part of pass 1, in which the moved
pointer for the object at &lt;code&gt;56&lt;/code&gt; has been set to its new location at
&lt;code&gt;16&lt;/code&gt;.&lt;/p&gt;
&lt;div id=&quot;transcript-markcompact-moved-ptr&quot;&gt;&lt;/div&gt;   
&lt;p&gt;A simplified version of Memo&#39;s code for this is below.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;gc_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;roots&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// First mark.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; free_ptr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Step 1. 
Set the future addresses for each object&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// we are retaining.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; free_ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        free_ptr &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Step 2. 
Update references for each marked object.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; num_ptrs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getNumValues&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// Iterate over all the pointers and update the value to whatever&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// is in the moved field in the pointed at value.&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; num_ptrs&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; ptr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; ptr &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; new_ptr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getXword&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;          ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Update the roots.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; new_roots &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; root &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      new_roots&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Step 3. 
Move all objects into their expected locations.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; end &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; scan &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span 
class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; target &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        Memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;memmove&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; target&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; scan&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// Unmark the new copy so it can be GCed later.&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setXword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; target&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unmark&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; target&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        end &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      scan &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br 
/&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; end&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; new_roots&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The widget below will let you watch the mark-compact process in
action.&lt;/p&gt;
&lt;div id=&quot;transcript-markcompact-gc&quot;&gt;&lt;/div&gt; 
&lt;p&gt;Mark-compact collectors minimize fragmentation, but at a modest cost
in memory overhead (due to the &amp;quot;moved&amp;quot; field) and a
performance cost from the multiple passes over memory. There
are fancier two-pass mark-compact algorithms (one mark pass, one compaction pass)
that use ancillary storage for the forwarding addresses (see § 3.4 of the Garbage Collection
Handbook for one such example). If you&#39;re willing to really
go wild with memory consumption, however, you can have
an even simpler GC phase.
This is the idea behind a copying (also called &amp;quot;semispace&amp;quot;) collector.&lt;/p&gt;
&lt;h3 id=&quot;copying-garbage-collectors&quot;&gt;Copying Garbage Collectors &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#copying-garbage-collectors&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The idea behind a copying collector is that you have two heaps, A and
B.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
Each of these heaps is about the same size as your normal heap, so this
consumes twice as much memory.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
You initially allocate your objects in A using a standard bump
allocator, and then when it is time to perform garbage collection you
copy all the live objects into B and abandon anything left in A.
Because B is compact, you can continue to use a bump allocator, and
then when you GC, you copy from B into A, and so on. This can
all be done in a single pass because you&#39;re using the source
heap as temporary storage while you copy into the destination heap.&lt;/p&gt;
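&lt;p&gt;To make the allocate-then-flip cycle concrete, here is a minimal sketch of
a semispace heap. It is a toy, not Memo&#39;s implementation: the
&lt;code&gt;SemispaceHeap&lt;/code&gt; class and its &lt;code&gt;alloc()&lt;/code&gt; and
&lt;code&gt;collect()&lt;/code&gt; methods are hypothetical names, and it assumes that
objects contain no pointers and that the roots are distinct, so it can skip
tracing and forwarding addresses entirely.&lt;/p&gt;

```javascript
// Toy semispace heap. All names are hypothetical; this is not Memo's code.
// Assumed object layout: one header word holding the total size in words,
// followed by the payload words. Objects contain no pointers.
class SemispaceHeap {
  constructor(words) {
    this.active = new Array(words).fill(0); // space we allocate into
    this.idle = new Array(words).fill(0);   // unused until the next collection
    this.end = 0;                           // bump pointer into `active`
  }

  // Bump allocation: hand out the next free words, or null if the space is full.
  alloc(payload) {
    const size = payload.length + 1;
    if (this.end + size > this.active.length) return null;
    const address = this.end;
    this.active[address] = size;
    payload.forEach((word, i) => { this.active[address + 1 + i] = word; });
    this.end += size;
    return address;
  }

  // Copy the objects named by `roots` into the idle space, then swap the spaces.
  // Returns the new addresses in the same order as `roots`.
  collect(roots) {
    let end = 0;
    const newRoots = [];
    for (const root of roots) {
      const size = this.active[root];
      for (let i = 0; i !== size; i++) this.idle[end + i] = this.active[root + i];
      newRoots.push(end);
      end += size;
    }
    [this.active, this.idle] = [this.idle, this.active];
    this.end = end; // the new space is compact, so bump allocation resumes here
    return newRoots;
  }
}

const heap = new SemispaceHeap(16);
const a = heap.alloc([1, 2]);    // live, at address 0
const b = heap.alloc([3]);       // garbage, at address 3
const c = heap.alloc([4, 5, 6]); // live, at address 5
const [a2, c2] = heap.collect([a, c]);
console.log(a2, c2, heap.end);   // prints "0 3 7"
```

&lt;p&gt;Note that the garbage object is simply left behind in the old space and never examined.&lt;/p&gt;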
&lt;p&gt;Memo&#39;s copying algorithm is shown below, but it&#39;s helpful to walk through it.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;process_ptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isMarked&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// The first word is overloaded for the forwarding address.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;readHeaderWord&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;MARK_BIT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Not moved yet.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getSize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; new_address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    Memory&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;memmove&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_end &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Now overwrite the first word to point to the new location.&lt;/span&gt;&lt;br /&gt;    ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;writeHeaderWord&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ObjectManager&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mark&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#from_heap&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#movedList&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Moved&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; new_address&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;new_address&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;gc_incremental&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;roots&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#movedList &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#inGc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;flip&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue 
&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; new_roots &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// First process the roots.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; root &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; roots&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; root &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; todo&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process_ptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;root&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      new_roots&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;todo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Now trace through all objects, copying as we go.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; current &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;pop&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; num_ptrs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getNumValues&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; current&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token 
keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; num_ptrs&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; pointer &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ObjectManager&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; current&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isPointer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;pointer&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; pointer &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token 
constant&quot;&gt;NULL_POINTER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; todo&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process_ptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;pointer&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;todo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_work_queue&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        ObjectManager&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;setValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; current&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;#inGc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; new_roots&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like mark-sweep and mark-compact, a copying GC works by tracing
objects from the roots. Every time you encounter a pointer &lt;code&gt;p&lt;/code&gt; for
the first time you do the following (this is mostly in
&lt;code&gt;process_ptr()&lt;/code&gt;):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Copy the object into the destination address space (&amp;quot;to-space&amp;quot;). Call the
new address &lt;code&gt;n&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Overwrite the first word (at &lt;code&gt;p&lt;/code&gt;) with the new address (&lt;code&gt;n&lt;/code&gt;).
This makes the object invalid, because valid
objects have the type in the low-order three bytes of
the first word, but this is safe because you have already copied the object so you
can use the original (in &amp;quot;from-space&amp;quot;) as scratch space.
We don&#39;t need a separate word in each object like we do for
mark-compact.&lt;/li&gt;
&lt;li&gt;Set the mark bit in the first word (at &lt;code&gt;p&lt;/code&gt;) so you can tell you have
seen it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Add the object to the work queue.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Every time you pull an object off the work queue (by definition, all
these objects are already copied) you look at each pointer &lt;code&gt;p&lt;/code&gt;. If the
pointer &lt;code&gt;p&lt;/code&gt; is new, you copy the object (as above) and remember
&lt;code&gt;n&lt;/code&gt;. Otherwise, you just read the first word of the old copy (where the type field used to be) to get &lt;code&gt;n&lt;/code&gt;. You then
overwrite the pointer with &lt;code&gt;n&lt;/code&gt;, leaving this object with correct
pointers. Once you have finished with the work queue, you have
(1) copied every live object and (2) updated all their pointers,
and you are done. You can then abandon the source heap, which
will become the destination heap the next time around.&lt;/p&gt;
&lt;p&gt;This is a simple algorithm but can be a bit confusing, so it&#39;s
helpful to go through this step by step.&lt;/p&gt;
&lt;div id=&quot;transcript-copying-gc-1&quot;&gt;&lt;/div&gt; 
&lt;p&gt;The above figure shows the result of the first GC step, where we
have processed the object pointed at by the first root, which
was at address &lt;code&gt;64&lt;/code&gt;. As this was the first object processed
(the only one pointed to by the root) it got copied to the
lowest address in the other half of the heap (to-space).
Note that it was copied &lt;em&gt;as-is&lt;/em&gt;, which means that its
internal pointers all still refer to objects that
have not been copied yet (i.e., they point to from-space). This
will have to be patched up later, which is why this object
had to be added to the work queue. We used the original
copy of the object (in from-space) to store a tombstone
indicating where the object was moved to. The rest of the
object has the original contents, but those will never
be examined and could in principle be invalid.&lt;/p&gt;
&lt;p&gt;Now that we&#39;ve exhausted the roots, we move to process the
work queue, which means processing the object at &lt;code&gt;to:16&lt;/code&gt;.
We iterate through all the pointers in that object, copying
the objects into to-space (again, as-is), and then patch
up the pointer in &lt;code&gt;to:16&lt;/code&gt; to match the new location, as
shown below:&lt;/p&gt;
&lt;div id=&quot;transcript-copying-gc-2&quot;&gt;&lt;/div&gt;
&lt;p&gt;Now, the work queue has the object we just copied, which
is stored at &lt;code&gt;to:32&lt;/code&gt;. Next we scan through that object looking
for pointers, but there aren&#39;t any, so once that scan finishes
the GC process is complete, and we
can just abandon from-space and all its objects.&lt;/p&gt;
&lt;p&gt;The widget below will let you walk through this all one
step at a time if you want.&lt;/p&gt;
&lt;div id=&quot;transcript-copying-gc&quot;&gt;&lt;/div&gt; 
&lt;p&gt;One thing to notice here is that unlike mark-compact, which just
slides all the allocations to the left, a copying GC does not
necessarily preserve the relative order of allocations on the
heap. For example, consider the line&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;b = (7 8 (9 10 11))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This allocates the internal tuple first (at lower memory) and
the external tuple second (at higher memory). However, when we
trace from the roots, we encounter the external tuple first
and so it gets copied first, ending up at lower memory, as seen
below.&lt;/p&gt;
&lt;div id=&quot;transcript-copying-post-gc&quot;&gt;&lt;/div&gt;
&lt;p&gt;This won&#39;t have a correctness impact, but may have a performance
impact depending on the original layout and memory access patterns.
Note that on a second copy, this order will be preserved, because
we access the outer tuple first.&lt;/p&gt;
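&lt;p&gt;To make the reordering concrete, here is a minimal Rust sketch of the copying pass run on the heap for &lt;code&gt;b = (7 8 (9 10 11))&lt;/code&gt;. It is not the Memo implementation: the &lt;code&gt;Word&lt;/code&gt; enum, &lt;code&gt;collect()&lt;/code&gt;, and &lt;code&gt;example_heap()&lt;/code&gt; are names invented for illustration, a &lt;code&gt;Forward&lt;/code&gt; variant stands in for the overwritten first word plus mark bit, and the work queue holds pointer slots rather than whole objects, which should visit objects in the same breadth-first order.&lt;/p&gt;

```rust
use std::collections::VecDeque;

const SIZE: usize = 16; // words per semispace (arbitrary for this sketch)

/// One heap word. A tagged enum is an assumption made for readability;
/// Memo itself packs this information into the bits of an integer word.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Word {
    Header(usize),  // first word of a live object: number of fields that follow
    Forward(usize), // tombstone left in from-space: the object now lives here
    Int(i64),       // plain integer field
    Ptr(usize),     // pointer field: address of another object's header
    Free,           // unused word
}

type Space = [Word; SIZE];

/// Copies everything reachable from `root` into a fresh to-space.
/// Returns (to-space, new root address, first free address).
/// `from` is taken by value, so tombstones are written into a private copy.
fn collect(mut from: Space, root: usize) -> (Space, usize, usize) {
    let mut to = [Word::Free; SIZE];
    let mut free = 0;
    let mut new_root = root;

    // Work queue of pointer slots still to be fixed up:
    // `None` stands for the root, `Some(a)` for the field at to-space address `a`.
    let mut slots = VecDeque::new();
    slots.push_back(None);

    while let Some(slot) = slots.pop_front() {
        // Read the (still from-space) pointer stored in this slot.
        let p = match slot {
            None => new_root,
            Some(a) => match to[a] {
                Word::Ptr(p) => p,
                _ => continue,
            },
        };

        let n = match from[p] {
            // Seen before: the tombstone already holds the new address.
            Word::Forward(n) => n,
            // First visit: copy the object as-is, then leave a tombstone.
            Word::Header(len) => {
                let n = free;
                for i in 0..=len {
                    to[n + i] = from[p + i];
                    if let Word::Ptr(_) = to[n + i] {
                        slots.push_back(Some(n + i)); // patch this field later
                    }
                }
                free += len + 1;
                from[p] = Word::Forward(n);
                n
            }
            other => panic!("address {} is not an object header: {:?}", p, other),
        };

        // Overwrite the slot with the object's to-space address.
        match slot {
            None => new_root = n,
            Some(a) => to[a] = Word::Ptr(n),
        }
    }
    (to, new_root, free)
}

/// Heap for `b = (7 8 (9 10 11))`: the inner tuple was allocated first.
fn example_heap() -> (Space, usize) {
    let mut from = [Word::Free; SIZE];
    let inner = [Word::Header(3), Word::Int(9), Word::Int(10), Word::Int(11)];
    let outer = [Word::Header(3), Word::Int(7), Word::Int(8), Word::Ptr(0)];
    for i in 0..4 {
        from[i] = inner[i];     // inner tuple at addresses 0 through 3
        from[4 + i] = outer[i]; // outer tuple at addresses 4 through 7
    }
    (from, 4) // the root `b` points at the outer tuple
}
```

&lt;p&gt;Calling &lt;code&gt;collect()&lt;/code&gt; on the result of &lt;code&gt;example_heap()&lt;/code&gt; puts the outer tuple at to-space address 0 and the inner tuple at address 4, the reverse of their allocation order.&lt;/p&gt;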
&lt;h2 id=&quot;next-up%3A-advanced-garbage-collection&quot;&gt;Next Up: Advanced Garbage Collection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-6/#next-up%3A-advanced-garbage-collection&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The algorithms described in this post are the foundation of basically
every modern GC, but I&#39;ve only described them in their simplest form.
In the next post, I&#39;ll be covering some of the complexities in making
GC deployable, especially for a high performance interactive system
(e.g., a Web browser).  Importantly, all of these complexities are
(nearly) completely hidden from the programmer, because they mostly
have performance impacts in terms of when and how fast the GC runs.
This allows the language implementor to improve the GC in their
runtime without the language user having to do anything to get the
benefits of the new implementation.  This isn&#39;t to say that they
aren&#39;t important, however: GC can have a huge impact on the
performance of a system.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There&#39;s also a minor approach where you can&#39;t do any memory
allocation, like in old school FORTRAN, but we can ignore that. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Mostly. Note that this doesn&#39;t mean you don&#39;t need to think
about whether you are doing deep or shallow copies because
they have different programming semantics. You don&#39;t
have to worry about whether there is memory being
allocated, though. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Some languages have nice idioms for this, like the JS
&lt;code&gt;...&lt;/code&gt; spread operator, but Memo is deliberately minimalist,
and so there&#39;s not even a way to do this generically
without knowing the length. However you can use Lisp-style
lists. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t trust me, trust the &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/&quot;&gt;Web security model&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Occasionally you&#39;ll see someone propose a system for avoiding
memory leaks that comes down to just keeping a pointer to all
allocated memory, with the result that that data is morally
leaked but not formally leaked. I can never tell if these
people are serious. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I went back and forth on whether to just keep the type
field, which also carries the size, but eventually
decided it was better to store the size. The reason for
this is slightly subtle: when we get to mark-compact
later, it is possible to temporarily have holes which
are smaller than any valid object (because the smallest
object will be two words and the hole can be one word).
This isn&#39;t an issue for the garbage collector itself,
which doesn&#39;t need to skip over them, but it messes
up the code I&#39;m using to draw the heap. Storing
the length in the first word always works and doesn&#39;t
have this problem. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Or, as I once heard &lt;a href=&quot;https://www.precedia.com/FrankJackson.html&quot;&gt;Frank Jackson&lt;/a&gt;,
who had worked extensively on the Smalltalk 80 garbage collector,
call it &amp;quot;the second lamest form of garbage collection&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that it&#39;s not possible for the mark bit to be set on a
freed object because otherwise it wouldn&#39;t have been freed. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I could have added a GC instruction to Memo, but I didn&#39;t
for two reasons. First, this would be unusual because GC
is usually automatic. Second, I wanted to let you step
through the GC one operation at a time and that wouldn&#39;t
work if the instruction were processed by the
Memo interpreter directly. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See § 3.2 of the &lt;a href=&quot;https://gchandbook.org/&quot;&gt;Garbage Collection Handbook&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In our implementation, we just have two heaps that share
the same address space starting at 0, and internally I
keep track of which heap is in use. This is fine because
the addresses are just indexes into a table. In a system
which was closer to the metal, you might instead tag
the addresses using the higher order bits, as we have been
doing so far. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The other way of looking at it is that you have one heap
which is split into two &amp;quot;semispaces&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is all a bit fiddly because in other contexts the same bit
(0x80000000) means that the field is an integer rather than
a pointer, but in this case we know it&#39;s a pointer so we can overload
the meaning. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-6/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 5: Fighting with Rust</title>
		<link href="https://educatedguesswork.org/posts/memory-management-5/"/>
		<updated>2025-04-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-5/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/lifetime-annotations.jpg&quot; alt=&quot;Lifetime annotations everywhere&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This is the fifth post in my planned multipart series on memory
management. You will probably want to go back and read Part
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;, which covers C, parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2&quot;&gt;II&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;III&lt;/a&gt;, which cover C++, and part
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4&quot;&gt;IV&lt;/a&gt;, which introduces Rust memory
management.  In part IV, we got through the basics of Rust memory
management up through smart pointers. In this post I want
to look at some of the gymnastics you need to engage in to do
serious work in Rust.&lt;/p&gt;
&lt;h2 id=&quot;unexpected-moves&quot;&gt;Unexpected Moves &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#unexpected-moves&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the following simple Rust code:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token macro property&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; y &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; x &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; x&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is straightforward: we create a vector containing the values
&lt;code&gt;[1, 2]&lt;/code&gt;, then iterate over it and print each element, and then
finally print out the length of the vector. This is the kind of
code people write every day. Let&#39;s see what happens when we compile
it.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Error[E0382]: borrow of moved value: `x`&lt;br /&gt;   --&gt; iter_into.rs:7:20&lt;br /&gt;    |&lt;br /&gt;2   |     let x = vec![1, 2];&lt;br /&gt;    |         - move occurs because `x` has type `Vec&amp;lt;i32&amp;gt;`, which does not implement the `Copy` trait&lt;br /&gt;3   |&lt;br /&gt;4   |     for y in x {&lt;br /&gt;    |              - `x` moved due to this implicit call to `.into_iter()`&lt;br /&gt;...&lt;br /&gt;7   |     println!(&quot;{}&quot;, x.len());&lt;br /&gt;    |                    ^ value borrowed here after move&lt;br /&gt;    |&lt;br /&gt;note: `into_iter` takes ownership of the receiver `self`, which moves `x`&lt;br /&gt;   --&gt; /Users/ekr/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/collect.rs:346:18&lt;br /&gt;    |&lt;br /&gt;346 |     fn into_iter(self) -&gt; Self::IntoIter;&lt;br /&gt;    |                  ^^^^&lt;br /&gt;help: consider iterating over a slice of the `Vec&amp;lt;i32&amp;gt;`&#39;s content to avoid moving into the `for` loop&lt;br /&gt;    |&lt;br /&gt;4   |     for y in &amp;amp;x {&lt;br /&gt;    |              +&lt;br /&gt;&lt;br /&gt;error: aborting due to 1 previous error&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0382`.&lt;br /&gt;make: *** [iter_into.out] Error 1&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No joy! The error message is reasonably helpful, though:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;note: `into_iter` takes ownership of the receiver `self`, which moves `x`&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At a high level, here is what is happening.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;for y in x&lt;/code&gt; syntax tells Rust you want an iterator.&lt;/li&gt;
&lt;li&gt;In order to produce that iterator, Rust calls &lt;code&gt;x.into_iter()&lt;/code&gt;, defined
by the trait &lt;a href=&quot;https://doc.rust-lang.org/nightly/core/iter/trait.IntoIterator.html&quot;&gt;IntoIterator&lt;/a&gt;,
which results in an iterator over &lt;code&gt;i32&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The iterator &lt;em&gt;takes ownership&lt;/em&gt; of the input vector &lt;code&gt;x&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
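&lt;p&gt;As a minimal sketch of the fix the compiler is hinting at, the function below (the name &lt;code&gt;borrowing_loop&lt;/code&gt; is made up for this example) iterates over a borrowed view of the vector using &lt;code&gt;x.iter()&lt;/code&gt;, which should behave the same as &lt;code&gt;for y in &amp;amp;x&lt;/code&gt;, so &lt;code&gt;x&lt;/code&gt; is still usable afterwards:&lt;/p&gt;

```rust
/// Sums the elements and then reports the length, without moving the vector.
fn borrowing_loop() -> (i32, usize) {
    let x = vec![1, 2];
    let mut sum = 0;

    // `iter()` only borrows `x`, so each `y` is a reference to an element.
    for y in x.iter() {
        sum += *y;
    }

    // `x` was never moved, so it can still be used here.
    (sum, x.len())
}
```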
&lt;p&gt;In the spirit of this series, though, let&#39;s dig one level deeper. You can ignore
the rest of this section if you don&#39;t really care about Rust details, but this
took me a little while to work out, so it&#39;s going up on the Internet.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;for y in x&lt;/code&gt; expression in Rust is &lt;a href=&quot;https://doc.rust-lang.org/stable/reference/expressions/loop-expr.html#iterator-loops&quot;&gt;syntactic
sugar&lt;/a&gt;
for creating an iterator. The &lt;code&gt;x&lt;/code&gt; value must implement the
&lt;a href=&quot;https://doc.rust-lang.org/nightly/core/iter/trait.IntoIterator.html&quot;&gt;&lt;code&gt;IntoIterator&lt;/code&gt;&lt;/a&gt;
trait, which has the method &lt;code&gt;into_iter()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;IntoIterator&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Item&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;IntoIter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Iterator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Item&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;Self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Item&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Required method&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;into_iter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;Self&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;IntoIter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;.into_iter()&lt;/code&gt; returns an
&lt;a href=&quot;https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html&quot;&gt;&lt;code&gt;Iterator&lt;/code&gt;&lt;/a&gt;
object which exposes a &lt;code&gt;.next()&lt;/code&gt; method that returns the next value
in the iterator. You can loop over the iterator by calling &lt;code&gt;.next()&lt;/code&gt;
until it returns &lt;code&gt;None&lt;/code&gt; (see
&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#union-types&quot;&gt;here&lt;/a&gt;
for some background on union types in Rust). For reference, here&#39;s what
the Rust reference says is the equivalent code to &lt;code&gt;for ...&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;IntoIterator&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;into_iter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;iter_expr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; iter &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;label&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;loop&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; next&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Iterator&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; iter&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br 
/&gt;                &lt;span class=&quot;token class-name&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; next &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; val&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token class-name&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;PATTERN&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; next&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;/* loop body */&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    result&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the code above says, Rust implicitly calls &lt;code&gt;IntoIterator::into_iter(x)&lt;/code&gt;,
which is to say it calls the &lt;code&gt;.into_iter()&lt;/code&gt; method on &lt;code&gt;x&lt;/code&gt; (in this
case of type &lt;code&gt;Vec&amp;lt;i32&amp;gt;&lt;/code&gt;), i.e., &lt;code&gt;x.into_iter()&lt;/code&gt;. The &lt;code&gt;IntoIterator::into_iter()&lt;/code&gt; syntax is
needed in case &lt;code&gt;x&lt;/code&gt; implements more than one trait that has an &lt;code&gt;into_iter()&lt;/code&gt;
method because we need to tell the Rust compiler which method to choose
(see below).&lt;/p&gt;
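&lt;p&gt;To see the move without the sugar, here is a hand-written approximation of that expansion for the vector example. The function name &lt;code&gt;manual_for_loop&lt;/code&gt; is invented for this sketch, and it is a simplification of the reference expansion, not what the compiler literally emits:&lt;/p&gt;

```rust
/// Roughly what `for y in x { sum += y; }` does, written out by hand.
fn manual_for_loop() -> (i32, usize) {
    let x = vec![1, 2];
    let len = x.len(); // read the length before `x` is moved

    // `into_iter` takes `self` by value, so `x` is moved into the iterator
    // here; using `x` after this line would be the E0382 error shown earlier.
    let mut iter = IntoIterator::into_iter(x);

    let mut sum = 0;
    loop {
        match iter.next() {
            Some(y) => sum += y, // `y` is an owned element here
            None => break,       // iterator exhausted
        }
    }
    (sum, len)
}
```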
&lt;p&gt;This syntactic sugar is all internal compiler magic, but from here on in the rest is normal
(though a bit arcane) Rust.&lt;/p&gt;
&lt;h3 id=&quot;function-overloads&quot;&gt;Function Overloads &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#function-overloads&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So why does this result in a move and why does replacing &lt;code&gt;x&lt;/code&gt; with &lt;code&gt;&amp;amp;x&lt;/code&gt; fix it?
If you&#39;ve done any Rust programming, you know that you can call a method that
takes any kind of &lt;code&gt;self&lt;/code&gt; parameter (i.e., a moved object, a reference, or a mutable
reference) as &lt;code&gt;x.foo()&lt;/code&gt; and Rust will automatically produce the right kind of parameter
assuming your object is compatible in terms of mutability.&lt;/p&gt;
&lt;p&gt;For instance:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#code&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;ref&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;mut_ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mut_ref&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;move_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;move&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token 
class-name&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; yref &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    x&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    x&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mut_ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    x&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;move_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// x.move_method();     Does not compile&lt;/span&gt;&lt;br /&gt;    yref&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    yref&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mut_ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// yref.move_method();  Does not compile&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;mut_ref_method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#output&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;ref&lt;br /&gt;mut_ref&lt;br /&gt;move&lt;br /&gt;ref&lt;br /&gt;mut_ref&lt;br /&gt;ref&lt;br /&gt;mut_ref&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;x&lt;/code&gt; is actually a mutable object, but as you can see, when we call
&lt;code&gt;ref_method&lt;/code&gt; it gets an immutable reference and when we call &lt;code&gt;mut_ref_method&lt;/code&gt;
it gets a mutable reference; the compiler just handles this.
Note that if we try to call &lt;code&gt;x.move_method()&lt;/code&gt;
twice, we get an error about the use of a moved value, just as we expect
(that&#39;s why I called this method last).&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0382]: use of moved value: `x`&lt;br /&gt;  --&gt; receiver1.rs:23:5&lt;br /&gt;   |&lt;br /&gt;18 |     let mut x = X {};&lt;br /&gt;   |         ----- move occurs because `x` has type `X`, which does not implement the `Copy` trait&lt;br /&gt;...&lt;br /&gt;22 |     x.move_method();&lt;br /&gt;   |       ------------- `x` moved due to this method call&lt;br /&gt;23 |     x.move_method()&lt;br /&gt;   |     ^ value used here after move&lt;br /&gt;   |&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly I can create a new &lt;code&gt;X&lt;/code&gt; named &lt;code&gt;y&lt;/code&gt; and a reference to it called &lt;code&gt;yref&lt;/code&gt; and
call most of the methods via it. Note that you can&#39;t call &lt;code&gt;move_method()&lt;/code&gt; because
you&#39;re not allowed to move things via references that way:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0507]: cannot move out of `*yref` which is behind a mutable reference&lt;br /&gt;  --&gt; receiver1.rs:27:5&lt;br /&gt;   |&lt;br /&gt;27 |     yref.move_method();&lt;br /&gt;   |     ^^^^ ------------- `*yref` moved due to this method call&lt;br /&gt;   |     |&lt;br /&gt;   |     move occurs because `*yref` has type `X`, which does not implement the `Copy` trait&lt;br /&gt;   |&lt;br /&gt;note: `X::move_method` takes ownership of the receiver `self`, which moves `*yref`&lt;br /&gt;  --&gt; receiver1.rs:12:20&lt;br /&gt;   |&lt;br /&gt;12 |     fn move_method(self) {&lt;br /&gt;   |                    ^^^^&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I can even do the same thing without the temporary and just say &lt;code&gt;(&amp;amp;y).ref_method()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So, if &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;&amp;amp;x&lt;/code&gt; are (mostly) interchangeable in method calls,
why do we have a problem and why does &lt;code&gt;&amp;amp;x&lt;/code&gt; fix it? The answer lies
in the fact that we&#39;re not calling a normal method but rather an
implementation of a trait (in this case &lt;code&gt;IntoIterator&lt;/code&gt;). Because
traits are defined independently of each other, it&#39;s possible for two traits to have
methods with the same name, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size_cm&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;f64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Metric&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Metric&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span 
class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Metric: {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size_cm&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Imperial&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Imperial&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Imperial: {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size_cm &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both &lt;code&gt;Metric&lt;/code&gt; and &lt;code&gt;Imperial&lt;/code&gt; have &lt;code&gt;size()&lt;/code&gt; functions, so if we make
a &lt;code&gt;Hat&lt;/code&gt; and call &lt;code&gt;.size()&lt;/code&gt;, what will happen? The answer is a compilation
error:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0034]: multiple applicable items in scope&lt;br /&gt;  --&gt; trait-overload.rs:28:7&lt;br /&gt;   |&lt;br /&gt;28 |     h.size();&lt;br /&gt;   |       ^^^^ multiple `size` found&lt;br /&gt;   |&lt;br /&gt;note: candidate #1 is defined in an impl of the trait `Imperial` for the type `Hat`&lt;br /&gt;  --&gt; trait-overload.rs:20:5&lt;br /&gt;   |&lt;br /&gt;20 |     fn size(&amp;self) {&lt;br /&gt;   |     ^^^^^^^^^^^^^^&lt;br /&gt;note: candidate #2 is defined in an impl of the trait `Metric` for the type `Hat`&lt;br /&gt;  --&gt; trait-overload.rs:10:5&lt;br /&gt;   |&lt;br /&gt;10 |     fn size(&amp;self) {&lt;br /&gt;   |     ^^^^^^^^^^^^^^&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What&#39;s going on here is that the compiler has no way of knowing which
version of &lt;code&gt;size()&lt;/code&gt; we want to call, because there are two equally
valid versions. In order to fix this, we need to disambiguate them,
and the compiler helpfully tells us how:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;help: disambiguate the method for candidate #1&lt;br /&gt;   |&lt;br /&gt;28 |     Imperial::size(&amp;h);&lt;br /&gt;   |     ~~~~~~~~~~~~~~~~~~&lt;br /&gt;help: disambiguate the method for candidate #2&lt;br /&gt;   |&lt;br /&gt;28 |     Metric::size(&amp;h);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can get pretty far with Rust by just doing what the compiler
says, and if we do that, things work as expected:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code-2&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#code-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size_cm&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;f64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Metric&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Metric&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span 
class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Metric: {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size_cm&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Imperial&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Imperial&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Imperial: {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size_cm &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; hat &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size_cm&lt;span class=&quot;token 
punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token class-name&quot;&gt;Metric&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;hat&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token class-name&quot;&gt;Imperial&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;hat&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output-2&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#output-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Metric: 10&lt;br /&gt;Imperial: 4&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you look closely, though, you&#39;ll notice that we
had to pass a reference to &lt;code&gt;size()&lt;/code&gt;, as in &lt;code&gt;Metric::size(&amp;amp;hat)&lt;/code&gt;;
if we just change this to &lt;code&gt;Metric::size(hat)&lt;/code&gt; we get a compilation
error:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;  --&gt; trait-overload2.rs:28:18&lt;br /&gt;   |&lt;br /&gt;28 |     Metric::size(hat);&lt;br /&gt;   |     ------------ ^^^ expected `&amp;_`, found `Hat`&lt;br /&gt;   |     |&lt;br /&gt;   |     arguments to this function are incorrect&lt;br /&gt;   |&lt;br /&gt;   = note: expected reference `&amp;_`&lt;br /&gt;                 found struct `Hat`&lt;br /&gt;note: method defined here&lt;br /&gt;  --&gt; trait-overload2.rs:6:8&lt;br /&gt;   |&lt;br /&gt;6  |     fn size(&amp;self);&lt;br /&gt;   |        ^^^^&lt;br /&gt;help: consider borrowing here&lt;br /&gt;   |&lt;br /&gt;28 |     Metric::size(&amp;hat);&lt;br /&gt;   |                  +&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&#39;s interesting: when we invoked a method with &lt;code&gt;.size()&lt;/code&gt;
it didn&#39;t matter whether we explicitly provided a reference
or an object, everything worked great. But when we invoke it
this way, we actually have to provide an argument that will
match the &lt;code&gt;self&lt;/code&gt; parameter, whether that&#39;s a value,
a reference, or a mutable reference, because we don&#39;t get
the magic behavior associated with &lt;code&gt;.&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;implementations-of-intoiterator&quot;&gt;Implementations of &lt;code&gt;IntoIterator&lt;/code&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#implementations-of-intoiterator&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We&#39;re now in a position to understand what&#39;s happening here.
When we do &lt;code&gt;IntoIterator::into_iter(x)&lt;/code&gt;, this tells Rust to
expect a version of &lt;code&gt;into_iter()&lt;/code&gt; that takes a value argument
(i.e., &lt;code&gt;Vec&amp;lt;i32&amp;gt;&lt;/code&gt;),
which means we have to move &lt;code&gt;x&lt;/code&gt; into the function, so we
can&#39;t reuse it afterwards.&lt;/p&gt;
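&lt;p&gt;As a rough sketch of the by-value case, spelling the call out by hand shows where &lt;code&gt;x&lt;/code&gt; gets moved. The &lt;code&gt;while let&lt;/code&gt; loop here is a simplification of what the compiler actually generates for &lt;code&gt;for y in x&lt;/code&gt;, and the names &lt;code&gt;iter&lt;/code&gt; and &lt;code&gt;seen&lt;/code&gt; are just illustrative:&lt;/p&gt;

```rust
fn main() {
    let x = vec![1, 2];

    // Approximately what `for y in x { ... }` does: the trait function is
    // named explicitly, so `x` is passed by value and moved into the iterator.
    let mut iter = IntoIterator::into_iter(x);

    let mut seen = 0;
    while let Some(y) = iter.next() {
        println!("{}", y);
        seen += 1;
    }
    assert_eq!(seen, 2);

    // println!("{}", x.len());  // would not compile: `x` was moved above
}
```

&lt;p&gt;Uncommenting the last line should produce the same kind of moved-value error we saw earlier, because &lt;code&gt;x&lt;/code&gt; no longer owns the vector once it has been handed to &lt;code&gt;into_iter()&lt;/code&gt;.&lt;/p&gt;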
&lt;p&gt;It&#39;s a short step from there to understand why doing &lt;code&gt;for y in &amp;amp;x&lt;/code&gt;
works: there is also a version of &lt;code&gt;IntoIterator&lt;/code&gt; for &lt;code&gt;&amp;amp;Vec&amp;lt;i32&amp;gt;&lt;/code&gt;, so
if we call &lt;code&gt;IntoIterator::into_iter(&amp;amp;x)&lt;/code&gt; then that version gets invoked
(at this point, these are just totally different types from Rust&#39;s
perspective). Because that version just borrows &lt;code&gt;x&lt;/code&gt;, things work
fine with no double move.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Fixed &lt;code&gt;to_iter&lt;/code&gt; to be &lt;code&gt;into_iter&lt;/code&gt; -- 2025-05-26]&lt;/em&gt;.&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code-3&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#code-3&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token macro property&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; y &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;x &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; x&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output-3&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#output-3&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;1&lt;br /&gt;2&lt;br /&gt;2&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You might ask at this point why Rust doesn&#39;t instead just call &lt;code&gt;x.into_iter()&lt;/code&gt; with ordinary method syntax.
I can&#39;t find an explanation in the Rust documentation, but I expect the
reason is that someone could implement another trait that provides
&lt;code&gt;.into_iter()&lt;/code&gt; on whatever the &lt;code&gt;x&lt;/code&gt; is in &lt;code&gt;for y in x&lt;/code&gt;, thus resulting
in a compiler error because there would be two candidate implementations.&lt;/p&gt;
&lt;h2 id=&quot;method-calls&quot;&gt;Method Calls &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#method-calls&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The next thing I want to look at is the impact of method calls.
This section uses the following (over)simplified model of a photo album
module to walk through the relevant issues:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Debug, Clone)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; label&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; content&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;label&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; content&lt;span class=&quot;token 
punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            label&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;label&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            content&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; _scale&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
keyword&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// In a real program this would adjust the&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// size, but here it just makes a copy.&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            label&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;label&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;-copy&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            content&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;content&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Album&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    photos&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Load photos from disk. 
Right now just a stub.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;_directory&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;Self&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; photos&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; i &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token comment&quot;&gt;// This is where we would load the content.&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; content &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;            album&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token macro property&quot;&gt;format!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Image {i}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; content&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        album&lt;br /&gt;    
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; photo&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token 
keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;index&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token 
operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;iter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;photo&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;label&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This should be easy to follow if you know any C-like language, even if you
don&#39;t know Rust, but just to orient you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A &lt;code&gt;Photo&lt;/code&gt; is a structure containing a label and some bytes (&lt;code&gt;content&lt;/code&gt;)
that represent the image (recall that I said this was oversimplified).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;code&gt;Album&lt;/code&gt; is a collection of photos.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously, these &amp;quot;photos&amp;quot; are vacuous, in that they&#39;re just bytes, but
that&#39;s fine because we&#39;re not going to be displaying them. Moreover, in
a real program, the album constructor (&lt;code&gt;init&lt;/code&gt;) would load the photos
from a directory, but in this case it just makes up five empty &lt;code&gt;Photo&lt;/code&gt;s.
This is all fake, but the point here is just to have some scaffolding to
motivate and demonstrate the relevant issues.&lt;/p&gt;
&lt;p&gt;Now, consider this simple program:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obviously, this does the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create the photo album (again, this is supposed to be
loading photos from the disk).&lt;/li&gt;
&lt;li&gt;Get a reference to the first photo and make a new photo that is a scaled-down version of it.&lt;/li&gt;
&lt;li&gt;Add the new photo to the album.&lt;/li&gt;
&lt;li&gt;List the photos.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This program compiles and runs just fine, like so:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Photos [&quot;Image 0&quot;, &quot;Image 1&quot;, &quot;Image 2&quot;, &quot;Image 3&quot;, &quot;Image 4&quot;, &quot;Image 0-copy&quot;]&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far so good. Now, let&#39;s make a trivial modification where we also
add a bigger photo. No problem, we&#39;ll just do some copy-and-paste,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Don%27t_repeat_yourself&amp;amp;oldid=1284247935&quot;&gt;DRY&lt;/a&gt;
be damned:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; bigger_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bigger_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro 
property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unfortunately, this totally doesn&#39;t work. Instead, we get the following
error:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0502]: cannot borrow `album` as mutable because it is also borrowed as immutable&lt;br /&gt; --&gt; examples/ex2.rs:7:5&lt;br /&gt;  |&lt;br /&gt;5 |     let first_photo = album.get_photo(0);&lt;br /&gt;  |                       ----- immutable borrow occurs here&lt;br /&gt;6 |     let smaller_photo = first_photo.scale(0.1);&lt;br /&gt;7 |     album.add_photo(smaller_photo);&lt;br /&gt;  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here&lt;br /&gt;8 |     let bigger_photo = first_photo.scale(10.0);&lt;br /&gt;  |                        ----------- immutable borrow later used here&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0502`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex2&quot;) due to 1 previous error&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;LOLWAT?&lt;/p&gt;
&lt;p&gt;The error message here is pretty good, but it&#39;s worth going through
what&#39;s happening:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;When we called &lt;code&gt;album.get_photo()&lt;/code&gt; the return value was
a reference to an individual photo in &lt;code&gt;album&lt;/code&gt;. Because &lt;code&gt;get_photo()&lt;/code&gt;
takes &lt;code&gt;&amp;amp;self&lt;/code&gt;, Rust takes an immutable reference to &lt;code&gt;album&lt;/code&gt; to make
the call, and the returned reference keeps that borrow alive, even though
it&#39;s actually just a reference to one of the photos.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When we then call &lt;code&gt;album.add_photo()&lt;/code&gt;, we need to take
a mutable reference to &lt;code&gt;album&lt;/code&gt; in order to provide it as the
&lt;code&gt;&amp;amp;mut self&lt;/code&gt; argument. However, because
we already have an immutable reference to &lt;code&gt;album&lt;/code&gt;, this is a double
borrow and the compiler generates an error.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
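&lt;p&gt;As a minimal sketch of the same rule outside the &lt;code&gt;photos&lt;/code&gt; module, here is a plain vector of labels standing in for the album. The function name &lt;code&gt;first_label_copy&lt;/code&gt; and the vector are hypothetical, made up for illustration; the standard library&#39;s &lt;code&gt;first()&lt;/code&gt; plays the role of &lt;code&gt;get_photo()&lt;/code&gt; because it also returns a reference into the collection:&lt;/p&gt;

```rust
// Hypothetical stand-in for the album: a plain vector of labels.
// None of these names come from the `photos` module.
fn first_label_copy() -> String {
    let mut labels = vec![String::from("Image 0"), String::from("Image 1")];

    // No references into `labels` exist yet, so mutating it is fine.
    labels.push(String::from("Image 2"));

    // `first()` returns a reference into `labels`, so the vector as a
    // whole is now immutably borrowed through `first`.
    let first = labels.first().expect("the vector is not empty");

    // Restoring the next line should be rejected with E0502, because
    // `first` is read again on the line after it:
    // labels.push(String::from("Image 3"));

    format!("{first}-copy")
}

fn main() {
    println!("{}", first_label_copy());
}
```

&lt;p&gt;If the commented-out &lt;code&gt;push&lt;/code&gt; is restored, the compiler should reject it with the same E0502 error shown above, because the vector would be mutated while &lt;code&gt;first&lt;/code&gt; is still waiting to be read.&lt;/p&gt;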
&lt;p&gt;But wait, you say, I&#39;m doing exactly this in the first program, and
indeed you are. Let&#39;s look at these side by side:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;working&quot;&gt;Working &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#working&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;broken&quot;&gt;Broken &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#broken&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// &amp;lt;--- Double borrow here.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; bigger_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bigger_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Sure enough, the offending line is the &lt;em&gt;first&lt;/em&gt; call to &lt;code&gt;add_photo()&lt;/code&gt; which
was in the original code, not in the new code we added after. How can later
code break earlier code?&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/malcolm-tucker.jpg&quot; alt=&quot;Malcolm Tucker&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;The answer here is—surprise!—the borrow checker. What&#39;s
going on is that in the original code the last time we use &lt;code&gt;first_photo&lt;/code&gt;
is in the call to &lt;code&gt;.scale()&lt;/code&gt;, so even though it&#39;s &lt;em&gt;in scope&lt;/em&gt; when
we call &lt;code&gt;.add_photo()&lt;/code&gt;, the borrow checker knows we&#39;re not going
to use it again and so treats the borrow as no longer live at that
point, which means we don&#39;t have a double borrow. What causes the problem in the new code is that
we use &lt;code&gt;first_photo&lt;/code&gt; in the second call to &lt;code&gt;.scale()&lt;/code&gt;, which means that
it still has to be live when we make the first call to &lt;code&gt;.add_photo()&lt;/code&gt;, resulting in
the double borrow error.&lt;/p&gt;
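&lt;p&gt;To see the same effect without the &lt;code&gt;photos&lt;/code&gt; crate, here is a minimal sketch that stands in a plain &lt;code&gt;Vec&lt;/code&gt; of file names for the album. The function &lt;code&gt;build_album()&lt;/code&gt; and the file names are made up for illustration; the point is just that the borrow ends at the last use of the reference, not at the end of its scope.&lt;/p&gt;

```rust
// Hypothetical stand-in for the album example: a Vec of file names.
fn build_album() -> String {
    let mut names = vec![String::from("cat.jpg")];

    // `first` is a shared borrow into `names`.
    let first = names.first().unwrap();
    let small = format!("small-{}", first); // last use of `first`

    // Accepted: `first` is still in scope, but it is never used again,
    // so the shared borrow is already over when we mutate `names`.
    names.push(small);

    // Uncommenting the next line keeps `first` live across the `push()`
    // above, and the compiler then rejects that `push()`.
    // let big = format!("big-{}", first);

    names.join(",")
}

fn main() {
    println!("{}", build_album());
}
```

&lt;p&gt;As in the album example, the error appears on the earlier mutation even though the line that triggers it comes later.&lt;/p&gt;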
&lt;p&gt;OK, so we know the problem. How can we fix it? There are a number
of options.&lt;/p&gt;
&lt;h3 id=&quot;drop-and-re-borrow&quot;&gt;Drop and Re-borrow &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#drop-and-re-borrow&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The easiest thing to do is to stop using the original &lt;code&gt;first_photo&lt;/code&gt;
reference, effectively dropping it, and reacquire it after
we call &lt;code&gt;.add_photo()&lt;/code&gt;, as shown below:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Acquire `first_photo` the first time&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Re-acquire `first_photo`&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; bigger_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token 
function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bigger_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&#39;ve done the idiomatic Rust thing here and reused the name
&lt;code&gt;first_photo&lt;/code&gt;, thus &lt;em&gt;shadowing&lt;/em&gt; the original variable. You might
think that the shadowing is what makes this work, but it actually isn&#39;t necessary because,
as we saw before, the Rust compiler can infer where a variable is last used.
It works just as well if you name the new variable
&lt;code&gt;first_photo2&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Re-acquiring &lt;code&gt;first_photo&lt;/code&gt; is a reasonable approach in this
case because finding the photo is just a matter of looking up
the first value in the &lt;code&gt;.photos&lt;/code&gt; vector in &lt;code&gt;album&lt;/code&gt; and vector
lookups are fast. However, imagine that instead we had to
do some expensive operation that involved examining all
the photos in the album. Imagine there was an API that
asked for the photo of the cutest cat. Clearly we wouldn&#39;t want to do
that computation again!&lt;/p&gt;
&lt;h3 id=&quot;store-a-handle&quot;&gt;Store a Handle &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#store-a-handle&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If we were using some expensive API to find the photo, then we
need to find some way to avoid paying that cost for each photo
we want to transform. One way to handle that is to have that
API return a &lt;em&gt;handle&lt;/em&gt; to the photo rather than the photo itself.
The obvious thing to do here is to have the handle just be
the index in the array, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;get_cutest_cat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could then store the index and call &lt;code&gt;get_photo()&lt;/code&gt; repeatedly,
as in the previous example, thus amortizing the expensive operation
and repeating the cheap one.&lt;/p&gt;
&lt;p&gt;This will work as long as &lt;code&gt;add_photo()&lt;/code&gt; doesn&#39;t invalidate
the handle. In this case, it doesn&#39;t because we add photos
to the end of the vector, but if we inserted them at
the front, then it would shift our photo up by one,
invalidating the handle. This isn&#39;t something
that would be caught by the compiler; it just causes
a correctness error because the second time through we
try to scale a different photo rather than the one
we intended. Deleting a photo would have a similar
problem. Note that no matter how badly you screw up,
this won&#39;t cause a memory error because Rust won&#39;t let
you index outside of the array (an out-of-bounds index panics instead); it&#39;s just a correctness
issue, but that doesn&#39;t mean it&#39;s not serious.&lt;/p&gt;
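&lt;p&gt;Here is a small sketch of that failure mode, using a plain &lt;code&gt;Vec&lt;/code&gt; of names in place of the album. This is an assumption for illustration, and the function names are made up. The handle is just an index, so appending leaves it valid, while inserting at the front silently makes it point at a different photo.&lt;/p&gt;

```rust
// Hypothetical stand-in for the album: a Vec of photo names.
// Each function builds the same album, modifies it, and then
// looks at whatever now lives at `handle`.

fn lookup_after_append(handle: usize) -> String {
    let mut photos = vec!["cutest-cat", "dog"];
    // Added at the end: earlier indices keep their meaning.
    photos.push("cutest-cat-small");
    photos[handle].to_string()
}

fn lookup_after_front_insert(handle: usize) -> String {
    let mut photos = vec!["cutest-cat", "dog"];
    // Added at the front: every existing photo shifts up by one.
    photos.insert(0, "cutest-cat-small");
    photos[handle].to_string()
}

fn handle_in_bounds_after_delete(handle: usize) -> bool {
    let mut photos = vec!["cutest-cat", "dog"];
    photos.remove(0);
    // `get()` returns `None` rather than reading past the end.
    photos.get(handle).is_some()
}

fn main() {
    let handle = 0; // index of "cutest-cat" when the handle was issued
    println!("after append: {}", lookup_after_append(handle));
    println!("after front insert: {}", lookup_after_front_insert(handle));
    println!("index 1 valid after delete: {}", handle_in_bounds_after_delete(1));
}
```

&lt;p&gt;Nothing here fails to compile, and nothing reads out of bounds; the front-insert case simply hands back the wrong photo.&lt;/p&gt;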
&lt;h3 id=&quot;make-a-copy&quot;&gt;Make a Copy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#make-a-copy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Alternatively, we can make a copy of the photo. Fortunately,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
&lt;code&gt;Photo&lt;/code&gt; implements &lt;code&gt;Clone&lt;/code&gt;, so this is straightforward:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Code changed here.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; bigger_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span 
class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bigger_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The obvious problem here is that the &lt;code&gt;.clone()&lt;/code&gt; is actually moderately
expensive: we need to allocate enough space for a new copy of the image
and then copy the image data over. That&#39;s a lot of work to solve a
simple problem. Worse yet, this solution isn&#39;t always available,
as we might be working with a type that didn&#39;t implement &lt;code&gt;Clone&lt;/code&gt;
or that wasn&#39;t in principle cloneable, for instance because it
was holding some external resource like a file. Nevertheless, this
is a common approach.&lt;/p&gt;
&lt;h3 id=&quot;restructure-the-code&quot;&gt;Restructure the Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#restructure-the-code&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The final approach available to us is to restructure the code a bit
so that the lifetime of &lt;code&gt;first_photo&lt;/code&gt; doesn&#39;t overlap the calls
to &lt;code&gt;.add_photo()&lt;/code&gt;. In this case, this is a fairly simple matter
of computing both &lt;code&gt;smaller_photo&lt;/code&gt; and &lt;code&gt;bigger_photo&lt;/code&gt; and then adding
them both, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; bigger_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bigger_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro 
property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is probably the most idiomatic thing to do in this specific case in that
it doesn&#39;t have the negative performance effects of the previous
options and doesn&#39;t require any changes to the API, as the handle
and copy approaches potentially do (adding
a handle API or making &lt;code&gt;Photo&lt;/code&gt; &lt;code&gt;Clone&lt;/code&gt;). However, it&#39;s also a lot more disruptive to the
logic of the code.  Suppose that you wanted
to use a loop to generate images of various sizes, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; sizes &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; size &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; sizes &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; new_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;new_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have to aggregate &lt;em&gt;all&lt;/em&gt; of the modified versions of
the original photo and then add them all at once, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; sizes &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; new_photos &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; size &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; sizes &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; new_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        new_photos&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;new_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; photo &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; new_photos &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At one level, this is just irritating, in that we have to restructure
the code. But because we&#39;re having to store the transformed
photos in memory we&#39;re potentially increasing the memory footprint
of the program significantly. That&#39;s not the case here because
&lt;code&gt;add_photo()&lt;/code&gt; just moves a photo from the temporary vector to the vector
in &lt;code&gt;album&lt;/code&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
but if &lt;code&gt;Album&lt;/code&gt; stored photos on disk, we could run out
of memory in the first loop whereas if we stored photos right
away that wouldn&#39;t happen. In this situation you would have
to use one of the other approaches.&lt;/p&gt;
&lt;h4 id=&quot;non-lexical-lifetimes&quot;&gt;Non-Lexical Lifetimes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#non-lexical-lifetimes&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;I said above that Rust could infer that &lt;code&gt;first_photo&lt;/code&gt;
wasn&#39;t in use and therefore it didn&#39;t count as a reference
for the purposes of the borrowing rules. This didn&#39;t use
to be true. In older versions of Rust, the fact that
the variable existed was enough to keep the reference
alive, whether it was subsequently used or not. So,
for instance, if we go back to our original code,
we would have a double-borrow problem because &lt;code&gt;first_photo&lt;/code&gt;
is still in scope through the end of the function. Instead,
you would have had to do something like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// declared outside the block so it outlives it&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      smaller_photo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// first_photo dropped here&lt;/span&gt;&lt;br /&gt;    album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;add_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;smaller_photo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Photos {:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;list_photos&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Putting &lt;code&gt;first_photo&lt;/code&gt; in a braced block like this causes it to be
explicitly dropped so that when &lt;code&gt;add_photo&lt;/code&gt; needs to borrow &lt;code&gt;&amp;amp;mut self&lt;/code&gt; it&#39;s not a double borrow. This was obviously a pain in the ass
and Rust eventually added a feature called &lt;a href=&quot;https://blog.rust-lang.org/2022/08/05/nll-by-default.html&quot;&gt;non-lexical
lifetimes&lt;/a&gt;
which made the compiler smarter about knowing when references were
really live.&lt;/p&gt;
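&lt;p&gt;As a minimal sketch of what non-lexical lifetimes allow, here is the same borrow pattern on a plain &lt;code&gt;Vec&lt;/code&gt; rather than the &lt;code&gt;Album&lt;/code&gt; type from this post. The function name &lt;code&gt;scale_and_add()&lt;/code&gt; and the integer sizes are hypothetical stand-ins chosen for illustration. It should compile on current Rust because the shared borrow is treated as ending at its last use, not at the end of the enclosing scope.&lt;/p&gt;

```rust
// Hypothetical stand-in for the Album example: a Vec of integer "sizes".
fn scale_and_add(sizes: &mut Vec<u32>) {
    let first = &sizes[0];     // shared borrow of `sizes` starts here
    let smaller = *first / 10; // last use of `first`, so the borrow ends here
    sizes.push(smaller);       // mutable borrow is accepted: `first` is no longer live
}

fn main() {
    let mut sizes = vec![100, 200];
    scale_and_add(&mut sizes);
    println!("{:?}", sizes);
}
```

&lt;p&gt;If &lt;code&gt;first&lt;/code&gt; were read again after the &lt;code&gt;push()&lt;/code&gt;, the borrow would still be live at that point and the compiler would reject the code.&lt;/p&gt;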
&lt;p&gt;One thing to notice is that the original code was &lt;em&gt;always&lt;/em&gt;
safe; it&#39;s just that the compiler didn&#39;t realize it.
Rust&#39;s borrow checker is &lt;em&gt;conservative&lt;/em&gt;: it will only
accept code it can prove is safe, which means it will also reject
code which is actually safe but which it can&#39;t prove
is safe. This leaves room for improvements in the language
as the borrow checker gets smarter and constructs which would
previously have been errors—but were actually safe—become
allowed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;why%2C-oh-why%3F&quot;&gt;Why, oh why? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#why%2C-oh-why%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At this point, you might want to ask why Rust is torturing you
like this. Why can&#39;t I just do what I want to, like in good ol&#39; C++? And at first glance,
it looks like the original double borrow code is safe. After all, we&#39;re not &lt;em&gt;using&lt;/em&gt;
&lt;code&gt;first_photo&lt;/code&gt; simultaneously with &lt;code&gt;album.add_photo()&lt;/code&gt;, it&#39;s just
sitting there waiting for us to use it again.&lt;/p&gt;
&lt;p&gt;But actually what&#39;s happening here is the same problem we saw
in the &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#mutable-and-immutable-references&quot;&gt;previous post&lt;/a&gt;:
&lt;code&gt;first_photo&lt;/code&gt; is a reference (a pointer) to an element in the
array, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/photos-reference-1.png&quot; alt=&quot;first_photo is a reference&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;&lt;code&gt;first_photo&lt;/code&gt; is a reference&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If we add a new photo and that causes the memory allocated to
the array or vector to resize (this can happen even if we
just add an element to the end), then suddenly &lt;code&gt;first_photo&lt;/code&gt; is
pointing to an unallocated region of memory, as shown below.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/photos-reference-2.png&quot; alt=&quot;after resize&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;&lt;code&gt;first_photo&lt;/code&gt; after a resize&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If Rust is going to be safe, it can&#39;t allow this, so the code
won&#39;t compile.&lt;/p&gt;
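&lt;p&gt;The resize itself can be observed in safe code. The sketch below (function and file names are hypothetical) fills a &lt;code&gt;Vec&lt;/code&gt; to its capacity and then pushes one more element, which forces the vector to grow and may move its storage. The commented-out lines mark where a reference into the old storage would make the compiler reject the program.&lt;/p&gt;

```rust
// Returns the vector's capacity just before and just after a push that
// cannot fit in the existing allocation.
fn capacity_before_and_after_push() -> (usize, usize) {
    let mut photos: Vec<String> = Vec::with_capacity(2);
    // Fill the vector until there is no spare room left.
    while photos.len() < photos.capacity() {
        let name = format!("photo-{}.jpg", photos.len());
        photos.push(name);
    }
    let before = photos.capacity();

    // let first_photo = &photos[0];        // a reference into the current buffer
    photos.push(String::from("extra.jpg")); // buffer is full, so the Vec must grow
    // println!("{}", first_photo);         // uncommenting both lines gives error E0502

    (before, photos.capacity())
}

fn main() {
    let (before, after) = capacity_before_and_after_push();
    println!("capacity before: {}, after: {}", before, after);
}
```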
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;safer-handles&quot;&gt;Safer Handles &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#safer-handles&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As an aside, it&#39;s somewhat possible to write handles in a way that
is safer than just a simple integer. For example, we could
have the handle store not just the index of the element
but also some identifier for the contents of the element
in such a way that the handle would become invalid if the
element changed. In this case, the result would be that
if an element were inserted before the photo, shifting the
elements to the right, an attempt to dereference the handle
would fail, so you&#39;d get a runtime error.&lt;/p&gt;
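&lt;p&gt;One way this could look, as a rough sketch rather than a recommended design: each stored photo carries a unique id, and the handle records both the index and the id it expects to find there. The types &lt;code&gt;Handle&lt;/code&gt;, &lt;code&gt;TaggedPhoto&lt;/code&gt;, and &lt;code&gt;TaggedAlbum&lt;/code&gt; are hypothetical and are not part of the &lt;code&gt;photos&lt;/code&gt; crate used elsewhere in this post.&lt;/p&gt;

```rust
/// A handle that remembers where the photo was and which photo it was.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handle {
    index: usize,
    id: u64,
}

#[derive(Debug)]
struct TaggedPhoto {
    id: u64,
    name: String,
}

struct TaggedAlbum {
    photos: Vec<TaggedPhoto>,
    next_id: u64,
}

impl TaggedAlbum {
    fn new() -> Self {
        TaggedAlbum { photos: Vec::new(), next_id: 0 }
    }

    /// Inserts a photo at `index` and returns a handle to it.
    fn insert(&mut self, index: usize, name: &str) -> Handle {
        let id = self.next_id;
        self.next_id += 1;
        self.photos.insert(index, TaggedPhoto { id, name: name.to_string() });
        Handle { index, id }
    }

    /// Returns the photo only if the slot still holds the photo the handle was made for.
    fn get(&self, handle: Handle) -> Option<&TaggedPhoto> {
        self.photos
            .get(handle.index)
            .filter(|photo| photo.id == handle.id)
    }
}

fn main() {
    let mut album = TaggedAlbum::new();
    let cat = album.insert(0, "cat.jpg");
    assert_eq!(album.get(cat).map(|p| p.name.as_str()), Some("cat.jpg"));

    // Inserting in front shifts cat.jpg to index 1, so the old handle is stale.
    album.insert(0, "dog.jpg");
    assert!(album.get(cat).is_none());
}
```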
&lt;p&gt;Probably a better approach in this case is to replace
a generic handle with a query which caches its results.
For instance, we could have &lt;code&gt;find_cutest_cat()&lt;/code&gt; remember
the current cutest cat and which photos it had looked at
and then when you ask for the result, it just looks at any
new pictures of cats to compare them to the current cutest;
this is far more robust than trying to build some kind of
safer handle structure.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&#39;s important to realize that the &lt;em&gt;logical&lt;/em&gt; situation is the
same as with handles: we have a reference (in the general sense,
not the Rust technical sense), which is now invalid. The
difference is that the handle (in this case an integer) is
an offset into the vector, as opposed to the address of
some random region in memory. This means that only one of
two things can happen:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The handle is smaller than the size of the array and so
there&#39;s an element at the relevant location, just
not the one that&#39;s expected. This causes the program
to silently malfunction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The handle is greater than or equal to the size of the
array, in which case you&#39;ll get a runtime error right
away.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In neither case do you have a memory error. This nicely illustrates the sense in which Rust is &amp;quot;safe&amp;quot;,
namely that it protects you from memory errors but not from logic
errors (though it does protect against some, as seen below).
It&#39;s still very possible to write bugs in Rust; it&#39;s just
that they don&#39;t result in memory corruption.&lt;/p&gt;
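&lt;p&gt;Both outcomes are easy to reproduce with a bare integer handle. This is a small sketch with a hypothetical &lt;code&gt;lookup()&lt;/code&gt; helper. It uses &lt;code&gt;get()&lt;/code&gt;, which returns &lt;code&gt;None&lt;/code&gt; for an out-of-range index; plain indexing with &lt;code&gt;photos[handle]&lt;/code&gt; would panic instead.&lt;/p&gt;

```rust
// Hypothetical helper: a "handle" here is just an index into the slice.
fn lookup(photos: &[String], handle: usize) -> Option<&String> {
    photos.get(handle)
}

fn main() {
    let mut photos = vec![String::from("cat.jpg"), String::from("dog.jpg")];
    let dog = 1; // handle to dog.jpg
    assert_eq!(lookup(&photos, dog).map(String::as_str), Some("dog.jpg"));

    // Outcome 1: the handle is still in range but now names a different photo.
    photos.insert(0, String::from("bird.jpg"));
    assert_eq!(lookup(&photos, dog).map(String::as_str), Some("cat.jpg"));

    // Outcome 2: the handle is out of range, which is detected at runtime.
    photos.truncate(1);
    assert_eq!(lookup(&photos, dog), None);
}
```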
&lt;h2 id=&quot;lifetimes&quot;&gt;Lifetimes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#lifetimes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the previous section I just glossed over something tricky. Let&#39;s
take another look at &lt;code&gt;Album::get_photo()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;index&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this code I&#39;m taking a reference to &lt;code&gt;&amp;amp;self.photos[index]&lt;/code&gt; and
returning it, but what makes this safe? Suppose that &lt;code&gt;album&lt;/code&gt; gets
deleted while I&#39;m still hanging on to the return value. Don&#39;t
I get a dangling reference? Let&#39;s try it and see.&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;photos&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token attribute attr-name&quot;&gt;#[allow(unused_mut)]&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; album &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Album&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;directory&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        first_photo &lt;span 
class=&quot;token operator&quot;&gt;=&lt;/span&gt; album&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; _ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; first_photo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Reassuringly, this won&#39;t compile, producing the following
error:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0597]: `album` does not live long enough&lt;br /&gt;  --&gt; examples/ex4.rs:8:23&lt;br /&gt;   |&lt;br /&gt;7  |         let mut album = Album::init(&amp;&quot;directory&quot;);&lt;br /&gt;   |             --------- binding `album` declared here&lt;br /&gt;8  |         first_photo = album.get_photo(0);&lt;br /&gt;   |                       ^^^^^ borrowed value does not live long enough&lt;br /&gt;9  |     }&lt;br /&gt;   |     - `album` dropped here while still borrowed&lt;br /&gt;10 |     let _ = first_photo.scale(0.1);&lt;br /&gt;   |             ----------- borrow later used here&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0597`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex4&quot;) due to 1 previous error&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Dodged a bullet there. After we&#39;ve taken a deep breath because Rust saved us from ourselves,
we might start to wonder what is actually going on here: how
does Rust know that &lt;code&gt;first_photo&lt;/code&gt; is a borrow of &lt;code&gt;album&lt;/code&gt;? That&#39;s
information that is only available by looking at the implementation
of &lt;code&gt;get_photo()&lt;/code&gt;, and remember what I said about local reasoning?&lt;/p&gt;
&lt;p&gt;Understanding what is going on here requires understanding what
Rust calls &amp;quot;lifetimes&amp;quot;. Let&#39;s start with the basic rule that Rust
enforces.&lt;/p&gt;
&lt;center&gt;
&lt;p&gt;&lt;em&gt;If &lt;code&gt;B&lt;/code&gt; is a reference to object &lt;code&gt;A&lt;/code&gt; then &lt;code&gt;B&lt;/code&gt; can&#39;t outlive object &lt;code&gt;A&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/center&gt;
&lt;p&gt;Obviously, what&#39;s gone wrong here is that &lt;code&gt;album&lt;/code&gt; goes out of scope at the end
of the block enclosing it, at which point the reference to &lt;code&gt;album&lt;/code&gt;
in &lt;code&gt;first_photo&lt;/code&gt; is invalid. I.e., it has &lt;em&gt;outlived&lt;/em&gt; &lt;code&gt;album&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;lifetime&lt;/em&gt; of a variable is the time between when it&#39;s first
created and when it&#39;s last used. So, another way of stating the
above rule is that:&lt;/p&gt;
&lt;center&gt;
&lt;p&gt;&lt;em&gt;If &lt;code&gt;B&lt;/code&gt; is a reference to object &lt;code&gt;A&lt;/code&gt; then &lt;code&gt;B&lt;/code&gt;&#39;s lifetime must be contained
within &lt;code&gt;A&lt;/code&gt;&#39;s lifetime (though they can be coextensive).&lt;/em&gt;&lt;/p&gt;
&lt;/center&gt;
&lt;p&gt;To see the problem more clearly, I&#39;ve annotated the code
to show the relevant lifetimes.
The annotated code below shows the lifetimes of &lt;code&gt;album&lt;/code&gt; and &lt;code&gt;first_photo&lt;/code&gt;
in our working code. As you can see, &lt;code&gt;first_photo&lt;/code&gt; is last used
before the end of the block, which is when &lt;code&gt;album&lt;/code&gt; goes out of scope (and
hence when its lifetime ends).&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;use photos::photos::*;&lt;br /&gt;&lt;br /&gt;pub fn main() {&lt;br /&gt;    let mut album = Album::init(&amp;&quot;directory&quot;); &lt;---------------+&lt;br /&gt;    let first_photo = album.get_photo(0);      &lt;-&#92; Lifetime of | Lifetime of&lt;br /&gt;    let smaller_photo = first_photo.scale(0.1);&lt;-/ first_photo | album&lt;br /&gt;    album.add_photo(smaller_photo);                            |&lt;br /&gt;    println!(&quot;Photos {:?}&quot;, album.list_photos());              |&lt;br /&gt;}                                              &lt;---------------+ &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now compare the annotated version of the broken code:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;use photos::photos::*;&lt;br /&gt;&lt;br /&gt;pub fn main() {&lt;br /&gt;    let first_photo;&lt;br /&gt;    {&lt;br /&gt;        #[allow(unused_mut)]&lt;br /&gt;        let mut album = Album::init(&amp;&quot;directory&quot;); &lt;-+ Lifetime&lt;br /&gt;        first_photo = album.get_photo(0);            | of album  &lt;-+&lt;br /&gt;    }                                              &lt;-+             | Lifetime of &lt;br /&gt;    let _ = first_photo.scale(0.1);                &lt;---------------+ first_photo&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As before, &lt;code&gt;album&lt;/code&gt;&#39;s lifetime ends at the end of the enclosing block,
but in this case, &lt;code&gt;first_photo&lt;/code&gt; is used after that point, so its lifetime
extends past the end of &lt;code&gt;album&lt;/code&gt;, which, as noted before, is forbidden.
Note that &lt;code&gt;first_photo&lt;/code&gt; &lt;em&gt;exists&lt;/em&gt; before it is first assigned to
point to &lt;code&gt;album&lt;/code&gt;, but it&#39;s not a reference to &lt;code&gt;album&lt;/code&gt;. Actually,
in this case it&#39;s not assigned to anything, and so using it would
be forbidden prior to assignment.&lt;/p&gt;
&lt;p&gt;This brings us back to the question I asked above: how does Rust know
that &lt;code&gt;first_photo&lt;/code&gt; is a borrow of &lt;code&gt;album&lt;/code&gt; and not of something else?
And what if I did want to borrow something else?&lt;/p&gt;
&lt;h3 id=&quot;seeing-like-a-compiler&quot;&gt;Seeing Like a Compiler &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#seeing-like-a-compiler&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Although every variable in Rust has a lifetime, so far we&#39;ve managed
to avoid dealing with that because the compiler can often infer
those lifetimes and act appropriately. However, there are situations
where that&#39;s not the case.&lt;/p&gt;
&lt;p&gt;Let&#39;s start with a simple example to get the idea:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; second&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    first&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;first&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;second&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is some pretty obvious code: we pass two string references
into &lt;code&gt;return_first()&lt;/code&gt; and it returns the first one. But when
we try to compile it we get an error:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0106]: missing lifetime specifier&lt;br /&gt; --&gt; examples/ex5.rs:1:47&lt;br /&gt;  |&lt;br /&gt;1 | fn return_first(first: &amp;str, second: &amp;str) -&gt; &amp;str {&lt;br /&gt;  |                        ----          ----     ^ expected named lifetime parameter&lt;br /&gt;  |&lt;br /&gt;  = help: this function&#39;s return type contains a borrowed value, but the signature does not say whether it is borrowed from `first` or `second`&lt;br /&gt;help: consider introducing a named lifetime parameter&lt;br /&gt;  |&lt;br /&gt;1 | fn return_first&lt;&#39;a&gt;(first: &amp;&#39;a str, second: &amp;&#39;a str) -&gt; &amp;&#39;a str {&lt;br /&gt;  |                ++++         ++               ++          ++&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0106`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex5&quot;) due to 1 previous error&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Rust error messages are usually clearer than this, but what&#39;s
going on is that the compiler isn&#39;t able to verify that the
lifetimes here follow the rules.
Specifically, the return value of &lt;code&gt;return_first()&lt;/code&gt; is a reference to
something, but the compiler doesn&#39;t know how long it&#39;s supposed to be
valid for. This will cause a problem when we try to use &lt;code&gt;println!()&lt;/code&gt;
on it, because Rust doesn&#39;t know if it&#39;s safe to use in that context.
We are able to examine the function and realize it&#39;s safe, but
because the compiler wants to use local reasoning, it&#39;s not able
to do so.&lt;/p&gt;
&lt;p&gt;What I mean by local reasoning is that when the compiler checks &lt;code&gt;main()&lt;/code&gt;,
the program looks like this from its perspective:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; second&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// STUFF WE WON&#39;T LOOK AT.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;first&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;second&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When checking to see if &lt;code&gt;main()&lt;/code&gt; is following the lifetime rules,
the compiler wants to use only the information it has available
from the function &lt;em&gt;signature&lt;/em&gt;, without looking at the implementation.
Conversely, when it checks &lt;code&gt;return_first()&lt;/code&gt; it won&#39;t look at &lt;code&gt;main()&lt;/code&gt;.
If you&#39;re a C or C++ programmer, this should be conceptually familiar
because C and C++ programs have header (&lt;code&gt;.h&lt;/code&gt;) files which conventionally
contain function and method signatures, with the implementation
(the body) living in &lt;code&gt;.c&lt;/code&gt; or &lt;code&gt;.cc&lt;/code&gt; (or &lt;code&gt;.cpp&lt;/code&gt; or &lt;code&gt;.c++&lt;/code&gt;) files.
This allows the compiler to compile one file (technical term: &amp;quot;translation unit&amp;quot;)
without knowing how another file works, but only the interfaces it
provides. Rust doesn&#39;t have a header/body split like C and C++
but you can still get into the same situation if you are operating
on a &amp;quot;trait object&amp;quot; (the Rust equivalent of C++ virtual functions),
because you only know the trait definition.&lt;/p&gt;
&lt;p&gt;In order to make this code compile, we have to help the compiler
out by telling it the &lt;em&gt;expected&lt;/em&gt; lifetime of the return value.
We do this by decorating variables with a lifetime annotation,
which looks like &lt;code&gt;&#39;a&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
Specifically, what we need to express here
is that the return value is not expected to outlive the first
argument and thus it&#39;s safe to use the return value as long as
the first argument is also alive. The notation for this looks
like:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; second&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note how the notation here looks a little like C++ templates
(and Rust generics, which we didn&#39;t go into as much), because
this is a kind of generic. You read this line as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is some lifetime &lt;code&gt;&#39;a&lt;/code&gt; such that the return value is valid
during &lt;code&gt;&#39;a&lt;/code&gt; (and can&#39;t be safely used after) and that whatever
&lt;code&gt;first&lt;/code&gt; is pointing to lives
at least as long as &lt;code&gt;&#39;a&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The way to look at these lifetime annotations is that they are
defining the &lt;em&gt;contract&lt;/em&gt; for this function. In order to enforce
that contract, the compiler does two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Analyzes the caller of the function to verify that it isn&#39;t
using the return value outside of the lifetime of whatever
it passed as the first argument. As noted above, it can do
this without looking at the function body.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Analyzes the body of the function to verify that the return
value is actually derived from the first argument, so that
it will be safe as long as the first argument is valid.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&#39;ve seen the first check in action, but let&#39;s look at the
second check. Consider what happens if we change the return
value to be derived from the second argument:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; second&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    second&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// println!(&quot;{:?}&quot;, return_first(&amp;amp;&quot;first&quot;, &amp;amp;&quot;second&quot;));&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&#39;ve commented out the call to &lt;code&gt;return_first()&lt;/code&gt; so we&#39;re not even using
the return value, but we still get an error because the
function body isn&#39;t fulfilling the contract:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0106]: missing lifetime specifier&lt;br /&gt; --&gt; examples/ex6.rs:1:47&lt;br /&gt;  |&lt;br /&gt;1 | fn return_first(first: &amp;str, second: &amp;str) -&gt; &amp;str {&lt;br /&gt;  |                        ----          ----     ^ expected named lifetime parameter&lt;br /&gt;  |&lt;br /&gt;  = help: this function&#39;s return type contains a borrowed value, but the signature does not say whether it is borrowed from `first` or `second`&lt;br /&gt;help: consider introducing a named lifetime parameter&lt;br /&gt;  |&lt;br /&gt;1 | fn return_first&lt;&#39;a&gt;(first: &amp;&#39;a str, second: &amp;&#39;a str) -&gt; &amp;&#39;a str {&lt;br /&gt;  |                ++++         ++               ++          ++&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0106`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex6&quot;) due to 1 previous error&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As illustrated here, it&#39;s not possible to produce unsafe code by
giving the compiler the wrong lifetime: if we screw up,
the compiler will throw an error. In this sense, lifetimes are
just a hint to the compiler and a sufficiently smart compiler could
do without them.&lt;/p&gt;
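&lt;p&gt;To make the two checks concrete, here is a minimal sketch using the annotated
signature from above. The body and the &lt;code&gt;main()&lt;/code&gt; are my own illustration
(note the &lt;code&gt;_second&lt;/code&gt; name, which just silences the unused-argument warning), not
code from the photo example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Annotated signature: the result is tied to `first` only.
fn return_first&amp;lt;&#39;a&amp;gt;(first: &amp;amp;&#39;a str, _second: &amp;amp;str) -&amp;gt; &amp;amp;&#39;a str {
    // Returning `_second` here instead would be rejected, because its
    // lifetime is not declared to have any relation to &#39;a.
    first
}

fn main() {
    let first = String::from(&amp;quot;first&amp;quot;);
    let result;
    {
        let second = String::from(&amp;quot;second&amp;quot;);
        // Accepted from the signature alone: the result borrows from `first`.
        result = return_first(&amp;amp;first, &amp;amp;second);
    } // `second` is dropped here, while `result` is still usable.
    println!(&amp;quot;{:?}&amp;quot;, result);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The caller is allowed to keep &lt;code&gt;result&lt;/code&gt; around after &lt;code&gt;second&lt;/code&gt; is gone
precisely because the signature promises that the return value has nothing to do
with the second argument.&lt;/p&gt;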
&lt;h3 id=&quot;lifetime-elision&quot;&gt;Lifetime Elision &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#lifetime-elision&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In fact, it&#39;s because lifetimes are a kind of hint that our original photo handling
code works without lifetime annotations. To see this, consider
the following trivial modification of this program in which
we only pass in one argument:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;fn return_first(first: &amp;amp;str) -&amp;gt; &amp;amp;str {
    first
}

fn main() {
    println!(&amp;quot;{:?}&amp;quot;, return_first(&amp;amp;&amp;quot;first&amp;quot;));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The compiler will process this just fine because it contains
a set of &lt;a href=&quot;https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-elision&quot;&gt;default rules&lt;/a&gt;
(&amp;quot;lifetime elision&amp;quot;) that handle common cases. The rule that is applicable to this
case is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The second rule is that, if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: fn foo&amp;lt;&#39;a&amp;gt;(x: &amp;amp;&#39;a i32) -&amp;gt; &amp;amp;&#39;a i32.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, Rust is secretly changing the function signature
to be:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is reasonable because the vast majority of valid code with
this kind of signature (though not all of it) will be returning a reference to
something derived from the argument.
However, once again, this is just a default: if we were to change &lt;code&gt;return_first()&lt;/code&gt; to
return a reference to something not derived from &lt;code&gt;first&lt;/code&gt;, then the compiler
would generate an error. For example, consider the following:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;s&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we are returning a dangling reference to &lt;code&gt;s&lt;/code&gt;,
which only lives to the end of the function. This is plainly
illegal—in fact, this is exactly what Rust lifetimes
are designed to prevent—and so the compiler returns
an error. No amount of lifetime decorations will make it
compile.&lt;/p&gt;
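&lt;p&gt;If you really do want to hand back the newly created value, one option is to
return the &lt;code&gt;String&lt;/code&gt; itself rather than a reference to it. This is just a sketch
(the name &lt;code&gt;return_owned()&lt;/code&gt; is made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Returns an owned String, so there is no reference to go stale.
fn return_owned(first: &amp;amp;str) -&amp;gt; String {
    let s = String::from(first);
    s // ownership moves to the caller instead of being borrowed
}

fn main() {
    println!(&amp;quot;{:?}&amp;quot;, return_owned(&amp;quot;first&amp;quot;));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the return type contains no references, there are no lifetimes to
annotate at all.&lt;/p&gt;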
&lt;h3 id=&quot;multiple-arguments&quot;&gt;Multiple Arguments &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#multiple-arguments&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Functions with a single argument are easy mode. For functions with
multiple arguments, Rust will assign each one its own
lifetime (rule one), at which point it doesn&#39;t know which lifetime to
associate the return value with. This is why the version above with two arguments
doesn&#39;t work without lifetime annotations, because Rust
is internally giving it the following signature:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;return_first&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;b&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;c&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;first&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; second&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;b&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;c&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because it doesn&#39;t know how &lt;code&gt;&#39;c&lt;/code&gt; relates to &lt;code&gt;&#39;a&lt;/code&gt; or &lt;code&gt;&#39;b&lt;/code&gt;, the compiler
can&#39;t determine whether the return value is being
used safely at the call site. We can resolve this issue
as above by explicitly labeling the return value with a lifetime
matching one of the arguments. To take one of the &lt;a href=&quot;https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotations-in-function-signatures&quot;&gt;examples&lt;/a&gt; from the Rust book, if the function might
return either &lt;code&gt;first&lt;/code&gt; or &lt;code&gt;second&lt;/code&gt; then you need to
attach the same lifetime to both arguments, with the
result that Rust will verify safety for whichever lifetime
is shorter (recall that &lt;code&gt;&#39;a&lt;/code&gt; only has to be a lifetime that
satisfies all the constraints).&lt;/p&gt;
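&lt;p&gt;Here is roughly what that looks like, modeled on the Rust book&#39;s &lt;code&gt;longest()&lt;/code&gt;
function but using our argument names; the &lt;code&gt;main()&lt;/code&gt; is my own illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Both arguments share &#39;a, so the result may borrow from either one.
fn longest&amp;lt;&#39;a&amp;gt;(first: &amp;amp;&#39;a str, second: &amp;amp;&#39;a str) -&amp;gt; &amp;amp;&#39;a str {
    if first.len() &amp;gt;= second.len() {
        first
    } else {
        second
    }
}

fn main() {
    let first = String::from(&amp;quot;first&amp;quot;);
    {
        let second = String::from(&amp;quot;second&amp;quot;);
        // &#39;a can be no longer than the shorter of the two borrows, so
        // `result` must not be used after `second` goes out of scope.
        let result = longest(&amp;amp;first, &amp;amp;second);
        println!(&amp;quot;{:?}&amp;quot;, result);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you moved the &lt;code&gt;println!()&lt;/code&gt; outside the inner block, the compiler would
complain even when the function happened to return &lt;code&gt;first&lt;/code&gt;, because it only looks at the
signature.&lt;/p&gt;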
&lt;h3 id=&quot;member-functions&quot;&gt;Member Functions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#member-functions&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There is one more special case: when a method takes &lt;code&gt;&amp;amp;self&lt;/code&gt; (or &lt;code&gt;&amp;amp;mut self&lt;/code&gt;),
Rust assumes that the lifetime of any references returned
will be the same as that of &lt;code&gt;self&lt;/code&gt;. Turning back to &lt;code&gt;get_photo()&lt;/code&gt;, this
means that Rust is internally assigning the following lifetime:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;get_photo&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Photo&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you wanted a member function to return a reference derived from another argument,
you would need to explicitly annotate the function, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;get_stuff&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;str&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
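&lt;p&gt;To see the difference between the two cases, here is a small sketch. The
&lt;code&gt;Album&lt;/code&gt; struct and its &lt;code&gt;title()&lt;/code&gt; method are hypothetical stand-ins, not
part of the photo code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;struct Album {
    title: String,
}

impl Album {
    // Elided form: the returned reference is tied to `self`.
    fn title(&amp;amp;self) -&amp;gt; &amp;amp;str {
        &amp;amp;self.title
    }

    // Explicit form: the returned reference is tied to `s`, not `self`.
    fn get_stuff&amp;lt;&#39;a&amp;gt;(&amp;amp;self, s: &amp;amp;&#39;a str) -&amp;gt; &amp;amp;&#39;a str {
        s
    }
}

fn main() {
    let label = String::from(&amp;quot;label&amp;quot;);
    let stuff;
    {
        let album = Album { title: String::from(&amp;quot;holiday&amp;quot;) };
        println!(&amp;quot;{:?}&amp;quot;, album.title());
        stuff = album.get_stuff(&amp;amp;label);
    } // `album` is dropped here, but `stuff` only borrows from `label`.
    println!(&amp;quot;{:?}&amp;quot;, stuff);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without the explicit annotation on &lt;code&gt;get_stuff()&lt;/code&gt;, the default rule would tie the
result to &lt;code&gt;self&lt;/code&gt;, and the body would no longer satisfy the signature.&lt;/p&gt;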
&lt;h3 id=&quot;structs&quot;&gt;Structs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#structs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There&#39;s one more case worth covering: structs can have members
that are references, in which case you have to provide a lifetime
for the reference. This looks like:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    x&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The semantics of this are the same as having a variable with the same
lifetime label: &lt;code&gt;x&lt;/code&gt;, and hence any &lt;code&gt;Foo&lt;/code&gt; containing it, cannot outlive
whatever &lt;code&gt;x&lt;/code&gt; is a reference to. This all
works, but things start to get hairy pretty fast because you have to
decorate a lot of stuff with the lifetimes, as in:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Debug)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Pair&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    x&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    y&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Pair&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition 
function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Pair&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token class-name&quot;&gt;Pair&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; x&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
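&lt;p&gt;For completeness, here is a sketch of how a caller might use &lt;code&gt;Pair&lt;/code&gt;. The struct
is repeated so the example stands alone; the &lt;code&gt;main()&lt;/code&gt; is my own illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#[derive(Debug)]
struct Pair&amp;lt;&#39;a&amp;gt; {
    x: i32,
    y: &amp;amp;&#39;a i32,
}

impl&amp;lt;&#39;a&amp;gt; Pair&amp;lt;&#39;a&amp;gt; {
    fn new(x: &amp;amp;i32, y: &amp;amp;&#39;a i32) -&amp;gt; Pair&amp;lt;&#39;a&amp;gt; {
        Pair { x: *x, y }
    }
}

fn main() {
    let y = 2;
    let pair;
    {
        let x = 1;
        // `x` is copied into the struct, so it is free to go out of scope;
        // `y` is borrowed and therefore has to outlive `pair`.
        pair = Pair::new(&amp;amp;x, &amp;amp;y);
    }
    println!(&amp;quot;{:?}&amp;quot;, pair);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only the argument that ends up stored as a reference needs to carry &lt;code&gt;&#39;a&lt;/code&gt;; the
other one can have any lifetime at all.&lt;/p&gt;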
&lt;p&gt;Now let&#39;s try one more thing. Check out the following code:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Debug)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Holder&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Holder&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;set_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; 
&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; holder &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Holder&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; tmp&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        
holder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;tmp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;holder&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just to orient you, this code defines a struct called &lt;code&gt;Holder&lt;/code&gt;. It&#39;s a
generic type parameterized on &lt;code&gt;T&lt;/code&gt; so that it can hold an instance of any &lt;code&gt;T&lt;/code&gt;. The
actual member value (&lt;code&gt;t&lt;/code&gt;) is an &lt;code&gt;Option&amp;lt;T&amp;gt;&lt;/code&gt; so that we can create
an empty &lt;code&gt;Holder&lt;/code&gt; and then fill it with &lt;code&gt;.set_value()&lt;/code&gt;—or at least
in principle can fill it with &lt;code&gt;.set_value()&lt;/code&gt;. In practice, &lt;code&gt;.set_value()&lt;/code&gt;
has the signature you would expect from the name, but doesn&#39;t actually do anything,
so &lt;code&gt;Holder&lt;/code&gt; always contains a &lt;code&gt;None&lt;/code&gt;. You have to use this &lt;code&gt;Option&lt;/code&gt;
trick a lot in Rust because there&#39;s no way to have empty object
references like C++ &lt;code&gt;nullptr&lt;/code&gt; (or, arguably, that&#39;s what &lt;code&gt;Option&lt;/code&gt; is for).&lt;/p&gt;
&lt;p&gt;If we call &lt;code&gt;.set_value()&lt;/code&gt; with an instance of &lt;code&gt;i32&lt;/code&gt;, then everything
works as expected. The program compiles and outputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Holder { t: None }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that even though this is a generic, we didn&#39;t need to tell
Rust which type to instantiate it (Rust jargon: monomorphize) with.
Instead, it inferred it from the fact that we called &lt;code&gt;.set_value()&lt;/code&gt; with
a type of &lt;code&gt;i32&lt;/code&gt;: &lt;code&gt;set_value()&lt;/code&gt; is defined as taking an argument of
type &lt;code&gt;T&lt;/code&gt; and thus this means we must have a &lt;code&gt;Holder&amp;lt;i32&amp;gt;&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
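&lt;p&gt;As an aside, if you would rather not rely on inference, you can spell the type out
yourself. A minimal sketch (the variable names here are just for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#[derive(Debug)]
struct Holder&amp;lt;T&amp;gt; {
    t: Option&amp;lt;T&amp;gt;,
}

fn main() {
    // Annotate the variable with the type inference would have picked.
    let annotated: Holder&amp;lt;i32&amp;gt; = Holder { t: None };
    // Or supply the type parameter directly on the struct expression.
    let explicit = Holder::&amp;lt;i32&amp;gt; { t: None };
    println!(&amp;quot;{:?} {:?}&amp;quot;, annotated, explicit);
}
&lt;/code&gt;&lt;/pre&gt;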
&lt;p&gt;Now let&#39;s do the exact same thing but with one small change:
pass &lt;code&gt;&amp;amp;tmp&lt;/code&gt; to &lt;code&gt;holder.set_value()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Debug)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Holder&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Holder&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;set_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; 
&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; holder &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Holder&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; tmp&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        
holder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;tmp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;holder&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Surprise (or maybe not?)! This doesn&#39;t compile at all.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;warning: unused variable: `t`&lt;br /&gt; --&gt; examples/ex10-ref.rs:7:29&lt;br /&gt;  |&lt;br /&gt;7 |     fn set_value(&amp;mut self, t: T) {}&lt;br /&gt;  |                             ^ help: if this is intentional, prefix it with an underscore: `_t`&lt;br /&gt;  |&lt;br /&gt;  = note: `#[warn(unused_variables)]` on by default&lt;br /&gt;&lt;br /&gt;error[E0597]: `tmp` does not live long enough&lt;br /&gt;  --&gt; examples/ex10-ref.rs:14:26&lt;br /&gt;   |&lt;br /&gt;13 |         let tmp: i32 = 10;&lt;br /&gt;   |             --- binding `tmp` declared here&lt;br /&gt;14 |         holder.set_value(&amp;tmp);&lt;br /&gt;   |                          ^^^^ borrowed value does not live long enough&lt;br /&gt;15 |     }&lt;br /&gt;   |     - `tmp` dropped here while still borrowed&lt;br /&gt;16 |     println!(&quot;{:?}&quot;, &amp;holder);&lt;br /&gt;   |                      ------- borrow later used here&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0597`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex10-ref&quot;) due to 1 previous error; 1 warning emitted&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&#39;re now seeing the downstream consequences of the lifetime
annotations for structs. As mentioned above, if you have a reference
member in a struct, it needs a lifetime, so when we monomorphized
&lt;code&gt;Holder&lt;/code&gt; with a &lt;code&gt;&amp;amp;i32&lt;/code&gt;, it had to associate a lifetime with it,
so we ended up with something like:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Holder&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Option&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;a&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And when we called &lt;code&gt;.set_value(&amp;amp;tmp)&lt;/code&gt;, &lt;code&gt;&#39;a&lt;/code&gt; got associated with
the lifetime of &lt;code&gt;tmp&lt;/code&gt;. That lifetime ends when the block ends,
but &lt;code&gt;holder&lt;/code&gt; extends past the end of the block, so we have
a lifetime error. You&#39;ll note that we didn&#39;t even have to use
the reference passed to &lt;code&gt;.set_value()&lt;/code&gt; to make this happen:
Rust just looked at the function signature and decided that
in principle we &lt;em&gt;could&lt;/em&gt; be using it and so that meant
&lt;code&gt;holder&lt;/code&gt; had to be treated as if it were holding a reference
to &lt;code&gt;tmp&lt;/code&gt;. If we change &lt;code&gt;.set_value()&lt;/code&gt; to take a &lt;code&gt;&amp;amp;self&lt;/code&gt; instead
of a &lt;code&gt;&amp;amp;mut self&lt;/code&gt; (this is fine because we&#39;re not touching
&lt;code&gt;self.t&lt;/code&gt; anyway), then the problem resolves itself and the
program will compile just fine.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;bonus%3A-thread-safety&quot;&gt;Bonus: Thread Safety &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#bonus%3A-thread-safety&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Everything in this post has been about memory allocation, but here&#39;s
the cool part: Rust also provides thread safety, mostly through
the same mechanisms that provide memory safety. This post is
already quite long, but I want to briefly give you an intuition of
how this works.&lt;/p&gt;
&lt;p&gt;The basic cause of thread safety issues in software is when you
have the same data value being modified by two threads at once.
Consider the following trivial function for a bank&#39;s accounting
system:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;pay_money&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;payee&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; amount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;balance &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; amount&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Insufficient funds&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  balance &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; balance &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; amount&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;send_money&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;payee&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; amount&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;                                       &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This looks like perfectly reasonable code, but what happens if
we decide to run it in a multithreaded program where requests
to pay people can come in in parallel? Suddenly, we have a serious
problem because the individual steps of these threads can
execute in any order. For instance, we might have the following
order of execution:&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Thread 1                                  Thread 2&lt;br /&gt;&lt;br /&gt;function pay_money(payee, amount) {       &lt;br /&gt;  if (balance &lt; amount) {                 &lt;br /&gt;     throw Error(&quot;Insufficient funds&quot;);   &lt;br /&gt;  }                                       &lt;br /&gt;                                          function pay_money(payee, amount) {     &lt;br /&gt;                                            if (balance &lt; amount) {               &lt;br /&gt;                                               throw Error(&quot;Insufficient funds&quot;); &lt;br /&gt;                                            }                                     &lt;br /&gt;                                            balance = balance - amount;           &lt;br /&gt;                                            send_money(payee, amount);            &lt;br /&gt;                                          }                                       &lt;br /&gt;  balance = balance - amount;&lt;br /&gt;  send_money(payee, amount);            &lt;br /&gt;}                                       &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will (maybe) work fine sometimes but what happens if we have $100
in the account and then we get two transactions for $75 each?  The
obvious thing to expect is that we end up with -$50 in the
account (or, depending on the language, maybe
&lt;code&gt;$4294967246&lt;/code&gt;, oops!) because thread 1 checks the balance prior to thread 2 debiting
it. This is what&#39;s called a &amp;quot;time of check to time of use&amp;quot; bug.&lt;/p&gt;
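&lt;p&gt;To make that arithmetic concrete, here is a minimal Rust sketch. The helper
names &lt;code&gt;debit_signed&lt;/code&gt; and &lt;code&gt;debit_unsigned&lt;/code&gt; are made up for illustration, and
the unsigned case assumes a 32-bit balance that wraps on underflow. In Rust a plain
&lt;code&gt;-&lt;/code&gt; on a &lt;code&gt;u32&lt;/code&gt; would panic in a debug build instead, so the sketch uses
&lt;code&gt;wrapping_sub()&lt;/code&gt; to opt in to wrapping:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;// Hypothetical helpers, not part of the bank example above.

// Signed balance: an overdraft simply goes negative.
fn debit_signed(balance: i32, amount: i32) -&amp;gt; i32 {
    balance - amount
}

// Unsigned 32-bit balance: an overdraft wraps around modulo 2^32.
fn debit_unsigned(balance: u32, amount: u32) -&amp;gt; u32 {
    balance.wrapping_sub(amount)
}

fn main() {
    // Two $75 debits against a $100 balance.
    println!(&quot;{}&quot;, debit_signed(debit_signed(100, 75), 75)); // -50
    println!(&quot;{}&quot;, debit_unsigned(debit_unsigned(100, 75), 75)); // 4294967246
}&lt;/code&gt;&lt;/pre&gt;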
&lt;p&gt;Actually the situation is much much worse than this because
compiler output isn&#39;t really anywhere near as neat as I&#39;ve suggested here, so all
sorts of things could happen. For example, during the
initial read of &lt;code&gt;balance&lt;/code&gt; the compiler could decide to store the
value of &lt;code&gt;balance&lt;/code&gt; in a register and then write it back to &lt;code&gt;balance&lt;/code&gt;
from that register, with the result that the first write is lost.
To take a more extreme example, the compiler can move values in registers
&lt;em&gt;into&lt;/em&gt; your variables if it wants to (this is called a &amp;quot;register spill&amp;quot;)
as long as it restores them afterwards; this can cause obvious problems
if you then contaminate one of the values it&#39;s using.
This great &lt;a href=&quot;https://web.archive.org/web/20170316072356/https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong&quot;&gt;post&lt;/a&gt; by Dmitry Vyukov goes into a lot more detail here, but the basic
point is that if you ever do uncoordinated writes to the same
data values it&#39;s incredibly bad news (again, it&#39;s undefined
behavior in C/C++). In general, the compiler is allowed to assume
you never make unsynchronized, conflicting accesses to the same value
from two threads and so if you do, all bets are off.&lt;/p&gt;
&lt;p&gt;If you&#39;ve written any multithreaded code, you know that the basic
defense against this kind of problem is what&#39;s called &amp;quot;locking&amp;quot;:
one thread &amp;quot;locks&amp;quot; a region of memory and as long as it&#39;s holding
the lock, no other thread can touch that region.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
There are a lot
of different kinds of lock, but one common one is what&#39;s called a
&amp;quot;read-write lock&amp;quot;. A read-write lock has the following semantics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you hold a read lock on a particular region, you&#39;re allowed to
read the memory but not write it. An arbitrary number of threads can
hold read locks on a given region as long as no thread holds a write
lock.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you hold a write lock on a particular region, you can read or
write it.  No other thread&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
can hold any kind of lock on a region as long
as someone is holding a write lock.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This should sound very familiar because it&#39;s precisely the semantics
Rust uses for mutability, if you just substitute &amp;quot;immutable reference&amp;quot;
for &amp;quot;read lock&amp;quot; and &amp;quot;mutable reference&amp;quot; for &amp;quot;write lock&amp;quot;. This is
not an accident, but instead it&#39;s a sign of a deep connection
between memory safety and thread safety.&lt;/p&gt;
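&lt;p&gt;As a rough illustration of the parallel, here is a sketch of the bank example
using the standard library&#39;s &lt;code&gt;std::sync::RwLock&lt;/code&gt;. The Rust version of
&lt;code&gt;pay_money()&lt;/code&gt; and the &lt;code&gt;run_two_payments()&lt;/code&gt; helper are my own
assumptions for illustration, the call to &lt;code&gt;send_money()&lt;/code&gt; is left out, and
&lt;code&gt;Arc&lt;/code&gt; is used only so that both threads can share the lock:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical Rust port of pay_money(): the check and the debit happen
// under a single write lock, so no other thread can run between them.
fn pay_money(balance: &amp;amp;RwLock&amp;lt;i64&amp;gt;, amount: i64) -&amp;gt; Result&amp;lt;(), String&amp;gt; {
    let mut guard = balance.write().unwrap(); // exclusive, like `&amp;amp;mut`
    if *guard &amp;lt; amount {
        return Err(&quot;Insufficient funds&quot;.to_string());
    }
    *guard -= amount;
    Ok(())
}

// Two threads each try to pay $75 out of a $100 balance.
// Returns (number of payments that succeeded, final balance).
fn run_two_payments() -&amp;gt; (usize, i64) {
    let balance = Arc::new(RwLock::new(100_i64));
    let handles: Vec&amp;lt;_&amp;gt; = (0..2)
        .map(|_| {
            let balance = Arc::clone(&amp;amp;balance);
            thread::spawn(move || pay_money(&amp;amp;balance, 75).is_ok())
        })
        .collect();
    let paid = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|ok| *ok)
        .count();
    let remaining = *balance.read().unwrap(); // shared, like `&amp;amp;`
    (paid, remaining)
}

fn main() {
    let (paid, remaining) = run_two_payments();
    println!(&quot;paid={} remaining={}&quot;, paid, remaining);
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whichever thread takes the write lock first pays out; the other one sees
the reduced balance and fails the check, so exactly one payment succeeds and
$25 is left in the account.&lt;/p&gt;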
&lt;h3 id=&quot;moving-data-between-threads&quot;&gt;Moving Data Between Threads &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#moving-data-between-threads&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As we&#39;ve discussed from the very beginning, Rust is a single
ownership language; if a given object is owned by one thread
then obviously it cannot be modified by two threads at once.
It can, however, be moved between threads, by two basic
mechanisms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When you &lt;a href=&quot;https://doc.rust-lang.org/book/ch16-01-threads.html#creating-a-new-thread-with-spawn&quot;&gt;spawn&lt;/a&gt;
a thread, you provide a function for the
thread to run. This function can be a &lt;a href=&quot;https://doc.rust-lang.org/book/ch20-04-advanced-functions-and-closures.html&quot;&gt;closure&lt;/a&gt;, which is a fancy term for an anonymous function defined in line.
The closure can capture variables from the environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Rust provides mechanisms to write from one thread to another, such
as
&lt;a href=&quot;https://doc.rust-lang.org/std/sync/mpsc/fn.channel.html&quot;&gt;channels&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;spawning-a-thread&quot;&gt;Spawning a Thread &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#spawning-a-thread&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Let&#39;s start with spawning a new thread. The basic code looks like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token namespace&quot;&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; val&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;|| {}&lt;/code&gt; is the syntax for a closure. When it is used with the &lt;code&gt;move&lt;/code&gt;
keyword, any variable used in the body of
the closure (&amp;quot;captured&amp;quot;) is moved into the closure. For a type that
isn&#39;t &lt;code&gt;Copy&lt;/code&gt;, this means the variable is unavailable inside &lt;code&gt;main&lt;/code&gt; after this point
(an &lt;code&gt;i32&lt;/code&gt; like &lt;code&gt;val&lt;/code&gt; is simply copied). Either way, the value
is available inside the closure, which is why we can
pass it to &lt;code&gt;println!()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now look what happens if we do the same thing but moving a reference
to &lt;code&gt;val&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val_ref &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;val&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token namespace&quot;&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; val_ref&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, this doesn&#39;t compile at all.&lt;/p&gt;
&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;error[E0597]: `val` does not live long enough&lt;br /&gt;  --&gt; examples/ex11-ref.rs:5:19&lt;br /&gt;   |&lt;br /&gt;4  |       let val = 10;&lt;br /&gt;   |           --- binding `val` declared here&lt;br /&gt;5  |       let val_ref = &amp;val;&lt;br /&gt;   |                     ^^^^ borrowed value does not live long enough&lt;br /&gt;6  |&lt;br /&gt;7  | /     thread::spawn(move || {&lt;br /&gt;8  | |         println!(&quot;{:?}&quot;, val_ref);&lt;br /&gt;9  | |     });&lt;br /&gt;   | |______- argument requires that `val` is borrowed for `&#39;static`&lt;br /&gt;10 |   }&lt;br /&gt;   |   - `val` dropped here while still borrowed&lt;br /&gt;&lt;br /&gt;For more information about this error, try `rustc --explain E0597`.&lt;br /&gt;error: could not compile `photos` (example &quot;ex11-ref&quot;) due to 1 previous error&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The lifetime problem here is that &lt;code&gt;main()&lt;/code&gt; may exit or
at least start to exit while
the thread is still running, which means that &lt;code&gt;val&lt;/code&gt;
gets dropped and &lt;code&gt;val_ref&lt;/code&gt; becomes invalid. There&#39;s no
way to statically verify that &lt;code&gt;main()&lt;/code&gt; will wait for
the other thread to complete, so there&#39;s no way to
have the thread reference a local variable of &lt;code&gt;main()&lt;/code&gt;.
As indicated by this error, the only kind of reference
you can pass to a thread is one that has the special
&lt;code&gt;&#39;static&lt;/code&gt; lifetime, which means it lasts the entire lifetime
of the program (typically it&#39;s a global variable).&lt;/p&gt;
&lt;p&gt;You can make &lt;code&gt;val&lt;/code&gt; static, as shown in the code below,
but safe Rust &lt;a href=&quot;https://doc.rust-lang.org/reference/items/static-items.html#mutable-statics&quot;&gt;forbids mutable static variables&lt;/a&gt;, so the result
is that the variable is read only in both the thread
and in &lt;code&gt;main()&lt;/code&gt;, which preserves the &amp;quot;arbitrary number
of immutable references&amp;quot; invariant.&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;VAL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val_ref &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;VAL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token namespace&quot;&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token 
closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; val_ref&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While the logic is similar to what we saw &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#structs&quot;&gt;above&lt;/a&gt;,
the enforcement mechanism for this is slightly different.
The reason for this is that &lt;code&gt;thread::spawn()&lt;/code&gt; is just a function
and so even though we&#39;re passing it a reference there&#39;s nowhere
for it to store it, so once &lt;code&gt;thread::spawn()&lt;/code&gt; returns, that
reference should have been dropped which would mean it was safe
to drop the object it pointed to. You know and I know that
what &lt;code&gt;thread::spawn()&lt;/code&gt; actually does is to create a new thread
that runs independently from the main thread, but how is the
compiler to know that?&lt;/p&gt;
&lt;p&gt;What is happening here is that &lt;code&gt;thread::spawn()&lt;/code&gt; is defined with
a specific set of trait bounds (traits which the arguments
and return values have to implement):&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;JoinHandle&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;where&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token class-name&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;FnOnce&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Send&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;static&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Send&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token lifetime-annotation symbol&quot;&gt;&#39;static&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Focus your attention on the generic parameter &lt;code&gt;F&lt;/code&gt;, which is
the type of the closure passed to &lt;code&gt;spawn()&lt;/code&gt;. This is defined
as having to implement the following traits:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;a href=&quot;https://doc.rust-lang.org/std/ops/trait.FnOnce.html&quot;&gt;&lt;code&gt;FnOnce() -&amp;gt; T&lt;/code&gt;&lt;/a&gt;:&lt;/dt&gt;
&lt;dd&gt;Be a function which can be safely called at least once and
has a return value of type &lt;code&gt;T&lt;/code&gt;. The &lt;em&gt;at least&lt;/em&gt; means that it
might not be safe to call the function twice (and hence
the compiler will prohibit it).
&lt;em&gt;[Clarified -- 2025-05-25]&lt;/em&gt;.&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;https://doc.rust-lang.org/std/marker/trait.Send.html&quot;&gt;&lt;code&gt;Send&lt;/code&gt;&lt;/a&gt;:&lt;/dt&gt;
&lt;dd&gt;Can be safely transferred across thread boundaries.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;&#39;static&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;Lasts for the duration of the program.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It&#39;s this last constraint which matters for our purposes, because it
requires that the closure and &lt;em&gt;anything it captures&lt;/em&gt; have the
lifetime &lt;code&gt;&#39;static&lt;/code&gt;. Because a reference to a local variable doesn&#39;t
have &lt;code&gt;&#39;static&lt;/code&gt; lifetime, a closure that captures one can&#39;t be passed to &lt;code&gt;thread::spawn()&lt;/code&gt;,
so there&#39;s no way to use &lt;code&gt;thread::spawn()&lt;/code&gt; to create an unsafe
reference.&lt;/p&gt;
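&lt;p&gt;As a minimal sketch of what &lt;em&gt;does&lt;/em&gt; satisfy these bounds, here&#39;s a
&lt;code&gt;move&lt;/code&gt; closure that takes ownership of its data rather than
borrowing it. The helper name &lt;code&gt;sum_in_thread()&lt;/code&gt; is hypothetical and
only there for illustration:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::thread;

// Hypothetical helper (not from the standard library): sums a vector
// on another thread and returns the result.
fn sum_in_thread(data: Vec&amp;lt;i32&amp;gt;) -&amp;gt; i32 {
    // `move` transfers ownership of `data` into the closure, so the closure
    // borrows nothing from this stack frame and satisfies the &#39;static bound.
    let handle = thread::spawn(move || data.iter().sum::&amp;lt;i32&amp;gt;());

    // join() returns Err only if the spawned thread panicked.
    handle.join().unwrap()
}

fn main() {
    println!(&quot;total={}&quot;, sum_in_thread(vec![1, 2, 3]));
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the closure owns &lt;code&gt;data&lt;/code&gt;, it holds no references into the
caller&#39;s stack frame, and so the &lt;code&gt;&#39;static&lt;/code&gt; bound is satisfied.&lt;/p&gt;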
&lt;h4 id=&quot;channels&quot;&gt;Channels &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#channels&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The other main option is to use some sort of messaging
system to write data from one thread to another. For
instance, Rust has a built-in mechanism called channels,
which works like this:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;sender&quot;&gt;Sender &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#sender&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; value &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Foo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;channel_tx&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;send&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;value&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;receiver&quot;&gt;Receiver &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#receiver&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; value &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; channel_rx&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;recv&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As with &lt;code&gt;thread::spawn()&lt;/code&gt;, we can&#39;t use channels to unsafely
send references from one thread to another, though the mechanisms Rust
uses to prevent this are somewhat more complicated, and
I&#39;m not going to go into them here.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;
With that said, here&#39;s an example of the obvious thing that you
might try to do, and that the compiler will reject:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;sync&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;mpsc&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;channel&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;tx&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; rx&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;channel&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token 
keyword&quot;&gt;let&lt;/span&gt; _ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; tx&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;send&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;val&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token namespace&quot;&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; _ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; rx&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;recv&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
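&lt;p&gt;By contrast, sending the value itself rather than a reference to it
should compile. Here&#39;s a minimal sketch, assuming a hypothetical helper
&lt;code&gt;send_owned()&lt;/code&gt; that I&#39;ve introduced just for illustration:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper: sends `val` through a channel to a new thread,
// which doubles it and hands the result back via join().
fn send_owned(val: i32) -&amp;gt; i32 {
    let (tx, rx) = channel();

    // Sending the value itself (not a reference to it) puts an owned i32
    // in the channel, so the receiver never points into this stack frame.
    tx.send(val).unwrap();

    // `rx` is moved into the closure; it carries owned i32 values only,
    // so the closure satisfies the &#39;static bound.
    let handle = thread::spawn(move || rx.recv().unwrap() * 2);
    handle.join().unwrap()
}

fn main() {
    println!(&quot;received and doubled: {}&quot;, send_owned(10));
}&lt;/code&gt;&lt;/pre&gt;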
&lt;h3 id=&quot;cross-thread-sharing&quot;&gt;Cross-Thread Sharing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#cross-thread-sharing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Obviously, there are situations when you &lt;em&gt;do&lt;/em&gt; want to share data
across threads, and just as Rust provides a mechanism (&lt;code&gt;RefCell&lt;/code&gt;)
for controlled mutation of data shared through immutable references,
it similarly has a set of mechanisms for controlled sharing
of writable data. A full description of how to do this is outside
of the scope of this already long post, but here is a trivial example.&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;sync&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Arc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Mutex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Arc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Mutex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; val_to_share &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Arc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;val&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; t &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;thread&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;move&lt;/span&gt; &lt;span class=&quot;token closure-params&quot;&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token closure-punctuation punctuation&quot;&gt;|&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; tmp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; val_to_share&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;lock&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Inside thread val={:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;tmp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;tmp &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; _ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Main thread 
val={:?}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; val&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;lock&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The basic idea here is that we first wrap our shared data in a
&lt;code&gt;Mutex&lt;/code&gt;, which allows only one reference (either readable or writable)
at a time. You obtain the reference by calling &lt;code&gt;.lock()&lt;/code&gt;,
and if someone else has it your thread will wait until they
have unlocked it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;
A reference to a &lt;code&gt;Mutex&lt;/code&gt; can&#39;t be shared across threads directly any more
than any other variable can, but we &lt;em&gt;can&lt;/em&gt; wrap it in a reference
counted structure, in this case &lt;code&gt;Arc&lt;/code&gt; (the thread safe version
of &lt;code&gt;Rc&lt;/code&gt;) and move that across threads. The logic here is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We have two copies of &lt;code&gt;Arc&lt;/code&gt;, one in each thread, but both
pointing to the same &lt;code&gt;Mutex&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Each thread uses &lt;code&gt;.lock()&lt;/code&gt; to access the data inside the
&lt;code&gt;Mutex&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This allows us to share the data across the threads but guarantees
that only one thread at a time can use it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt; The call to &lt;code&gt;.join()&lt;/code&gt; in the main thread is just waiting for
the other thread to finish to guarantee we have a chance to print both values.&lt;/p&gt;
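&lt;p&gt;The same pattern extends to more than two threads. As a rough sketch
(the helper &lt;code&gt;count_with_threads()&lt;/code&gt; is hypothetical), each thread
gets its own clone of the &lt;code&gt;Arc&lt;/code&gt; and takes the lock just long enough
to bump a shared counter:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: `n_threads` threads each add 1 to a shared counter.
fn count_with_threads(n_threads: usize) -&amp;gt; usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..n_threads {
        // Each thread gets its own Arc pointing at the same Mutex.
        let shared = Arc::clone(&amp;amp;counter);
        handles.push(thread::spawn(move || {
            // The guard returned by lock() unlocks when it is dropped
            // at the end of this statement.
            *shared.lock().unwrap() += 1;
        }));
    }

    // Wait for every thread so that all increments are visible.
    for h in handles {
        h.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!(&quot;count={}&quot;, count_with_threads(8));
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since every increment happens while holding the lock, the final count
should equal the number of threads regardless of how they are scheduled.&lt;/p&gt;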
&lt;p&gt;There&#39;s a lot more to say about multithreaded programming in Rust,
but I&#39;m not trying to teach you how to write parallel programs
in Rust; instead I want to make two points here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Writing memory safe and thread safe programs depends on the
same basic concepts, namely ensuring single ownership,
guaranteed lifetimes, and
preventing simultaneous writing and reading, whether that
simultaneity is a result of concurrency or not.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The mechanisms that Rust uses to provide thread safety are
much the same basic mechanisms as those which are used to
provide memory safety and similarly are based on clear
contracts between components plus local analysis for safety.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As I said, this isn&#39;t an accident, but rather a result of the
connection between memory safety and thread safety: both kinds of
failure come from logical errors related to unclear ownership and
from trying to
touch the same data in inconsistent ways in multiple places.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-garbage-collection&quot;&gt;Next Up: Garbage Collection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-5/#next-up%3A-garbage-collection&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s plenty more to say about Rust memory management and in
particular about how to write manageable code that conforms to
Rust&#39;s rules—as well as how to break those rules
using &lt;code&gt;unsafe&lt;/code&gt; when you have to—but hopefully this gives you
an overall sense of how things are put together. In the next
post, I want to talk about a completely different approach
to memory, namely automatic memory management with garbage
collection, as used in languages ranging from Lisp to Go to
JavaScript.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I was too lazy to dig into this, but I&#39;m pretty sure what&#39;s
happening under the hood is that &lt;code&gt;.&lt;/code&gt; is basically just
syntactic sugar for &amp;quot;call this function with &lt;code&gt;self&lt;/code&gt; as
the first argument&amp;quot;. This allows Rust to figure out what
the right version of &lt;code&gt;self&lt;/code&gt; to provide is. &lt;code&gt;Metric::size()&lt;/code&gt; addresses the
function directly, and so you have to provide &lt;code&gt;self&lt;/code&gt;
directly, and that means you need to also provide the right
version of &lt;code&gt;self&lt;/code&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This isn&#39;t immediately evident, but in this code, &lt;code&gt;y&lt;/code&gt; is of
type &lt;code&gt;&amp;amp;i32&lt;/code&gt; rather than &lt;code&gt;i32&lt;/code&gt;; &lt;code&gt;println!()&lt;/code&gt; just takes
care of the dereference for you. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Well, fortunately in that I added &lt;code&gt;#[derive(Clone)]&lt;/code&gt; when
I wrote it. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, note that we are doing &lt;code&gt;for photo in new_photos&lt;/code&gt;
rather than &lt;code&gt;for photo in &amp;amp;new_photos&lt;/code&gt; because we need
to consume the vector. If we used &lt;code&gt;&amp;amp;new_photos&lt;/code&gt; we would
iterate over references and not be able to move the
object behind the reference when calling &lt;code&gt;add_photo()&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There&#39;s actually an ongoing project called &lt;a href=&quot;https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/&quot;&gt;Polonius&lt;/a&gt; to replace
the borrow checker with one that properly handles some cases which are
safe but which it currently rejects. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ignore the &lt;code&gt;#[allow(unused_mut)]&lt;/code&gt;. This just stops the
compiler from complaining about how &lt;code&gt;album&lt;/code&gt; is unnecessarily
&lt;code&gt;mut&lt;/code&gt;, and I wanted to keep the code the same as before. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One of the bad habits that Rust seems to have picked up
from C++ is the convention of making generic parameters
single letters, so you end up with &lt;code&gt;T&lt;/code&gt;, &lt;code&gt;U&lt;/code&gt;, &lt;code&gt;V&lt;/code&gt;, etc.
Just when we&#39;d persuaded people not to name regular
variables like that, too.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And I haven&#39;t even gotten to &lt;code&gt;&#39;_&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We actually don&#39;t even have to tell Rust that &lt;code&gt;tmp&lt;/code&gt; is an
&lt;code&gt;i32&lt;/code&gt;, because that&#39;s the basic type for bare integers. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
On the other hand, if we change &lt;code&gt;Holder.t&lt;/code&gt; to be a
&lt;code&gt;RefCell&amp;lt;Option&amp;lt;T&amp;gt;&amp;gt;&lt;/code&gt; then we get a compile error
even if we make &lt;code&gt;.set_value()&lt;/code&gt; take &lt;code&gt;&amp;amp;T&lt;/code&gt;. You need
to dig pretty deep into Rust internals to understand
why, but the logic is clear: you could in principle
have used the &lt;code&gt;RefCell&lt;/code&gt; to store a reference. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m simplifying
here, because there&#39;s also a lot of machinery you need to
make sure the region is in good shape when the lock is removed,
due to the kinds of optimizations I mentioned above. We won&#39;t
cover that here. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m being precise here, because some
systems
let you take recursive locks and some don&#39;t. There are pluses
and minuses. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I spent quite a while working through this and I&#39;m
still not done. Potentially the subject for a future
post. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that we don&#39;t have to explicitly unlock because
we&#39;re using RAII, just as with &lt;code&gt;RefCell&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Rust also has a
&lt;a href=&quot;https://doc.rust-lang.org/std/sync/struct.RwLock.html&quot;&gt;read-write
lock&lt;/a&gt; mechanism
that allows for multiple readers at once, but I just decided
to show &lt;code&gt;Mutex&lt;/code&gt; here. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-5/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 4: Rust Ownership and Borrowing</title>
		<link href="https://educatedguesswork.org/posts/memory-management-4/"/>
		<updated>2025-03-31T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-4/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/rust-cover.jpeg&quot; alt=&quot;Cover image&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;[Post updated 2025-04-20 to fix some minor errors flagged by Erik Taubeneck]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is the fourth post in my planned multipart series on memory
management. Part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt; covered the basics of
memory allocation and how it works in C, and parts
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2&quot;&gt;II&lt;/a&gt; and &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3&quot;&gt;III&lt;/a&gt;
covered the basics of C++ memory management, including RAII and smart
pointers. These tools do a lot to simplify memory management, but because
they were added on to the older manual management core of C, you&#39;re
left with a system which is mostly safe if you &lt;a href=&quot;https://www.wired.com/2010/06/iphone-4-holding-it-wrong/&quot;&gt;hold it right&lt;/a&gt;
but which can quickly become unsafe if you&#39;re not careful.
Next I want to talk about a language which was designed
to be safe from the ground up and won&#39;t let you be unsafe:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Rust.
I&#39;d originally planned to talk about garbage collected languages
next, but after all that time spent on how C++ works,
I decided it would work better to do Rust next and then
close with garbage collection.&lt;/p&gt;
&lt;h2 id=&quot;single-ownership&quot;&gt;Single Ownership &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#single-ownership&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unlike C and C++—or any other language we&#39;ll be looking
at—the basic design concept of Rust is &lt;em&gt;single ownership&lt;/em&gt;. I.e.,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Any given object can only have a single owner&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We&#39;ve already seen how to implement this model in C++ using
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#it&#39;s-good-to-be-unique&quot;&gt;unique pointers&lt;/a&gt;,
but in Rust single ownership is just how everything works.
Just as we saw with C++ unique pointers, this makes life simple:
when the owning variable goes out of scope, the object
is destroyed.&lt;/p&gt;
&lt;p&gt;In C/C++, when you assign one variable to another, the default
is to do a &lt;em&gt;copy&lt;/em&gt;. By contrast, Rust &lt;em&gt;moves&lt;/em&gt; the variable. Consider
the following code:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;c&quot;&gt;C &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#c&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token macro property&quot;&gt;&lt;span class=&quot;token directive-hash&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;token directive keyword&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&amp;lt;stdio.h&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token macro property&quot;&gt;&lt;span class=&quot;token directive-hash&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;token directive keyword&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&amp;lt;stdint.h&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token class-name&quot;&gt;uint8_t&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; argc&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;argv&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; h1 &lt;span class=&quot;token 
operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%u %u&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;rust&quot;&gt;Rust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#rust&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is a simple piece of code: we first create a &lt;code&gt;Hat&lt;/code&gt; named
&lt;code&gt;h1&lt;/code&gt; of size 5. We then create a second hat, &lt;code&gt;h2&lt;/code&gt;, by assigning
&lt;code&gt;h1&lt;/code&gt; to it, and print the size of each. With C this works
exactly as you would expect:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;5 5

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With Rust, however, the situation is totally different, and the
compiler gets mad at us:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;error[E0382]: borrow of moved value: `h1`
 --&amp;gt; assign-rs.rs:9:23
  |
6 |     let h1 = Hat { size: 5 };
  |         -- move occurs because `h1` has type `Hat`, which does not implement the `Copy` trait
7 |     let h2 = h1;
  |              -- value moved here
8 |
9 |     println!(&amp;quot;{} {}&amp;quot;, h1.size, h2.size);
  |                       ^^^^^^^ value borrowed here after move
  |
note: if `Hat` implemented `Clone`, you could clone the value
 --&amp;gt; assign-rs.rs:1:1
  |
1 | struct Hat {
  | ^^^^^^^^^^ consider implementing `Clone` for this type
...
7 |     let h2 = h1;
  |              -- you could clone this value
  = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to 1 previous error

For more information about this error, try `rustc --explain E0382`.
make: *** [assign-rs.out] Error 1

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This rather long error message is trying to be helpful, but it can be
a bit confusing unless you&#39;re familiar with Rust. For instance, what
does it mean for something to be a &amp;quot;borrow of moved value&amp;quot;?  By the
end of this post, you should be able to understand pretty much
everything in this message.&lt;/p&gt;
&lt;h3 id=&quot;moving-variables&quot;&gt;Moving Variables &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#moving-variables&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said above, in Rust assigning variable &lt;code&gt;A&lt;/code&gt; to variable &lt;code&gt;B&lt;/code&gt;
moves the object from &lt;code&gt;A&lt;/code&gt; to &lt;code&gt;B&lt;/code&gt;. In C++ this just means that it
executes the move assignment operator and leaves the source
object in a &amp;quot;valid but unspecified&amp;quot; state, but in Rust it
means something different and much stronger: it makes &lt;code&gt;B&lt;/code&gt;
the new owner of the object and then renders &lt;code&gt;A&lt;/code&gt; totally
invalid. Unlike in C++, this invariant is enforced by the
compiler, so when we later try to use &lt;code&gt;h1&lt;/code&gt; in the &lt;code&gt;println!()&lt;/code&gt;
statement, the compiler refuses and reports an error. This would
happen with any use of &lt;code&gt;h1&lt;/code&gt; after it was assigned to &lt;code&gt;h2&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;All of this works because—unlike in C++—move
semantics were baked into Rust from the start, and the compiler
is easily able to enforce them (as well as produce a less confusing
error message than you might get with some C++ template).
Similarly, Rust doesn&#39;t need you to implement a move
assignment operator because in Rust all moves are
implemented the same way—at least conceptually—as a &lt;a href=&quot;https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone&quot;&gt;bitwise copy&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; of
the object.
I say &amp;quot;at least conceptually&amp;quot; because in this case you don&#39;t need
to do a copy at all: the compiler can internally note that &lt;code&gt;h1&lt;/code&gt;
is now defunct and it is now named &lt;code&gt;h2&lt;/code&gt; and move forward without
doing any copying. Some playing around with &lt;a href=&quot;https://godbolt.org/&quot;&gt;Compiler Explorer&lt;/a&gt;
reveals that that&#39;s what &lt;code&gt;rustc&lt;/code&gt; does when optimization is on.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
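&lt;p&gt;As a minimal sketch of the rule (the &lt;code&gt;size_after_move()&lt;/code&gt; helper is a hypothetical name introduced for illustration, not part of the original program), the version below compiles because it only touches the hat through &lt;code&gt;h2&lt;/code&gt; once the move has happened:&lt;/p&gt;

```rust
struct Hat {
    size: u8,
}

// Hypothetical helper (not in the original program): moves a Hat from
// h1 to h2 and reads the size through the new owner.
fn size_after_move() -> u8 {
    let h1 = Hat { size: 5 };
    let h2 = h1; // ownership of the Hat moves from h1 to h2

    // h1 may not be used from here on. Uncommenting the next line
    // brings back error E0382 ("borrow of moved value"):
    // println!("{}", h1.size);

    h2.size
}

fn main() {
    println!("{}", size_after_move());
}
```

&lt;p&gt;Running this should print &lt;code&gt;5&lt;/code&gt;.&lt;/p&gt;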
&lt;h3 id=&quot;copying-integers&quot;&gt;Copying Integers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#copying-integers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Now consider some very similar Rust code, but using a bare integer in
place of the struct containing just one integer:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h2&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code compiles and runs perfectly well, exactly like our
original C code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;5 5

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why does this code work and the other code not? The answer is that instead
of &lt;em&gt;moving&lt;/em&gt; &lt;code&gt;h1&lt;/code&gt;, the compiler has copied it. But that just raises
the question of why it copied it when I already said that Rust does moves on
assignment. The answer to that question is that Rust knows that
integers are simple objects which can be safely copied. As a practical
matter, this mostly means that they don&#39;t contain pointers to
anything, so that you don&#39;t have to worry about having two pointers to
the same object (thus violating the single ownership rule).  In this
case, when you assign &lt;code&gt;A&lt;/code&gt; to &lt;code&gt;B&lt;/code&gt; it automatically makes a copy rather than
invalidating &lt;code&gt;A&lt;/code&gt;. This saves you the trouble of asking Rust to copy
the variable rather than moving it.&lt;/p&gt;
&lt;p&gt;Note that we&#39;re talking about language semantics here, not the
output binary. The net effect here is that the compiler knows that
&lt;code&gt;h1&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt; can be used simultaneously, but it&#39;s free to make a copy
or just note that &lt;code&gt;h1&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt; have the same contents and use them
interchangeably in the &lt;code&gt;println!()&lt;/code&gt; statement.&lt;/p&gt;
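&lt;p&gt;The same copy-versus-move rule applies when a value is passed to a function by value, not just on plain assignment. Here is a small sketch, assuming a hypothetical &lt;code&gt;double()&lt;/code&gt; helper that is not part of the original example:&lt;/p&gt;

```rust
// Hypothetical helper that takes its argument by value.
fn double(n: u8) -> u8 {
    n * 2
}

fn main() {
    let h1: u8 = 5;
    let h2 = h1; // u8 is copied, so h1 stays valid
    let d = double(h1); // passing by value copies h1 as well

    // All three variables are still usable here.
    println!("{} {} {}", h1, h2, d);
}
```

&lt;p&gt;This should print &lt;code&gt;5 5 10&lt;/code&gt;. Had &lt;code&gt;h1&lt;/code&gt; been the original &lt;code&gt;Hat&lt;/code&gt; struct, the assignment to &lt;code&gt;h2&lt;/code&gt; would already have moved it and both later uses of &lt;code&gt;h1&lt;/code&gt; would be rejected.&lt;/p&gt;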
&lt;h3 id=&quot;copying-structs&quot;&gt;Copying Structs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#copying-structs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, there&#39;s no real difference between an integer variable
and a struct with only a single integer in it, so it&#39;s actually
just as safe to copy &lt;code&gt;Hat&lt;/code&gt; as it is to copy &lt;code&gt;u8&lt;/code&gt;, and we can tell
Rust that by decorating the &lt;code&gt;Hat&lt;/code&gt; definition like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Copy, Clone)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this slight change, our original Rust program will work, just
like the C version or the bare integer Rust version. To understand
why this works, we need to take a detour into the Rust type
system.&lt;/p&gt;
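&lt;p&gt;For reference, the complete program with this change might look like the following sketch. The &lt;code&gt;hat_sizes()&lt;/code&gt; helper is a hypothetical wrapper around the body of the original &lt;code&gt;main()&lt;/code&gt;, added only so the result is easy to check:&lt;/p&gt;

```rust
#[derive(Copy, Clone)]
struct Hat {
    size: u8,
}

// Hypothetical wrapper around the body of the original main().
fn hat_sizes() -> (u8, u8) {
    let h1 = Hat { size: 5 };
    let h2 = h1; // Hat is Copy now, so this copies instead of moving

    // Both h1 and h2 are valid here.
    (h1.size, h2.size)
}

fn main() {
    let (s1, s2) = hat_sizes();
    println!("{} {}", s1, s2);
}
```

&lt;p&gt;This should print &lt;code&gt;5 5&lt;/code&gt;, matching the C version.&lt;/p&gt;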
&lt;h2 id=&quot;traits&quot;&gt;Traits &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#traits&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Although not a fully object-oriented language like C++, Rust includes
some object-oriented features and in particular a feature called
&lt;a href=&quot;https://doc.rust-lang.org/book/ch10-02-traits.html&quot;&gt;traits&lt;/a&gt;.
As with a class, the idea behind a trait is to define a set of
behaviors (in some other languages this is called an &amp;quot;interface&amp;quot;)
that types can implement. You may recall our
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#objects-and-classes&quot;&gt;shapes example&lt;/a&gt; from
part II where we had a class called &lt;code&gt;Shape&lt;/code&gt; and then derived
classes for &lt;code&gt;Rectangle&lt;/code&gt; and &lt;code&gt;Circle&lt;/code&gt;. Here&#39;s the C++ code and
the corresponding Rust code:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;c%2B%2B&quot;&gt;C++ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#c%2B%2B&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token base-clause&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; width &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;rust-2&quot;&gt;Rust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#rust-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Shape&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    width&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    height&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span 
class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;These two snippets have the same basic structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Shape&lt;/code&gt; defines the interface and says that all &lt;code&gt;Shape&lt;/code&gt; objects
implement an &lt;code&gt;area()&lt;/code&gt; method.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Rectangle&lt;/code&gt; is a concrete type that implements &lt;code&gt;Shape&lt;/code&gt; and provides
its own definition for &lt;code&gt;area()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just like with C++, we can now write code that expects a &lt;code&gt;Shape&lt;/code&gt;
and use any struct that implements &lt;code&gt;Shape&lt;/code&gt;. For instance,
we can write:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;shape&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Area is {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; shape&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; rect &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span 
class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        width&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        height&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rect&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a number of important differences between class
inheritance and trait implementation that aren&#39;t apparent
here. For example, Rust traits can&#39;t have any data whereas
C++ classes can; it just so happens that &lt;code&gt;Shape&lt;/code&gt; doesn&#39;t
have any data, but if we wanted to add (say) a &lt;code&gt;name&lt;/code&gt; field
to &lt;code&gt;Shape&lt;/code&gt; we could do that in C++ and all classes that
inherited from &lt;code&gt;Shape&lt;/code&gt; would inherit that field; you can&#39;t
do that in Rust. However, for the moment we can ignore
these differences.&lt;/p&gt;
&lt;h3 id=&quot;marker-traits&quot;&gt;Marker Traits &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#marker-traits&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Our &lt;code&gt;Shape&lt;/code&gt; trait just defines a single method, &lt;code&gt;area()&lt;/code&gt;, but
there&#39;s actually nothing that requires us to define &lt;em&gt;any
methods at all&lt;/em&gt;; you can just have an empty trait like
so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Circular&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Circular&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Circle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It may not be immediately obvious why this would be useful, but here&#39;s
an example. For a circle, unlike for shapes such as rectangles, you can
compute the circumference from the area alone.  So, we can write a function like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_circumference&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;shape&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Circular&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; radius &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;shape&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;f64&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;f64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;consts&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;PI&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; circumference &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; radius &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.0&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;f64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;consts&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;PI&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Circumference is {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; circumference&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;impl Shape + Circular&lt;/code&gt; type in the function signature says that
the shape argument has to not only be a &lt;code&gt;Shape&lt;/code&gt; but also implement
&lt;code&gt;Circular&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Note that we don&#39;t &lt;em&gt;use&lt;/em&gt; any functions from &lt;code&gt;Circular&lt;/code&gt;
(there aren&#39;t any!), we just use it to restrict which shapes can be
provided to &lt;code&gt;print_circumference()&lt;/code&gt;. Consider the following function signature:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_circumference&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;shape&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This would compile and run, but would let you try to compute
circumference from rectangles, even though that won&#39;t give
the right answer. The trait restriction for &lt;code&gt;Circular&lt;/code&gt; (technical
term: &amp;quot;trait bound&amp;quot;) ensures that only circular objects can
be used with &lt;code&gt;print_circumference()&lt;/code&gt;. This particular example
may feel a little contrived because we could just require
&lt;code&gt;print_circumference()&lt;/code&gt; to take a circle, but this design
also lets us handle cylinders, which are also circular;
all we have to do is implement &lt;code&gt;Circular&lt;/code&gt; for &lt;code&gt;Cylinder&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Circular&lt;/code&gt; is what&#39;s called a &amp;quot;marker trait&amp;quot;; it doesn&#39;t have
any functionality of its own, it&#39;s just used to indicate that
a type has a specific property.&lt;/p&gt;
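&lt;p&gt;To see the marker trait doing its job, here is a self-contained sketch. It makes several assumptions that are not in the original code: &lt;code&gt;area()&lt;/code&gt; takes &lt;code&gt;self&lt;/code&gt; by value and returns &lt;code&gt;f64&lt;/code&gt; to keep the arithmetic simple, the &lt;code&gt;Circle&lt;/code&gt; and &lt;code&gt;Rectangle&lt;/code&gt; fields are illustrative, and the hypothetical &lt;code&gt;circumference()&lt;/code&gt; function returns the value rather than printing it:&lt;/p&gt;

```rust
use std::f64::consts::PI;

trait Shape {
    // Assumption for this sketch: by-value self and an f64 result.
    fn area(self) -> f64;
}

// Marker trait: no methods, it only labels types as circular.
trait Circular {}

struct Circle {
    radius: f64,
}

struct Rectangle {
    width: f64,
    height: f64,
}

impl Shape for Circle {
    fn area(self) -> f64 {
        PI * self.radius * self.radius
    }
}

impl Shape for Rectangle {
    fn area(self) -> f64 {
        self.width * self.height
    }
}

// Only Circle gets the marker.
impl Circular for Circle {}

// Accepts any type that implements both Shape and Circular.
fn circumference(shape: impl Shape + Circular) -> f64 {
    let radius = (shape.area() / PI).sqrt();
    radius * 2.0 * PI
}

fn main() {
    let circle = Circle { radius: 1.0 };
    println!("Circumference is {}", circumference(circle));

    let rect = Rectangle { width: 10.0, height: 5.0 };
    println!("Rectangle area is {}", rect.area());

    // A call like circumference(rect) would be rejected at compile time,
    // because Rectangle does not implement Circular.
}
```

&lt;p&gt;The point of the design is that the mistake is caught by the compiler rather than showing up as a wrong number at run time.&lt;/p&gt;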
&lt;h3 id=&quot;the-copy-trait&quot;&gt;The Copy Trait &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#the-copy-trait&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At this point, it should be obvious where this is going: Rust has a
marker trait called &lt;code&gt;Copy&lt;/code&gt;, which tells the compiler that an object is
safe to copy. You can implement the &lt;code&gt;Copy&lt;/code&gt; trait on a struct using
the Rust &lt;code&gt;derive&lt;/code&gt; macro, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Copy, Clone)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code tells the Rust compiler to implement &lt;em&gt;both&lt;/em&gt; the &lt;code&gt;Copy&lt;/code&gt;
and &lt;code&gt;Clone&lt;/code&gt; traits on &lt;code&gt;Hat&lt;/code&gt;. We&#39;ll get to the &lt;code&gt;Clone&lt;/code&gt; trait
in a little bit, but the &lt;code&gt;Copy&lt;/code&gt; &lt;em&gt;[Fixed: 2025-04-20]&lt;/em&gt; part is just syntactic sugar
for &lt;code&gt;impl Copy for Hat {}&lt;/code&gt; (Rust has a lot of this kind of syntactic
sugar). Again, &lt;code&gt;Copy&lt;/code&gt; doesn&#39;t have any methods, it just
tells the compiler that it&#39;s OK to make a shallow copy.&lt;/p&gt;
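&lt;p&gt;Here&#39;s a minimal sketch of what the marker buys you, assuming the
&lt;code&gt;Hat&lt;/code&gt; struct from above. The helper name &lt;code&gt;copy_hat()&lt;/code&gt; is
hypothetical and only there for illustration; the point is that assigning a
&lt;code&gt;Copy&lt;/code&gt; type leaves the original usable:&lt;/p&gt;

```rust
// Illustrative sketch: `Hat` mirrors the struct above; `copy_hat` is a made-up helper.
#[derive(Copy, Clone)]
struct Hat {
    size: u8,
}

// Returns the size seen through the original and through the copy.
fn copy_hat() -> (u8, u8) {
    let h1 = Hat { size: 5 };
    let h2 = h1; // `Hat` is `Copy`, so this copies rather than moves
    (h1.size, h2.size) // `h1` is still valid here
}

fn main() {
    let (original, copy) = copy_hat();
    println!("{} {}", original, copy);
}
```

&lt;p&gt;If you drop &lt;code&gt;Copy&lt;/code&gt; from the &lt;code&gt;derive&lt;/code&gt; list, the assignment
becomes a move and the later use of &lt;code&gt;h1&lt;/code&gt; should no longer compile.&lt;/p&gt;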
&lt;p&gt;Of course, not all structs are safe to copy. For instance,
if you have a struct that contains a pointer to some data on the
heap, then copying it would violate the single owner rule. Indicating
which data &lt;em&gt;is&lt;/em&gt; safe to copy is why we need to have the
&lt;code&gt;Copy&lt;/code&gt; trait in the first place. So what happens if we try
to apply the trait to a non-copyable object, as below?&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Inner&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token attribute attr-name&quot;&gt;#[derive(Copy, Clone)]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Outer&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    inner&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Inner&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;rustc&lt;/code&gt; will refuse to compile this, producing the following error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;error[E0204]: the trait `Copy` cannot be implemented for this type
 --&amp;gt; uncopyable.rs:3:10
  |
3 | #[derive(Copy, Clone)]
  |          ^^^^
4 | struct Outer {
5 |     inner: Inner,
  |     ------------ this field does not implement `Copy`
  |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem here, from the compiler&#39;s perspective, is that
we asked Rust to implement &lt;code&gt;Copy&lt;/code&gt; on &lt;code&gt;Outer&lt;/code&gt;, but &lt;code&gt;Outer&lt;/code&gt; includes
&lt;code&gt;Inner&lt;/code&gt;, which &lt;em&gt;doesn&#39;t&lt;/em&gt; implement &lt;code&gt;Copy&lt;/code&gt;. Copying &lt;code&gt;Outer&lt;/code&gt;
requires copying &lt;code&gt;Inner&lt;/code&gt;, so if you can&#39;t copy &lt;code&gt;Inner&lt;/code&gt;
you can&#39;t copy &lt;code&gt;Outer&lt;/code&gt; either. Because Rust knows which structs are
safe to copy and will refuse to let you implement the &lt;code&gt;Copy&lt;/code&gt; trait on
the ones that aren&#39;t, it&#39;s not possible to incorrectly label an unsafe-to-copy
object with &lt;code&gt;Copy&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The second thing to notice is that &lt;code&gt;Inner&lt;/code&gt; is actually perfectly safe
to copy, seeing as it&#39;s empty. We&#39;ve already seen that Rust
knows when it&#39;s safe to implement &lt;code&gt;Copy&lt;/code&gt;, so you might ask why
it doesn&#39;t just automatically let you &lt;code&gt;Copy&lt;/code&gt; whenever it&#39;s
safe to do so (recall that the trait is empty, so it would
be trivial to do so automatically). This is actually a common feature of Rust:
there are any number of situations where you try to
do something that requires trait &lt;code&gt;X&lt;/code&gt; and Rust knows it&#39;s safe
to implement, but forces you to explicitly derive traits
even though it could do so automatically. As a friend said
to me, writing Rust means resigning yourself to writing a lot of boilerplate;
fortunately, the compiler will mostly tell you what boilerplate
you need to add, and if you have a good IDE it will
probably have affordances to let you automatically
add it.&lt;/p&gt;
&lt;h2 id=&quot;clone&quot;&gt;Clone &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#clone&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Earlier, we told the compiler to derive &lt;em&gt;both&lt;/em&gt; &lt;code&gt;Copy&lt;/code&gt; and &lt;code&gt;Clone&lt;/code&gt;. We&#39;ve
already gone through &lt;code&gt;Copy&lt;/code&gt;, which is approximately a shallow copy. By contrast,
&lt;code&gt;Clone&lt;/code&gt; allows for copying objects which can&#39;t be safely shallow
copied. Here&#39;s the
definition of the &lt;code&gt;Clone&lt;/code&gt; trait:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;trait&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Sized&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Required method&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;Self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note how this signature is reminiscent of C++&#39;s copy
assignment operator:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;c%2B%2B-2&quot;&gt;C++ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#c%2B%2B-2&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  T&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; T&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;rust-3&quot;&gt;Rust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#rust-3&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;Self&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unlike C++, where assignment causes the copy assignment operator to be
invoked (and where you have to explicitly invoke move semantics), in
Rust you need to clone an object explicitly, in the obvious way using
the &lt;code&gt;.clone()&lt;/code&gt; method, as in:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like C++, Rust will provide a default implementation (in
this case, if you do &lt;code&gt;#[derive(Clone)]&lt;/code&gt;). As you would expect,
the default implementation recursively clones all of the
members of the struct, so it will work as long as all of
those members also implement &lt;code&gt;Clone&lt;/code&gt; (which of course means
that all of their members need to implement &lt;code&gt;Clone&lt;/code&gt;, etc.).
However, again as with C++, when you implement &lt;code&gt;Clone&lt;/code&gt; you can supply any method
&lt;code&gt;clone()&lt;/code&gt; that you want as long as it returns an instance of
the object (that&#39;s what the &lt;code&gt;-&amp;gt; Self&lt;/code&gt; means).
For example, as we&#39;ll see later, this is how Rust implements
reference counted pointers, with &lt;code&gt;.clone()&lt;/code&gt; incrementing
the reference count. By contrast, you can&#39;t override
the behavior of &lt;code&gt;Copy&lt;/code&gt; because it has no methods; it&#39;s
just a marker.&lt;/p&gt;
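&lt;p&gt;As a rough sketch of the derived behavior, assume a hypothetical struct
&lt;code&gt;Label&lt;/code&gt; that owns a &lt;code&gt;String&lt;/code&gt;. Because the string data lives on
the heap, &lt;code&gt;Label&lt;/code&gt; can&#39;t be &lt;code&gt;Copy&lt;/code&gt;, but it can derive
&lt;code&gt;Clone&lt;/code&gt;. The names here (&lt;code&gt;Label&lt;/code&gt;, &lt;code&gt;clone_label()&lt;/code&gt;) are
made up for illustration:&lt;/p&gt;

```rust
// Illustrative sketch: `Label` and `clone_label` are hypothetical names.
#[derive(Clone)]
struct Label {
    text: String, // owns heap data, so `Label` cannot be `Copy`
}

// Clones a label, modifies the clone, and returns both texts.
fn clone_label() -> (String, String) {
    let l1 = Label { text: String::from("fedora") };
    let mut l2 = l1.clone(); // derived `clone()` clones each member in turn
    l2.text.push_str(" (copy)"); // only the clone changes
    (l1.text, l2.text)
}

fn main() {
    let (original, cloned) = clone_label();
    println!("{} / {}", original, cloned);
}
```

&lt;p&gt;The original and the clone each own their own string, so modifying one
doesn&#39;t affect the other and the single owner rule still holds.&lt;/p&gt;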
&lt;p&gt;As shown &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#the-copy-trait&quot;&gt;above&lt;/a&gt;, if you want to implement
&lt;code&gt;Copy&lt;/code&gt; Rust also requires you to implement &lt;code&gt;Clone&lt;/code&gt;. As far
as I can tell, this isn&#39;t logically necessary, but it&#39;s
obviously the case that if you can safely shallow copy
a struct, you can implement &lt;code&gt;clone()&lt;/code&gt; by just doing a
shallow copy, so it&#39;s somewhat silly to allow people
to implement &lt;code&gt;Copy&lt;/code&gt; but not &lt;code&gt;Clone&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&quot;heap-allocation-and-box&quot;&gt;Heap Allocation and &lt;code&gt;Box&lt;/code&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#heap-allocation-and-box&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far we&#39;ve just looked at ordinary stack variables, but in Rust,
just like in C++, you frequently need to allocate memory on the
heap. As we saw, giving the programmer direct
access to pointers results in all kinds of shenanigans.
Rust addresses this by requiring that all pointers be
boxed and forbidding you from unboxing them (again,
except in special code).&lt;/p&gt;
&lt;p&gt;The basic smart pointer in Rust is called
&lt;a href=&quot;https://doc.rust-lang.org/std/boxed/struct.Box.html&quot;&gt;Box&lt;/a&gt;, which is
the rough equivalent to C++ &lt;code&gt;unique_ptr&lt;/code&gt;. For example, the
following code allocates space containing the integer (&lt;code&gt;10&lt;/code&gt;)
and then prints it out:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;boxed&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; b &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;tmp = {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;b&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike the way we&#39;ve used C++ smart pointers, where we first
called &lt;code&gt;new&lt;/code&gt; and then passed the result to the smart pointer
(&lt;code&gt;shared_ptr&amp;lt;Obj&amp;gt; s(new Obj())&lt;/code&gt;), &lt;code&gt;Box::new()&lt;/code&gt; does the
memory allocation itself, and what you pass it is actually
a created object, which it then moves into the box.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
You can
in fact break these up, as in the following code where we
make a &lt;code&gt;Hat&lt;/code&gt; named &lt;code&gt;h1&lt;/code&gt; and then move it into &lt;code&gt;hbox&lt;/code&gt;. As
usual, &lt;code&gt;h1&lt;/code&gt; will be unusable after we&#39;ve done that, so
we&#39;re still following the single ownership rule.&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;boxed&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; hbox &lt;span 
class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; hbox&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Conventionally, you&#39;d do this in one operation, as in
&lt;code&gt;Box::new(Hat { size: 5 })&lt;/code&gt; but there&#39;s no real difference
between these two pieces of code.&lt;/p&gt;
&lt;p&gt;Like all the Rust pointer types, &lt;code&gt;Box&lt;/code&gt; is actually a generic, so it
can point to a value of any type. In the first example above, the value is a
32-bit signed integer (&lt;code&gt;i32&lt;/code&gt;), so &lt;code&gt;b&lt;/code&gt; is actually of type &lt;code&gt;Box&amp;lt;i32&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Box&lt;/code&gt; behaves like any other Rust struct, so you can pass it
around, assign a &lt;code&gt;Box&lt;/code&gt; to other variables, etc. You can even
make a &lt;code&gt;Box&lt;/code&gt; of a &lt;code&gt;Box&lt;/code&gt; if you want to.&lt;/p&gt;
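&lt;p&gt;Here&#39;s a small sketch of both points. The function name
&lt;code&gt;box_demo()&lt;/code&gt; and the values are just for illustration. Assigning a
&lt;code&gt;Box&lt;/code&gt; moves it like any other non-&lt;code&gt;Copy&lt;/code&gt; value, and a nested
box needs one dereference per layer:&lt;/p&gt;

```rust
// Illustrative sketch: `box_demo` is a hypothetical helper.
fn box_demo() -> (i32, i32) {
    let b = Box::new(10);
    let moved = b; // ownership of the heap allocation moves; `b` is no longer usable

    let nested = Box::new(Box::new(20)); // a box holding another box
    (*moved, **nested) // one `*` per layer of boxing
}

fn main() {
    let (plain, inner) = box_demo();
    println!("{} {}", plain, inner);
}
```

&lt;p&gt;Both boxes are freed automatically when they go out of scope at the end of
&lt;code&gt;box_demo()&lt;/code&gt;; there is no explicit deallocation call.&lt;/p&gt;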
&lt;h2 id=&quot;mutability-and-immutability&quot;&gt;Mutability and Immutability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#mutability-and-immutability&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;By default, variables in Rust are &lt;em&gt;immutable&lt;/em&gt;, which is to say
that once you have assigned their values you can&#39;t change them. For instance, the
following code will not compile:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The error looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   |
9  |     let x = 10;
   |         - first assignment to `x`
10 |     x = 20;
   |     ^^^^^^ cannot assign twice to immutable variable
   |
help: consider making this binding mutable
   |
9  |     let mut x = 10;
   |         +++
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Characteristically, the compilation error tells us exactly what
we need to do, which is to make &lt;code&gt;x&lt;/code&gt; mutable using the &lt;code&gt;mut&lt;/code&gt; keyword:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is another case where Rust is the opposite of C/C++, in which
variables are mutable by default but can be labeled immutable
with the &lt;code&gt;const&lt;/code&gt; keyword:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;uint8_t&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can get along OK programming with just mutable variables—or,
for that matter, with just immutable variables&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;—but
it&#39;s a lot
more convenient to have both because it forces you to be intentional
about which variables will change and which will not.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
Generally good practice is to have as many variables be immutable as possible
and then make them mutable only when necessary. The Rust compiler
will stop you from modifying immutable variables and complain (though
not generate a hard error) if you make a variable mutable
unnecessarily.&lt;/p&gt;
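&lt;p&gt;As a minimal sketch of that practice (the function name
&lt;code&gt;total_sizes()&lt;/code&gt; and the values are invented for illustration), only the
accumulator needs to be &lt;code&gt;mut&lt;/code&gt;; the input never changes, so it stays
immutable:&lt;/p&gt;

```rust
// Illustrative sketch: `total_sizes` is a hypothetical helper.
fn total_sizes() -> u32 {
    let sizes = [5, 6, 7]; // never reassigned, so it stays immutable
    let mut total = 0; // updated in the loop, so it must be `mut`
    for s in sizes {
        total += s;
    }
    total
}

fn main() {
    println!("{}", total_sizes());
}
```

&lt;p&gt;Marking &lt;code&gt;sizes&lt;/code&gt; as &lt;code&gt;mut&lt;/code&gt; here would still compile, but the
compiler should warn that the variable doesn&#39;t need to be mutable.&lt;/p&gt;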
&lt;h2 id=&quot;references-and-borrowing&quot;&gt;References and Borrowing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#references-and-borrowing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the following trivial piece of Rust code:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span 
class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This won&#39;t compile because we moved &lt;code&gt;h1&lt;/code&gt; into &lt;code&gt;print_hat_size()&lt;/code&gt;
the first time we called it and so we can&#39;t pass it into &lt;code&gt;print_hat_size()&lt;/code&gt;
again because it&#39;s now been invalidated. This is obviously really unhelpful:
we know that once &lt;code&gt;print_hat_size()&lt;/code&gt; has returned it&#39;s not doing anything
with the &lt;code&gt;Hat&lt;/code&gt;, so it&#39;s available to use again, but the compiler
won&#39;t let us.&lt;/p&gt;
&lt;p&gt;One option here would be to have &lt;code&gt;print_hat_size()&lt;/code&gt; pass &lt;code&gt;Hat&lt;/code&gt; back
by returning it and then we could call it again, like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    h&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_hat_size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will obviously work, but it&#39;s really clunky, and what if we wanted
&lt;code&gt;print_hat_size()&lt;/code&gt; to return something else? Then we&#39;d need to deal
with the actual return value. Instead, what we want to do is let
&lt;code&gt;print_hat_size()&lt;/code&gt; temporarily use its argument without actually
taking ownership. In Rust this is called &lt;em&gt;borrowing&lt;/em&gt; and the
resulting borrowed item is called a &lt;em&gt;reference&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;We&#39;ve already seen this kind of operation in C and C++ where we
passed a pointer or a reference (C++) to an object to a function, like so:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;passing-a-pointer&quot;&gt;Passing a Pointer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#passing-a-pointer&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Hat&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; hat&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h5 id=&quot;passing-a-reference&quot;&gt;Passing a Reference &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#passing-a-reference&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Hat&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; hat&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Borrowing in Rust is conceptually more like passing a reference in C++:
as with a C++ reference, the callee has access to the object in the calling function,
but what it receives is not a pointer, so you can&#39;t do pointer-like things like
&lt;code&gt;free()&lt;/code&gt;. This shouldn&#39;t be surprising because Rust doesn&#39;t let us access raw pointers
at all. And just as with C++ references, you use &lt;code&gt;.&lt;/code&gt; notation to reference inner
values of the struct rather than &lt;code&gt;-&amp;gt;&lt;/code&gt; as you would with a pointer.&lt;/p&gt;
&lt;h3 id=&quot;mutable-and-immutable-references&quot;&gt;Mutable and Immutable References &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#mutable-and-immutable-references&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Just as Rust supports both mutable and immutable objects, it also
supports mutable and immutable references, which have the semantics
you would expect:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;addone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;input &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;print_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Value {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br 
/&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;addone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;print_value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which produces the following output when run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Value 10
Value 11

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The basic way to take a reference to &lt;code&gt;i&lt;/code&gt; is to do &lt;code&gt;&amp;amp;i&lt;/code&gt;, which is what
we do with &lt;code&gt;print_value()&lt;/code&gt;, which does not need to modify its
input. By contrast, &lt;code&gt;addone()&lt;/code&gt; does need to modify its input, so it
needs to take a &lt;code&gt;&amp;amp;mut&lt;/code&gt; reference. Note that we need &lt;code&gt;mut&lt;/code&gt; in three
places:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Labeling &lt;code&gt;i&lt;/code&gt; as mutable&lt;/li&gt;
&lt;li&gt;In the signature for &lt;code&gt;addone()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;At the call site to &lt;code&gt;addone()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that &lt;code&gt;addone()&lt;/code&gt; is modifying &lt;code&gt;i&lt;/code&gt; in place, which is why when we call
&lt;code&gt;print_value()&lt;/code&gt; in &lt;code&gt;main()&lt;/code&gt; we see it modified. This is just the
same call by reference semantics we&#39;ve seen before.&lt;/p&gt;
&lt;p&gt;References are a critical tool, but the way I&#39;ve just described them creates
an opportunity for people to really screw things up. Consider the
following C++ code (I&#39;m using C++ here for a reason which
will be apparent soon):&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;#include &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;br /&gt;#include &lt;span class=&quot;token string&quot;&gt;&quot;./print-array.h&quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;void &lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;size_t&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size_t &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;sum&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  numbers&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;auto i &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    sum &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span 
class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;int &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;int argc&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;argv&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;size_t&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; numbers &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  size_t sum &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; sum&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token 
namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Numbers = &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; numbers &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot; Sum=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; sum &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just to orient yourself, what this code does is allocate
a vector of length 3 containing the values &lt;code&gt;1, 2, 3&lt;/code&gt; and
then pass it to a function &lt;code&gt;do_something()&lt;/code&gt; along with
a reference to an integer of type &lt;code&gt;size_t&lt;/code&gt; that will
hold the sum of the values. &lt;code&gt;do_something()&lt;/code&gt; then does
the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Adds the value &lt;code&gt;5&lt;/code&gt; to the end of the vector.&lt;/li&gt;
&lt;li&gt;Adds each of the values in the vector to &lt;code&gt;sum&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And then finally &lt;code&gt;main&lt;/code&gt; prints out the list of numbers
and the sum:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Numbers = [1, 2, 3, 5] Sum=11

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code is fine (though kind of pointless), but now let&#39;s
consider a very slight modification of this code where
instead of passing a reference to a separate local variable
in the &lt;code&gt;sum&lt;/code&gt; argument, we instead pass a reference to the
first element in the &lt;code&gt;numbers&lt;/code&gt; vector:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token macro property&quot;&gt;&lt;span class=&quot;token directive-hash&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;token directive keyword&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&amp;lt;vector&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token macro property&quot;&gt;&lt;span class=&quot;token directive-hash&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;token directive keyword&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;./print-array.h&quot;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;size_t&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size_t &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;sum&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  numbers&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
keyword&quot;&gt;auto&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    sum &lt;span class=&quot;token operator&quot;&gt;+=&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; argc&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;argv&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;size_t&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; numbers &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span 
class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Changed code&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Numbers = &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; numbers &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot; Sum=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;[Fixed change marker: 2025-04-20]&lt;/em&gt;&lt;/p&gt;
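&lt;p&gt;Note that the only substantive change is the line labeled &lt;code&gt;Changed code&lt;/code&gt;;
&lt;code&gt;do_something()&lt;/code&gt; itself is exactly the same. Naively, the expected outcome of this program would be the following,
because the first element starts at &lt;code&gt;1&lt;/code&gt; and then has every element (including its own initial value) added to it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Numbers = [12, 2, 3, 5] Sum=12
&lt;/code&gt;&lt;/pre&gt;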
&lt;p&gt;This is certainly one possible outcome, but it&#39;s not the only one,
and the others are bad. To see why, we need to take a closer look at
what&#39;s actually going on in memory. The figure below shows the situation
at the start of &lt;code&gt;do_something()&lt;/code&gt;:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/double-borrow-1.png&quot; alt=&quot;Memory layout at the start of do_something()&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Memory layout at the start of &lt;code&gt;do_something()&lt;/code&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;On the left, we see the two arguments to &lt;code&gt;do_something()&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;numbers&lt;/code&gt; which is a reference to the vector of numbers (itself
a local variable in &lt;code&gt;main&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sum&lt;/code&gt; which is a reference to the first number in the vector
(currently the value &lt;code&gt;1&lt;/code&gt;), though &lt;code&gt;do_something()&lt;/code&gt; doesn&#39;t
know that; it&#39;s just the same code as before.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recall from &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#raii&quot;&gt;part II&lt;/a&gt;
that a minimal container structure like a vector will look something
like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;typename&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vector&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   T&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; data_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;       &lt;span class=&quot;token comment&quot;&gt;// Pointer to the elements of the vector&lt;/span&gt;&lt;br /&gt;   size_t len_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;    &lt;span class=&quot;token comment&quot;&gt;// The number of elements currently in the vector&lt;/span&gt;&lt;br /&gt;   size_t size_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;   &lt;span class=&quot;token comment&quot;&gt;// The total size of the data_ region&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;data_&lt;/code&gt; contains the address of the memory region allocated for
the elements of the vector, &lt;code&gt;len_&lt;/code&gt; contains the number of
elements in the vector, and &lt;code&gt;size_&lt;/code&gt; contains the total size of
the &lt;code&gt;data_&lt;/code&gt; region.&lt;/p&gt;
&lt;p&gt;The reason we need separate &lt;code&gt;size_&lt;/code&gt; and &lt;code&gt;len_&lt;/code&gt; fields is to make it
cheap to grow and shrink the vector.  Typically with a container
structure like this, you wouldn&#39;t just allocate enough space for the
initial number of elements requested, but instead allocate more space
(maybe twice as much).  When you want to add another element, you just
add it to the end of the region and increment &lt;code&gt;len_&lt;/code&gt; &lt;em&gt;[Fixed: 2025-04-20]&lt;/em&gt; but you don&#39;t
need to allocate more memory until you&#39;ve exhausted the initial
allocation. Similarly, if someone wants to remove the last element
in the buffer, you can just decrement &lt;code&gt;len_&lt;/code&gt;, leaving one more slot
for a future insertion. The idea here is to avoid unnecessary
allocation and copying of the memory region.&lt;/p&gt;
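&lt;p&gt;Rust&#39;s &lt;code&gt;Vec&lt;/code&gt; exposes the same two quantities as &lt;code&gt;len()&lt;/code&gt; and &lt;code&gt;capacity()&lt;/code&gt;, so the overallocation idea can be observed directly. The following is a minimal sketch rather than anything from the original example: the helper name &lt;code&gt;fill_within_capacity()&lt;/code&gt; is made up for illustration, and the capacity of four is an arbitrary choice.&lt;/p&gt;

```rust
// Hypothetical helper (not from the post): pushes three values into a vector
// created with room for four and reports
// (length, capacity before the pushes, capacity after the pushes).
fn fill_within_capacity() -> (usize, usize, usize) {
    // with_capacity(4) reserves space for at least four elements up front.
    let mut numbers = Vec::with_capacity(4);
    let capacity_before = numbers.capacity();

    // Each push writes into a free slot and bumps the length;
    // no new allocation is needed while there is spare capacity.
    numbers.push(1usize);
    numbers.push(2);
    numbers.push(3);

    (numbers.len(), capacity_before, numbers.capacity())
}

fn main() {
    let (len, before, after) = fill_within_capacity();
    println!("len={} capacity before={} capacity after={}", len, before, after);
}
```

&lt;p&gt;Because all three pushes fit in the space reserved up front, the reported capacity should be the same before and after, and only the length changes.&lt;/p&gt;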
&lt;p&gt;Of course, no matter how much you overallocate you&#39;ll eventually
reach the end of the pre-allocated buffer, at which point you&#39;ll
need to do a new allocation.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
That&#39;s the situation here because &lt;code&gt;len_&lt;/code&gt; and &lt;code&gt;size_&lt;/code&gt; are the
same, so the next insertion will require reallocating,
with the result shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/double-borrow-2.png&quot; alt=&quot;Memory layout after inserting 5&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Memory layout after inserting &lt;code&gt;5&lt;/code&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As you can see, we&#39;ve allocated a new region to hold the
expanded vector and inserted the new element &lt;code&gt;5&lt;/code&gt; at the
end. This is all totally fine as far as &lt;code&gt;numbers&lt;/code&gt; is
concerned; the problem is with &lt;code&gt;sum&lt;/code&gt;, which is now pointing
to the old (&lt;code&gt;free()ed&lt;/code&gt;) region of memory which used to
contain the contents of the vector but could now contain
anything at all or even be in use for something (e.g., for
allocator bookkeeping). When we now go and attempt to write
through &lt;code&gt;sum&lt;/code&gt;, this is a classic use-after-free issue and
can have pretty much any result (in C/C++ this is undefined
behavior), and could easily lead to a crash or a vulnerability.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
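&lt;p&gt;The same growth step can be watched from safe Rust by recording the buffer address as a plain number before and after a push that exceeds the capacity. This is only a sketch under assumptions: &lt;code&gt;push_past_capacity()&lt;/code&gt; is a hypothetical helper, and whether the buffer actually moves is up to the allocator, so the code reports it rather than assuming it.&lt;/p&gt;

```rust
// Hypothetical helper (not from the post): fills a vector to capacity,
// pushes one more value, and reports
// (capacity before, capacity after, whether the buffer address changed).
fn push_past_capacity() -> (usize, usize, bool) {
    let mut numbers = vec![1usize, 2, 3];

    // Use up any spare slots so that the next push has to grow the buffer.
    while numbers.len() != numbers.capacity() {
        numbers.push(0);
    }
    let old_capacity = numbers.capacity();
    // Keep the address only as a number; it is never dereferenced.
    let old_address = numbers.as_ptr() as usize;

    // The vector is full, so this push needs a larger buffer.
    numbers.push(5);

    let new_address = numbers.as_ptr() as usize;
    let moved = new_address != old_address;
    (old_capacity, numbers.capacity(), moved)
}

fn main() {
    let (before, after, moved) = push_past_capacity();
    println!("capacity {} -> {}, buffer moved: {}", before, after, moved);
}
```

&lt;p&gt;If the buffer did move, anything that still pointed into the old region (like &lt;code&gt;sum&lt;/code&gt; in the C++ example) would now be dangling.&lt;/p&gt;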
&lt;p&gt;I want to emphasize that this is totally
legal C++ code and the compiler will be perfectly happy to compile
it: the only thing that causes the problem is that we know
that &lt;code&gt;.push_back()&lt;/code&gt; might cause a reallocation, but you could
easily have a fixed-size container which refused to reallocate,
in which case this code would be safe; the fact that it&#39;s
not depends on information which the compiler doesn&#39;t have.
This code is not, however, legal Rust.&lt;/p&gt;
&lt;h3 id=&quot;the-rules-of-borrowing&quot;&gt;The Rules of Borrowing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#the-rules-of-borrowing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The way that Rust avoids the kind of issues we just saw is by
restricting borrowing. Specifically, at any given time an object can
have &lt;a href=&quot;https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html#the-rules-of-references&quot;&gt;either&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One mutable reference&lt;/li&gt;
&lt;li&gt;An arbitrary number of immutable references&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The compiler enforces these rules for you via the &amp;quot;borrow checker&amp;quot;
and will throw an error if you try to violate them.&lt;/p&gt;
&lt;p&gt;The C++ code above violates these rules because we are taking
two mutable references to &lt;code&gt;numbers&lt;/code&gt;. This might not be immediately
obvious because the second reference is actually to one of the
elements of &lt;code&gt;numbers&lt;/code&gt;, but because the reference to &lt;code&gt;numbers&lt;/code&gt; also
transitively covers every element in &lt;code&gt;numbers&lt;/code&gt;, the
effect is the same.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
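&lt;p&gt;As a rough illustration of the two rules, here is a small sketch that uses only standard &lt;code&gt;Vec&lt;/code&gt; methods: &lt;code&gt;first()&lt;/code&gt; and &lt;code&gt;last()&lt;/code&gt; hand out immutable references, while &lt;code&gt;first_mut()&lt;/code&gt; hands out a mutable one. The helper name &lt;code&gt;borrow_demo()&lt;/code&gt; is hypothetical and the values are arbitrary.&lt;/p&gt;

```rust
// Hypothetical helper (not from the post) that exercises both borrowing rules
// and reports (first + last, first element after mutation, length).
fn borrow_demo() -> (usize, usize, usize) {
    let mut numbers = vec![1usize, 2, 3];

    // Rule 2: any number of immutable references may be alive at once.
    let first = numbers.first();
    let last = numbers.last();
    let ends_total = first.copied().unwrap_or(0) + last.copied().unwrap_or(0);
    // first and last are not used after this point, so their borrows end here.

    // Rule 1: a single mutable reference, with no other reference alive.
    if let Some(slot) = numbers.first_mut() {
        *slot += 10;
    }

    (ends_total, numbers[0], numbers.len())
}

fn main() {
    let (ends_total, first_after, len) = borrow_demo();
    println!("ends_total={} first_after={} len={}", ends_total, first_after, len);
}
```

&lt;p&gt;Moving a mutating call such as &lt;code&gt;numbers.push(5)&lt;/code&gt; between the creation of &lt;code&gt;first&lt;/code&gt; and its last use should be rejected by the borrow checker, which is the same conflict as in the C++ program.&lt;/p&gt;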
&lt;p&gt;Now let&#39;s try to write the corresponding Rust code,
which looks like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; sum&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    numbers&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; i &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; numbers &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;sum &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;sum 
&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; numbers &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token macro property&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; 
numbers&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Changed code&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Numbers={:?}, Sum={}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; numbers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;numbers&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;[Updated 2025-03-31 with the right code. Thanks to Dave Cridland.]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When we try to compile this code, we get the following error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;error[E0499]: cannot borrow `numbers` as mutable more than once at a time
 --&amp;gt; double-borrow-bad.rs:8:37
  |
8 |     do_something(&amp;amp;mut numbers, &amp;amp;mut numbers[0]);
  |     ------------ ------------       ^^^^^^^ second mutable borrow occurs here
  |     |            |
  |     |            first mutable borrow occurs here
  |     first borrow later used by call
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It should be obvious, but you can&#39;t fix this just by changing
these to immutable references: the compiler knows that &lt;code&gt;do_something()&lt;/code&gt;
takes mutable references and so will refuse to compile that code
too:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;error[E0308]: arguments to this function are incorrect
  --&amp;gt; double-borrow-bad.rs:11:5
   |
11 |     do_something(&amp;amp;numbers, &amp;amp;numbers[0]); // Changed code
   |     ^^^^^^^^^^^^           ----------- types differ in mutability
   |
note: types differ in mutability
  --&amp;gt; double-borrow-bad.rs:11:18
   |
11 |     do_something(&amp;amp;numbers, &amp;amp;numbers[0]); // Changed code
   |                  ^^^^^^^^
   = note: expected mutable reference `&amp;amp;mut Vec&amp;lt;usize&amp;gt;`
                      found reference `&amp;amp;Vec&amp;lt;{integer}&amp;gt;`
   = note: expected mutable reference `&amp;amp;mut usize`
                      found reference `&amp;amp;{integer}`
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly, if we were to change the signature of &lt;code&gt;do_something()&lt;/code&gt;
to take immutable references, then &lt;code&gt;do_something()&lt;/code&gt; itself wouldn&#39;t compile:
the compiler knows that &lt;code&gt;.push()&lt;/code&gt; &lt;em&gt;[Fixed: 2025-04-20]&lt;/em&gt; modifies the vector and that
assignment to &lt;code&gt;*sum&lt;/code&gt; changes the object being referred to (whatever it
is).&lt;/p&gt;
&lt;h3 id=&quot;global-safety-via-local-reasoning&quot;&gt;Global Safety via Local Reasoning &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#global-safety-via-local-reasoning&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I want to step back for a moment to pull out the larger point that
this example shows, which is that the way Rust delivers global safety
is by enforcing local properties. Recall from above that the original
C++ &lt;em&gt;[Fixed: 2025-04-20]&lt;/em&gt; code was unsafe as the result of two different properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When we called &lt;code&gt;do_something()&lt;/code&gt; we took a pointer to an inner
value of &lt;code&gt;numbers&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.push_back()&lt;/code&gt; potentially causes a reallocation, invalidating any
pointer to an inner value of &lt;code&gt;numbers&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two properties are very distant in the code and so you
need global analysis to determine that it&#39;s unsafe. This analysis
is difficult and may not even be possible (in
fact the source code of &lt;code&gt;.push_back()&lt;/code&gt; may not be available to the
compiler at the time it is compiling &lt;code&gt;do_something()&lt;/code&gt;), so the
C++ compiler cannot detect the error at compile time, leaving
you with a run time problem. Rust&#39;s conservative borrowing
rules prevent this by enforcing the following properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;do_something()&lt;/code&gt; modifies &lt;code&gt;numbers&lt;/code&gt; and &lt;code&gt;*sum&lt;/code&gt; and so these references
need to be &lt;code&gt;mut&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;do_something()&lt;/code&gt; takes &lt;code&gt;mut&lt;/code&gt; arguments and so when you call it in
&lt;code&gt;main()&lt;/code&gt; you need to take &lt;code&gt;mut&lt;/code&gt; references.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;amp;mut numbers&lt;/code&gt; and &lt;code&gt;&amp;amp;mut numbers[0]&lt;/code&gt; both borrow &lt;code&gt;numbers&lt;/code&gt; and so
the compiler forbids the double borrow.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The important thing to realize is that each of these properties
is enforced purely locally; when the compiler forbids the double
borrow in &lt;code&gt;main()&lt;/code&gt; it doesn&#39;t need to know anything
about the behavior of &lt;code&gt;do_something()&lt;/code&gt;, just that it needs to
take two &lt;code&gt;mut&lt;/code&gt; references. However, the result of applying the
rules locally is to provide global safety.&lt;/p&gt;
&lt;p&gt;Unfortunately, this safety doesn&#39;t come for free because these
rules are conservative and therefore forbid code which would
actually be safe but which the compiler cannot locally verify
is safe. For example, suppose that as hypothesized above
&lt;code&gt;.push()&lt;/code&gt; never allocated new memory but just allocated out
of a fixed-size buffer and returned an error when you tried to
exceed the size of that buffer. In that case, what we&#39;re
doing here would be fine—though odd—but Rust
still wouldn&#39;t allow it, as we&#39;d still need to take a &lt;code&gt;mut&lt;/code&gt;
reference to &lt;code&gt;numbers[0]&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
Much of the experience of programming in Rust is about
trying to structure your code in such a way that the compiler
can determine that what you are trying to do is safe—or,
as is often the case, determining that the reason it can&#39;t
determine it&#39;s safe is that it actually isn&#39;t.&lt;/p&gt;
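&lt;p&gt;As a small illustration of that kind of restructuring, here is a sketch that sidesteps the double borrow by copying the element out of the vector, working on the local copy, and writing it back at the end. It is not the &lt;code&gt;do_something()&lt;/code&gt; from above: the function name &lt;code&gt;sum_into_first()&lt;/code&gt; and the exact arithmetic are assumptions made for the example.&lt;/p&gt;

```rust
// Illustrative only: the name and the arithmetic are assumptions,
// not the do_something() from the post.
fn sum_into_first() -> (usize, usize) {
    let mut numbers = vec![1usize, 2, 3];

    // Copy the element out; `sum` is an independent value, not a borrow.
    let mut sum = numbers[0];

    // The vector is free to reallocate: nothing points into it.
    numbers.push(4);
    for i in numbers.iter() {
        sum += *i;
    }

    // Write the result back once the iteration borrow has ended.
    numbers[0] = sum;
    (numbers[0], numbers.len())
}

fn main() {
    let (first, len) = sum_into_first();
    println!("first={} len={}", first, len);
}
```

&lt;p&gt;Because &lt;code&gt;sum&lt;/code&gt; is a plain &lt;code&gt;usize&lt;/code&gt; rather than a reference into &lt;code&gt;numbers&lt;/code&gt;, the compiler can see locally that the &lt;code&gt;push()&lt;/code&gt; cannot invalidate it.&lt;/p&gt;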
&lt;p&gt;In the rest of this post and the next, I want to talk about some of the
gymnastics you have to do when writing Rust code in order
to satisfy the borrow checker.&lt;/p&gt;
&lt;h2 id=&quot;shared-ownership&quot;&gt;Shared Ownership &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#shared-ownership&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As mentioned in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3#other-smart-pointers&quot;&gt;part
III&lt;/a&gt;, while single
ownership is easier to reason about, there are situations where you
really need to have some kind of shared ownership. C++ provides the
&lt;code&gt;shared_ptr&lt;/code&gt; class for this, and Rust has a similar affordance called
&lt;a href=&quot;https://doc.rust-lang.org/std/rc/struct.Rc.html&quot;&gt;&lt;code&gt;Rc&lt;/code&gt;&lt;/a&gt; (for
&amp;quot;reference counted&amp;quot;), as well as a weak pointer type called
&lt;a href=&quot;https://doc.rust-lang.org/std/rc/struct.Weak.html&quot;&gt;&lt;code&gt;Weak&lt;/code&gt;&lt;/a&gt;.
These are implemented (mostly) the same as C++ smart pointers
but with some important differences.&lt;/p&gt;
&lt;p&gt;Just to orient you, here is the &lt;code&gt;Rc&lt;/code&gt;-ized version of the
code we started with:&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#code&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;rc&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Rc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Reference count 1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#output&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;5 5

&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As you can see this works perfectly fine.&lt;/p&gt;
&lt;p&gt;Note that unlike with C++, you can&#39;t just assign one &lt;code&gt;Rc&lt;/code&gt; to another
but instead you call &lt;code&gt;Rc::clone&lt;/code&gt;, which invokes the &amp;quot;associated
function&amp;quot; &lt;code&gt;clone&lt;/code&gt; of the struct &lt;code&gt;Rc&lt;/code&gt;, which increments the reference
count and returns a new copy of the &lt;code&gt;Rc&lt;/code&gt; object which can then be
assigned to &lt;code&gt;h2&lt;/code&gt;. If you were to instead just assign &lt;code&gt;h1&lt;/code&gt; to &lt;code&gt;h2&lt;/code&gt;
it would move it and you would get the same type of use-after-move
compilation error we got in our original code.&lt;/p&gt;
&lt;p&gt;The question you should immediately be asking here is why &lt;code&gt;Rc&lt;/code&gt; doesn&#39;t
violate Rust&#39;s single ownership rule, given that we now have both
&lt;code&gt;h1&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt; pointing to the same &lt;code&gt;Hat&lt;/code&gt; instance. The reason
is that &lt;code&gt;Rc&lt;/code&gt; &lt;em&gt;mediates&lt;/em&gt; access to the owned object in order
to ensure safety. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Ordinary access to the object through &lt;code&gt;Rc&lt;/code&gt; only allows immutable operations&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
so that if you try to modify it (e.g., &lt;code&gt;h2.size = 1&lt;/code&gt;) it will
fail to compile.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It is possible to get a mutable reference to the owned object
via the &lt;code&gt;Rc::get_mut()&lt;/code&gt; function, but this will only succeed
if the reference count is equal to one. You also can&#39;t get
a reference of either type to the owned object once you have called
&lt;code&gt;Rc::get_mut()&lt;/code&gt; because
the compiler detects that this would be a double
borrow (the rules that make this work are a bit advanced to go into
right now).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result of these two rules is that you can have as many
immutable references as you want but that you can&#39;t combine
a mutable reference with either another mutable reference or
another immutable reference, just like with the ordinary
borrowing rules; unlike with the borrowing rules, &lt;code&gt;Rc&lt;/code&gt;
enforces its rules at runtime, so attempting to call &lt;code&gt;Rc::get_mut()&lt;/code&gt;
at the wrong time will fail by returning &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&quot;interior-mutability&quot;&gt;Interior Mutability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#interior-mutability&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I said we weren&#39;t going to implement &lt;code&gt;Rc&lt;/code&gt; but ask yourself this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How does &lt;code&gt;Rc&lt;/code&gt; maintain the reference count?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Presumably there is some internal reference count value in &lt;code&gt;Rc&lt;/code&gt; just like there
is in C++ &lt;code&gt;shared_ptr&lt;/code&gt;, but that&#39;s only half the story because we
can call &lt;code&gt;Rc::clone()&lt;/code&gt; with an &lt;em&gt;immutable reference&lt;/em&gt; to the &lt;code&gt;Rc&lt;/code&gt; object,
like so:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;clone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;h1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;Rc::clone()&lt;/code&gt; obviously has to increment the reference count,
but the whole point of an immutable reference is that you can&#39;t change
the referenced object, so what&#39;s going on here?  The answer is that
Rust has a set of special smart pointers that allow you to mutate an
object—under controlled conditions—even when you only have
an immutable reference:
&lt;a href=&quot;https://doc.rust-lang.org/std/cell/struct.Cell.html&quot;&gt;&lt;code&gt;Cell&lt;/code&gt;&lt;/a&gt; and
&lt;a href=&quot;https://doc.rust-lang.org/std/cell/struct.RefCell.html&quot;&gt;&lt;code&gt;RefCell&lt;/code&gt;&lt;/a&gt;. &lt;code&gt;Rc&lt;/code&gt;
actually uses &lt;code&gt;Cell&lt;/code&gt;, but in this post I&#39;ll be talking about
&lt;code&gt;RefCell&lt;/code&gt;, which is the one I have used more often.&lt;/p&gt;
&lt;p&gt;Like &lt;code&gt;Rc&lt;/code&gt;, we make a &lt;code&gt;RefCell&lt;/code&gt; with &lt;code&gt;RefCell::new(T)&lt;/code&gt; where &lt;code&gt;T&lt;/code&gt;
is an instance of the relevant type. Unlike &lt;code&gt;Rc&lt;/code&gt;, you can&#39;t use
the &lt;code&gt;RefCell&lt;/code&gt; directly but have to explicitly call &lt;code&gt;.borrow()&lt;/code&gt;
to get an immutable reference and &lt;code&gt;.borrow_mut()&lt;/code&gt; to get a
mutable reference, as shown in the code below.&lt;/p&gt;
&lt;div class=&quot;side-by-side-blocks&quot;&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;code-2&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#code-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;cell&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;RefCell&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;RefCell&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span 
class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; borrowed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow_mut&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    borrowed&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size &lt;span class=&quot;token 
operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; borrowed&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&quot;side-by-side-block&quot;&gt;
&lt;h4 id=&quot;output-2&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#output-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;5 5
10

&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RefCell&lt;/code&gt; enforces the usual rules about references, namely that
you can have as many immutable references outstanding as you want
as long as there aren&#39;t any mutable references, and if there
is a mutable reference then there can&#39;t be any other references.
If you try to violate these rules, Rust will call &lt;code&gt;panic!()&lt;/code&gt;,
which terminates the program (unless caught).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
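&lt;p&gt;If you want to watch this check happen without crashing, &lt;code&gt;RefCell&lt;/code&gt; also has &lt;code&gt;try_borrow()&lt;/code&gt; and &lt;code&gt;try_borrow_mut()&lt;/code&gt;, which return a &lt;code&gt;Result&lt;/code&gt; instead of panicking. The following is a minimal sketch; the helper &lt;code&gt;probe()&lt;/code&gt; is a made-up name for illustration.&lt;/p&gt;

```rust
use std::cell::RefCell;

struct Hat {
    size: u8,
}

// Hypothetical helper: reports whether a mutable borrow was possible
// while readers were alive, and again after they were dropped.
fn probe() -> (bool, bool) {
    let h1 = RefCell::new(Hat { size: 5 });

    // Two immutable borrows may be outstanding at once.
    let r1 = h1.borrow();
    let r2 = h1.borrow();
    assert_eq!(r1.size + r2.size, 10);

    // A mutable borrow is refused while they are alive.
    let while_shared = h1.try_borrow_mut().is_ok();

    drop(r1);
    drop(r2);

    // With no readers left, the mutable borrow succeeds.
    let after_drop = h1.try_borrow_mut().is_ok();
    (while_shared, after_drop)
}

fn main() {
    println!("{:?}", probe());
}
```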
&lt;p&gt;This tells us how to implement the reference count in &lt;code&gt;Rc&lt;/code&gt;: put the
reference count object in a &lt;code&gt;RefCell&lt;/code&gt; (as I said, it&#39;s actually a &lt;code&gt;Cell&lt;/code&gt;,
which has a different syntax but can be used to the same effect).
Then we can hold an immutable reference to &lt;code&gt;Rc&amp;lt;Hat&amp;gt;&lt;/code&gt; but
still call &lt;code&gt;Rc::clone()&lt;/code&gt;. It&#39;s &lt;code&gt;Rc&lt;/code&gt;&#39;s job to ensure that
it follows the borrowing rules—or risk a program crash—but
note that even if you mess up with &lt;code&gt;RefCell&lt;/code&gt; you still can&#39;t cause
a memory error or another unsafe condition; all that will happen is
that the program crashes.&lt;/p&gt;
&lt;p&gt;It&#39;s quite common to combine &lt;code&gt;Rc&lt;/code&gt; with &lt;code&gt;RefCell&lt;/code&gt; to get functionality
that is sort of like C++&#39;s &lt;code&gt;shared_ptr&lt;/code&gt;: once you have two &lt;code&gt;Rc&lt;/code&gt;s pointing
to the same object you can&#39;t use either one to mutate it, but there
are a lot of settings in which you want to have shared ownership of
an object but allow it to be mutated in one place or the other. In
C++ you can just do this, but in Rust you have to do something like
&lt;code&gt;Rc&amp;lt;RefCell&amp;lt;Hat&amp;gt;&amp;gt;&lt;/code&gt;, with &lt;code&gt;Rc&lt;/code&gt; providing the reference counted pointer
to the immutable &lt;code&gt;RefCell&lt;/code&gt; object which itself lets you mutate the
internal &lt;code&gt;Hat&lt;/code&gt; object.&lt;/p&gt;
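&lt;p&gt;A minimal sketch of that combination might look like the following. The helper &lt;code&gt;resize_shared_hat()&lt;/code&gt; is a hypothetical name, and the sketch uses the method form &lt;code&gt;h1.clone()&lt;/code&gt;, which is equivalent to &lt;code&gt;Rc::clone(&amp;amp;h1)&lt;/code&gt;.&lt;/p&gt;

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Hat {
    size: u8,
}

// Hypothetical helper: mutate through one handle, read through the other.
fn resize_shared_hat() -> u8 {
    let h1 = Rc::new(RefCell::new(Hat { size: 5 }));

    // Bumps the reference count; the Hat itself is not copied.
    let h2 = h1.clone();

    // Neither Rc is declared `mut`; the RefCell hands out the mutable
    // borrow, which is released at the end of this statement.
    h2.borrow_mut().size = 10;

    // Bind the value first so the temporary borrow ends before h1 is dropped.
    let size = h1.borrow().size;
    size
}

fn main() {
    println!("{}", resize_shared_hat());
}
```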
&lt;h3 id=&quot;enforcing-the-reference-count&quot;&gt;Enforcing the Reference Count &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#enforcing-the-reference-count&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There&#39;s one more interesting point to make about &lt;code&gt;RefCell&lt;/code&gt;.
If the compiler isn&#39;t enforcing the rules about the number of mutable
and immutable references, and they&#39;re enforced by &lt;code&gt;RefCell&lt;/code&gt; at
runtime, how does that work? Obviously, it maintains a count of
the number of references it has given out, but how does it know
when they go away? This involves a little bit of cleverness
but you should be able to work it out for yourself given what
we&#39;ve seen so far. I&#39;ll wait.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Figure it out?&lt;/p&gt;
&lt;p&gt;Instead of returning &lt;code&gt;&amp;amp;T&lt;/code&gt; and &lt;code&gt;&amp;amp;mut T&lt;/code&gt;, &lt;code&gt;borrow()&lt;/code&gt;
and &lt;code&gt;borrow_mut()&lt;/code&gt; return some new smart pointers named
&lt;a href=&quot;https://doc.rust-lang.org/std/cell/struct.Ref.html&quot;&gt;&lt;code&gt;Ref&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;https://doc.rust-lang.org/std/cell/struct.RefMut.html&quot;&gt;&lt;code&gt;RefMut&lt;/code&gt;&lt;/a&gt;.
These smart pointers act (mostly) as if they were actually references to the
owned object, so you can (mostly) use them that way without
thinking too hard about it. They are attached to the underlying
&lt;code&gt;RefCell&lt;/code&gt; and when &lt;code&gt;Ref&lt;/code&gt; and &lt;code&gt;RefMut&lt;/code&gt; go out
of scope, their destructors fire (Rust calls this the &lt;a href=&quot;https://doc.rust-lang.org/std/ops/trait.Drop.html&quot;&gt;&lt;code&gt;Drop&lt;/code&gt;&lt;/a&gt; trait), and this decrements the &lt;code&gt;RefCell&lt;/code&gt;&#39;s count of
the number of outstanding references.&lt;/p&gt;
&lt;p&gt;We can see this in the following code snippet:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;std&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;cell&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;RefCell&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;RefCell&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span 
class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;mut&lt;/span&gt; borrowed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow_mut&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        borrowed&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; borrowed&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// borrowed is dropped here.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;borrow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the first call to &lt;code&gt;println!()&lt;/code&gt; we borrow &lt;code&gt;h1&lt;/code&gt; twice immutably
(which is safe), but these borrows go out of scope after &lt;code&gt;println!()&lt;/code&gt;
returns. We then call &lt;code&gt;borrow_mut()&lt;/code&gt; and assign the result to &lt;code&gt;borrowed&lt;/code&gt;,
at which point no other borrows are permitted; if we were to try to
do another borrow right before the next &lt;code&gt;println!()&lt;/code&gt; the program
would panic. However, once the braced block &lt;code&gt;{...}&lt;/code&gt; is closed,
&lt;code&gt;borrowed&lt;/code&gt; goes out of scope and its drop callback fires, so
there are no more borrows of any kind and the immutable borrows
in the final &lt;code&gt;println!()&lt;/code&gt; are permitted.&lt;/p&gt;
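&lt;p&gt;As a rough sketch of the same rules without the panic, the non-panicking
&lt;code&gt;try_borrow()&lt;/code&gt; can be used to observe whether a borrow would be allowed.
This is only an illustration: it wraps &lt;code&gt;Hat&lt;/code&gt; directly in a &lt;code&gt;RefCell&lt;/code&gt;
as a simplifying assumption, and the &lt;code&gt;demo()&lt;/code&gt; function is a hypothetical helper,
not part of the example above.&lt;/p&gt;

```rust
use std::cell::RefCell;

struct Hat {
    size: u8,
}

// Hypothetical helper: returns (refused while mutably borrowed,
// allowed after release, final size).
fn demo() -> (bool, bool, u8) {
    let h1 = RefCell::new(Hat { size: 5 });
    let refused;
    {
        let mut borrowed = h1.borrow_mut();
        borrowed.size = 10;
        // Calling h1.borrow() here would panic; try_borrow() returns
        // an Err instead, so we can record the conflict.
        refused = h1.try_borrow().is_err();
        // borrowed is dropped at the end of this block.
    }
    // With the mutable borrow released, immutable borrows succeed again.
    let allowed = h1.try_borrow().is_ok();
    let size = h1.borrow().size;
    (refused, allowed, size)
}

fn main() {
    println!("{:?}", demo());
}
```

&lt;p&gt;The design point is the same as in the prose: the check happens at run time,
and it is the drop of the mutable borrow that makes later borrows legal.&lt;/p&gt;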
&lt;h2 id=&quot;decoding-our-original-error&quot;&gt;Decoding our Original Error &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#decoding-our-original-error&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We are now in a position to understand our original error, which I&#39;ve reproduced
for your convenience.&lt;/p&gt;
&lt;h4 id=&quot;code-3&quot;&gt;Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#code-3&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token type-definition class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;token function-definition function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hat&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token macro property&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;{} {}&quot;&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; h1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; h2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id=&quot;output-3&quot;&gt;Output &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#output-3&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;error[E0382]: borrow of moved value: `h1`
 --&amp;gt; assign-rs.rs:9:23
  |
6 |     let h1 = Hat { size: 5 };
  |         -- move occurs because `h1` has type `Hat`, which does not implement the `Copy` trait
7 |     let h2 = h1;
  |              -- value moved here
8 |
9 |     println!(&amp;quot;{} {}&amp;quot;, h1.size, h2.size);
  |                       ^^^^^^^ value borrowed here after move
  |
note: if `Hat` implemented `Clone`, you could clone the value
 --&amp;gt; assign-rs.rs:1:1
  |
1 | struct Hat {
  | ^^^^^^^^^^ consider implementing `Clone` for this type
...
7 |     let h2 = h1;
  |              -- you could clone this value
  = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to 1 previous error

For more information about this error, try `rustc --explain E0382`.
make: *** [assign-rs.out] Error 1

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Most of this should now be straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Hat&lt;/code&gt; doesn&#39;t implement &lt;code&gt;Copy&lt;/code&gt; so when we assign &lt;code&gt;h2 = h1&lt;/code&gt; on line &lt;code&gt;7&lt;/code&gt; Rust does a move.&lt;/li&gt;
&lt;li&gt;We then try to use &lt;code&gt;h1&lt;/code&gt; on line &lt;code&gt;9&lt;/code&gt; which is illegal because it has been moved.&lt;/li&gt;
&lt;li&gt;The Rust compiler suggests that we should implement &lt;code&gt;Clone&lt;/code&gt;, which is reasonable, but really we should implement &lt;code&gt;Copy&lt;/code&gt; (which, recall, also requires &lt;code&gt;Clone&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
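&lt;p&gt;A minimal sketch of the last point, assuming we are happy for &lt;code&gt;Hat&lt;/code&gt; to be
copied bit-for-bit: deriving both &lt;code&gt;Clone&lt;/code&gt; and &lt;code&gt;Copy&lt;/code&gt; turns the assignment
into a copy, so &lt;code&gt;h1&lt;/code&gt; stays usable. The &lt;code&gt;sizes()&lt;/code&gt; function is a hypothetical
wrapper added here only so the result can be checked.&lt;/p&gt;

```rust
// Copy requires Clone, so both are derived.
#[derive(Clone, Copy)]
struct Hat {
    size: u8,
}

// Hypothetical wrapper around the original main() body.
fn sizes() -> (u8, u8) {
    let h1 = Hat { size: 5 };
    let h2 = h1; // A copy rather than a move, because Hat is Copy.
    (h1.size, h2.size) // h1 is still valid here.
}

fn main() {
    let (a, b) = sizes();
    println!("{} {}", a, b);
}
```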
&lt;p&gt;One thing might still be confusing: the error is &amp;quot;borrow of moved value: &lt;code&gt;h1&lt;/code&gt;&amp;quot;
even though there&#39;s no explicit borrow here, since we&#39;re passing &lt;code&gt;h1.size&lt;/code&gt;, not &lt;code&gt;&amp;amp;h1.size&lt;/code&gt;, to
&lt;code&gt;println!()&lt;/code&gt;. The clue here is the &lt;code&gt;!&lt;/code&gt;, which denotes that &lt;code&gt;println!&lt;/code&gt; is a Rust
&lt;a href=&quot;https://doc.rust-lang.org/reference/macros.html&quot;&gt;macro&lt;/a&gt;; the code
that macro expands to takes a reference to each of its arguments
to avoid consuming them, hence this is a borrow, not just a use.&lt;/p&gt;
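&lt;p&gt;To see the difference between the macro&#39;s implicit borrow and a real move,
here is a small sketch using a &lt;code&gt;String&lt;/code&gt; (which, like &lt;code&gt;Hat&lt;/code&gt;, is not &lt;code&gt;Copy&lt;/code&gt;).
The &lt;code&gt;consume()&lt;/code&gt; and &lt;code&gt;demo()&lt;/code&gt; functions are hypothetical names chosen for
illustration.&lt;/p&gt;

```rust
// Hypothetical helper that takes ownership of its argument (a real move).
fn consume(label: String) -> usize {
    label.len()
}

fn demo() -> usize {
    let label = String::from("fedora");
    // The macro expansion borrows label rather than moving it...
    println!("{}", label);
    // ...so it is still usable here.
    println!("{}", label);
    // Passing it by value does move it; using label after this line
    // would be rejected with error E0382.
    consume(label)
}

fn main() {
    println!("{}", demo());
}
```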
&lt;h2 id=&quot;next-up%3A-more-rust&quot;&gt;Next Up: More Rust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#next-up%3A-more-rust&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you may have gathered, one of the main experiences of learning Rust
is figuring out how to architect your code in a way that is consistent
with Rust&#39;s ownership and borrowing rules. A lot of that is building
a mental model of how Rust works, which is what this post is about,
but at the end of the day, there&#39;s also a fair amount of gymnastics
required. I&#39;ll be going into that more in the next post.&lt;/p&gt;
&lt;h2 id=&quot;appendix%3A-c%2B%2B-vs.-rust-smart-pointers&quot;&gt;Appendix: C++ vs. Rust smart pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-4/#appendix%3A-c%2B%2B-vs.-rust-smart-pointers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As a reference, here is a comparison table between C++ and Rust smart
pointers.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;C++&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Rust&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Single ownership&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;unique_ptr&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;Box&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Shared ownership (strong)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;shared_ptr&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;Rc&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Shared ownership (weak)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;weak_ptr&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;Weak&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Atomic shared pointers&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;atomic&amp;lt;ptr-type&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;sync::Arc&lt;/code&gt;/&lt;code&gt;sync::Weak&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Internal ref counting&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;boost::intrusive_ptr&lt;/code&gt;*&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;intrusive_collections&lt;/code&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Interior mutability&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;N/A&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;code&gt;Cell&lt;/code&gt;, &lt;code&gt;RefCell&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;* Non-standard feature.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Unless you explicitly tell it that&#39;s what you want
in which case you better know what you&#39;re doing. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;i.e., just copying the memory &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Interestingly, when optimization is &lt;em&gt;off&lt;/em&gt; &lt;code&gt;rustc&lt;/code&gt; doesn&#39;t
copy &lt;code&gt;h1&lt;/code&gt; to &lt;code&gt;h2&lt;/code&gt; but rather just initializes &lt;code&gt;h1&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt;
with the same value. However, if you change &lt;code&gt;h1&lt;/code&gt; before
doing the assignment (after making &lt;code&gt;h1&lt;/code&gt; mut, of course),
then the assignment copies the fields as you would expect. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note for Rust nerds: this is actually a generic
function and &lt;code&gt;impl Shape + Circular&lt;/code&gt; stuff is syntactic sugar for
&lt;code&gt;fn print_circumference&amp;lt;T: Shape + Circular&amp;gt;(shape: T)&lt;/code&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though cool people use &lt;code&gt;make_unique()&lt;/code&gt; and friends.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
All mutable is a pretty common design pattern in
languages ranging from C and C++ to Python
and JavaScript. A number of functional languages
(e.g., Erlang) are all immutable, which
requires a somewhat different set of programming
idioms, typically involving explicitly maintaining
state by passing around the result of computations.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though this is undercut a little bit by Rust allowing
you to redefine (shadow) variable names. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Recall that &lt;code&gt;malloc()&lt;/code&gt; will also over-allocate, so this
reallocation might actually return the same region. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
NGL, this contrived example is probably not
going to do anything bad, but that kind of thinking
is a big part of why memory errors in C/C++ are so dangerous,
because they often don&#39;t cause problems during testing
but can then be exploited in bigger systems. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I wrote the example this way rather than double borrowing
&lt;code&gt;numbers&lt;/code&gt; directly because it&#39;s a bit hard to create
a dangerous example that way without using some kind
of concurrency (parallelism) mechanism. In multithreaded
programs, concurrent access to exactly the same data
structure routinely causes problems. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We&#39;d also need a &lt;code&gt;mut&lt;/code&gt; reference to &lt;code&gt;numbers&lt;/code&gt; because
we want to modify the elements of the array and the
&lt;code&gt;len_&lt;/code&gt; field. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Specifically, it implements &lt;code&gt;Deref&lt;/code&gt; and not &lt;code&gt;DerefMut&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are also &lt;code&gt;try_borrow()&lt;/code&gt; and &lt;code&gt;try_borrow_mut()&lt;/code&gt; which
will fail if you try to break the rules. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-4/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 3: C++ Smart Pointers</title>
		<link href="https://educatedguesswork.org/posts/memory-management-3/"/>
		<updated>2025-03-10T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-3/</id>
		<content type="html">&lt;p&gt;This is the third post in my planned multipart
series on memory management. In part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;
we covered the basics of memory allocation and how it works in
C, and in part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2&quot;&gt;II&lt;/a&gt; we covered the
basics of C++ memory management, including the RAII idiom for
memory management. In this post, we&#39;ll be looking at a
powerful technique called &amp;quot;smart pointers&amp;quot; that lets you
use RAII-style idioms but for pointers rather than objects.&lt;/p&gt;
&lt;h2 id=&quot;why-do-you-want-pointers%3F&quot;&gt;Why do you want pointers? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#why-do-you-want-pointers%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Recall from before that I said that if you want to use RAII you need to
store an actual object on the stack, not a pointer to the object, because
if the object is on the heap, it won&#39;t be cleaned up when the function
exits. This is an annoying limitation.&lt;/p&gt;
&lt;p&gt;For example, suppose we want to write a function which &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hash_function&amp;amp;oldid=1265545210&quot;&gt;hashes&lt;/a&gt; the
contents of a file, returning a single string containing the hash.
With just one hash function, that might look like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string &lt;span class=&quot;token function&quot;&gt;hash_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string filename&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; buf&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream &lt;span class=&quot;token function&quot;&gt;fs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;filename&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;in&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  SHA1 sha1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Hash object&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;is_open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    sha1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;gcount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Hash a chunk of data.&lt;/span&gt;&lt;br /&gt;  &lt;span 
class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; sha1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our hash function is just an object with two methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;update()&lt;/code&gt; which hashes a chunk of data.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;final()&lt;/code&gt; which returns the hash value.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I.e.,&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;SHA1&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size_t len&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string &lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works fine, but what happens if we want to support more than one
hash function, for instance SHA-1, SHA-256, etc.? One way to do this is
to have the caller of the function provide the name in a string, i.e.,&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string hash_value &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;hash_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;sha-1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Internally, we&#39;d have what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Factory_(object-oriented_programming)&amp;amp;oldid=1249490663&quot;&gt;factory function&lt;/a&gt; that makes
a hashing object given the string name. I.e.,&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Hasher&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size_t len&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string &lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All of the concrete hashers (e.g., &lt;code&gt;HashSHA&lt;/code&gt;) inherit from
&lt;code&gt;Hasher&lt;/code&gt; and &lt;code&gt;get_hasher()&lt;/code&gt; returns an instance of the desired hash
object. Because of inheritance, we can just assign all of them to
&lt;code&gt;Hasher *&lt;/code&gt;. This gives us a function like the following:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string &lt;span class=&quot;token function&quot;&gt;hash_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string filename&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; buf&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream &lt;span class=&quot;token function&quot;&gt;fs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;filename&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;in&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;h &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br 
/&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;is_open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     h&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token 
function&quot;&gt;gcount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Hash a chunk of data.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; h&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far so good, but now instead of having an instance of
&lt;code&gt;SHA1&lt;/code&gt; on the stack we have an instance of &lt;code&gt;Hasher&lt;/code&gt; on the
heap and it leaks when the function returns. So much for
RAII.
Fortunately, it&#39;s possible to recover RAII semantics
in a generic way using a &amp;quot;smart pointer&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;what&#39;s-a-smart-pointer-and-why-is-it-smart%3F&quot;&gt;What&#39;s a smart pointer and why is it smart? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#what&#39;s-a-smart-pointer-and-why-is-it-smart%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At a high level, a smart pointer is an object that can be used as if
it were a pointer but has better semantics. For instance, C++
&lt;a href=&quot;https://en.cppreference.com/w/cpp/memory/unique_ptr&quot;&gt;&lt;code&gt;unique_ptr&lt;/code&gt;&lt;/a&gt;
provides stack-like RAII properties for objects on the heap. It
gets used like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  unique_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Hasher&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is literally the only change we have to make. From there
on, we can use &lt;code&gt;h&lt;/code&gt;, which is actually a &lt;code&gt;unique_ptr&lt;/code&gt; holding a
&lt;code&gt;Hasher *&lt;/code&gt;, as if it were a &lt;code&gt;Hasher *&lt;/code&gt;. When &lt;code&gt;h&lt;/code&gt; goes out of
scope at the end of the function, the pointer it&#39;s holding
will be freed, just as if the hasher object itself were on
the stack.&lt;/p&gt;
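&lt;p&gt;Here&#39;s a minimal sketch of that behavior which you can compile on its own.
&lt;code&gt;Tracked&lt;/code&gt;, &lt;code&gt;make_tracked()&lt;/code&gt;, and &lt;code&gt;live_tracked&lt;/code&gt; are
hypothetical stand-ins for &lt;code&gt;Hasher&lt;/code&gt; and &lt;code&gt;get_hasher()&lt;/code&gt;; the counter is
only there so we can observe that the heap object really is destroyed when the
&lt;code&gt;unique_ptr&lt;/code&gt; goes out of scope.&lt;/p&gt;

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter, used only to observe construction and destruction.
int live_tracked = 0;

// Hypothetical stand-in for Hasher.
struct Tracked {
  Tracked() { ++live_tracked; }
  ~Tracked() { --live_tracked; }
  int value() const { return 42; }
};

// Hypothetical factory in the style of get_hasher(): returns a raw owning pointer.
Tracked *make_tracked() { return new Tracked(); }

// Returns how many Tracked objects are alive while the smart pointer is in scope.
int use_in_scope() {
  std::unique_ptr<Tracked> t(make_tracked());
  assert(t->value() == 42);  // Used exactly as if it were a Tracked*.
  return live_tracked;       // t is destroyed after this value is computed.
}
```

&lt;p&gt;After &lt;code&gt;use_in_scope()&lt;/code&gt; returns, &lt;code&gt;live_tracked&lt;/code&gt; is back to zero even
though nothing in the function calls &lt;code&gt;delete&lt;/code&gt;.&lt;/p&gt;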
&lt;p&gt;C++ has a number of different smart pointer types built into
it, but first I want to look at how you actually implement
a smart pointer.&lt;/p&gt;
&lt;h3 id=&quot;implementing-a-smart-pointer&quot;&gt;Implementing a Smart Pointer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#implementing-a-smart-pointer&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Suppose, for example, we want to implement a &lt;code&gt;unique_ptr&lt;/code&gt;-like
class for &lt;code&gt;Hasher&lt;/code&gt;. For starters, we need a class that holds a
&lt;code&gt;Hasher*&lt;/code&gt; and destroys it on destruction:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;UniquePtrHasher&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt; &lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtrHasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;UniquePtrHasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is actually enough to give us RAII behavior: we can create
a &lt;code&gt;UniquePtrHasher&lt;/code&gt; in the usual way and when it goes out of
scope it will be destroyed along with the hasher object inside:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;UniquePtrHasher &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This isn&#39;t really a smart pointer, though; it&#39;s just a container for
the object. If we want to do anything with the object inside, we somehow
need to get at the pointer. The obvious thing is to just provide
a function called &lt;code&gt;get()&lt;/code&gt; that gives you the pointer:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is called &amp;quot;unboxing&amp;quot; because we have the pointer in a box (the
smart pointer) but now
we take it out to use it. Unboxing will work, but now we have to change all the
code which previously used &lt;code&gt;h&lt;/code&gt; as if it were a pointer to use
&lt;code&gt;h.get()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     h&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;gcount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Hash a chunk of data.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span 
class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Yuck! Not only is this a pain in the ass, but it undercuts
the whole thing we are trying to do here, which is to
avoid having to work with the raw pointer.&lt;/p&gt;
&lt;p&gt;Fortunately, C++ has a feature that makes this unnecessary because
we can use operator overloading. In &lt;a href=&quot;https://educatedguesswork.org/post/memory-management-2#operator-overloading&quot;&gt;part II&lt;/a&gt;,
we overloaded the copy assignment operator but here we are going
to overload the &lt;code&gt;-&amp;gt;&lt;/code&gt; operator instead so that
&lt;code&gt;UniquePtrHasher&lt;/code&gt; acts like &lt;code&gt;Hasher*&lt;/code&gt;.
The code
for that looks like this:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You don&#39;t need to worry too much about the syntax, but the net
effect is that when you use &lt;code&gt;-&amp;gt;&lt;/code&gt; with &lt;code&gt;UniquePtrHasher&lt;/code&gt; it acts
like you were using &lt;code&gt;-&amp;gt;&lt;/code&gt; with the internal pointer,
which is what we want. Here&#39;s our new code:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string &lt;span class=&quot;token function&quot;&gt;hash_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string filename&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; buf&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream &lt;span class=&quot;token function&quot;&gt;fs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;filename&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;in&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  UniquePtrHasher &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// NEW&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;is_open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;buf&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     h&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token 
function&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;gcount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Hash a chunk of data.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; h&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;final&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only line that&#39;s changed from our original code
is the one marked &lt;code&gt;NEW&lt;/code&gt; where we
create the &lt;code&gt;UniquePtrHasher&lt;/code&gt; but now we&#39;ve eliminated the memory
leak.&lt;/p&gt;
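&lt;p&gt;To see the pieces working together outside of &lt;code&gt;hash_file()&lt;/code&gt;, here is a
self-contained sketch along the same lines. &lt;code&gt;Widget&lt;/code&gt;, &lt;code&gt;UniquePtrWidget&lt;/code&gt;,
and &lt;code&gt;live_widgets&lt;/code&gt; are hypothetical names standing in for &lt;code&gt;Hasher&lt;/code&gt; and
&lt;code&gt;UniquePtrHasher&lt;/code&gt;, and I&#39;ve marked the constructor &lt;code&gt;explicit&lt;/code&gt;, which the
version above doesn&#39;t do. It shows that &lt;code&gt;get()&lt;/code&gt; and &lt;code&gt;-&amp;gt;&lt;/code&gt; reach the same
object and that the object is destroyed when the wrapper goes out of scope.&lt;/p&gt;

```cpp
#include <cassert>

// Hypothetical counter, used only to observe destruction.
int live_widgets = 0;

// Hypothetical stand-in for Hasher.
struct Widget {
  Widget() { ++live_widgets; }
  ~Widget() { --live_widgets; }
  int answer() { return 7; }
};

class UniquePtrWidget {
 private:
  Widget *ptr_;

 public:
  explicit UniquePtrWidget(Widget *ptr) : ptr_(ptr) {}
  ~UniquePtrWidget() { delete ptr_; }

  // Unboxing: hand back the raw pointer.
  Widget *get() { return ptr_; }

  // Lets callers write w->answer() instead of w.get()->answer().
  Widget *operator->() { return ptr_; }
};

int widget_demo() {
  UniquePtrWidget w(new Widget());
  assert(live_widgets == 1);
  assert(w.get()->answer() == w->answer());  // Both routes reach the same object.
  return w->answer();
}  // w is destroyed here, which deletes the Widget.
```

&lt;p&gt;Note that this sketch is still copyable, so it has exactly the same
weaknesses as &lt;code&gt;UniquePtrHasher&lt;/code&gt;.&lt;/p&gt;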
&lt;p&gt;This is a smart pointer after a fashion, but it&#39;s kind of a dumb
one. There are at least two big problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It doesn&#39;t guarantee uniqueness.&lt;/li&gt;
&lt;li&gt;It&#39;s not generic.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s solve these in turn.&lt;/p&gt;
&lt;h2 id=&quot;it&#39;s-good-to-be-unique&quot;&gt;It&#39;s good to be unique &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#it&#39;s-good-to-be-unique&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the following code:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  UniquePtrHasher &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  UniquePtrHasher h2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we saw in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1#copy-constructors&quot;&gt;part I&lt;/a&gt;, this
will invoke the copy constructor. Because we haven&#39;t defined a copy constructor,
we end up with the default one which makes a shallow copy, with the
result that &lt;code&gt;h&lt;/code&gt; and &lt;code&gt;h2&lt;/code&gt; both point to the same instance of &lt;code&gt;Hasher&lt;/code&gt;.
When the function ends they will &lt;em&gt;both&lt;/em&gt; try to &lt;code&gt;delete&lt;/code&gt; it, which
leads to a double free, with the following error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uniqueptrhasher(20294,0x2000dcf80) malloc: *** error for object 0x600003694040: pointer being freed was not allocated
uniqueptrhasher(20294,0x2000dcf80) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, the destructor for &lt;code&gt;UniquePtrHasher&lt;/code&gt; fires twice, but with the same
pointer (&lt;code&gt;0x600003694040&lt;/code&gt;). In this case, the implementation has
chosen to generate an error and crash the program for the double
&lt;code&gt;free()&lt;/code&gt; (note that the error is in &lt;code&gt;malloc()&lt;/code&gt; because this C++
implementation of &lt;code&gt;new/delete&lt;/code&gt; is based on &lt;code&gt;malloc()&lt;/code&gt;), but that&#39;s
just whoever wrote it doing you a favor. As noted in part I, anything
could happen at this point, but whatever it is is likely to be bad.
This is definitely a defect and quite
possibly a vulnerability.&lt;/p&gt;
&lt;p&gt;What we want to do is make this case impossible, which is to say to
make &lt;code&gt;UniquePtrHasher&lt;/code&gt; actually unique. By this point it should be
clear how to do this: we&#39;re going to overload the copy constructor and
the copy assignment operator. With what we know now, the obvious
thing to do is to just make them abort, like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtrHasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtrHasher &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works in some sense, but it&#39;s a runtime error, which means that
our program will crash if we try to copy a &lt;code&gt;UniquePtrHasher&lt;/code&gt; but that
the compiler won&#39;t catch it so the program will still compile.
Moreover, there&#39;s no guarantee that the failure will be this
safe; it could easily be a serious vulnerability via use after
free.&lt;/p&gt;
&lt;p&gt;What we really want is a compile time guarantee. The old way
to do this was to make the copy constructor &lt;code&gt;private&lt;/code&gt; so that
it wasn&#39;t possible to call it, but the new way
(as of C++11) is to mark it with &lt;code&gt;delete&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtrHasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtrHasher &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we then try to construct a new &lt;code&gt;UniquePtrHasher&lt;/code&gt; from an existing
one, we get the somewhat helpful error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uniqueptrhasher.cpp:23:19: error: call to deleted constructor of &#39;UniquePtrHasher&#39;
   23 |   UniquePtrHasher u2(u);
      |                   ^  ~
uniqueptrhasher.cpp:18:3: note: &#39;UniquePtrHasher&#39; has been explicitly marked deleted here
   18 |   UniquePtrHasher(UniquePtrHasher &amp;amp;other) = delete;
      |   ^
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we do the same thing for the copy assignment operator, we&#39;ve
then prevented anyone from making a second &lt;code&gt;UniquePtrHasher&lt;/code&gt; pointing
to the same underlying object. Well, sort of.&lt;/p&gt;
&lt;p&gt;It&#39;s true that if we do all of this you can&#39;t make &lt;em&gt;another&lt;/em&gt; &lt;code&gt;UniquePtrHasher&lt;/code&gt; from
an existing one, but nothing stops you from making two &lt;code&gt;UniquePtrHasher&lt;/code&gt;s from the
same pointer:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Hasher&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; hasher &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hash_name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  UniquePtrHasher &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hasher&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  UniquePtrHasher &lt;span class=&quot;token function&quot;&gt;hr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hasher&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The obvious answer is &amp;quot;don&#39;t do that then&amp;quot; but we&#39;re just depending on
programmer discipline, because the compiler won&#39;t stop you. How is it
to know you didn&#39;t want that outcome (and later we&#39;ll see an example of where
this kind of thing is totally legitimate, if slightly inadvisable)?&lt;/p&gt;
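&lt;p&gt;To make the compile-time part of this concrete, here is a sketch with both
the copy constructor and the copy assignment operator deleted.
&lt;code&gt;Payload&lt;/code&gt; and &lt;code&gt;NoCopyPtr&lt;/code&gt; are hypothetical names, and I&#39;ve used the
conventional &lt;code&gt;const&lt;/code&gt; reference signatures rather than the non-&lt;code&gt;const&lt;/code&gt;
one shown above. The &lt;code&gt;static_assert&lt;/code&gt;s use the standard type traits to have
the compiler confirm that copying is rejected.&lt;/p&gt;

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for Hasher.
struct Payload {
  int n = 0;
};

class NoCopyPtr {
 private:
  Payload *ptr_;

 public:
  explicit NoCopyPtr(Payload *ptr) : ptr_(ptr) {}
  ~NoCopyPtr() { delete ptr_; }

  // Both copy operations are deleted, so any attempt to copy fails to compile.
  NoCopyPtr(const NoCopyPtr &other) = delete;
  NoCopyPtr &operator=(const NoCopyPtr &other) = delete;

  Payload *operator->() { return ptr_; }
};

// Checked at compile time: the build fails if either copy operation is usable.
static_assert(!std::is_copy_constructible<NoCopyPtr>::value,
              "copy construction should be rejected");
static_assert(!std::is_copy_assignable<NoCopyPtr>::value,
              "copy assignment should be rejected");

int payload_demo() {
  NoCopyPtr p(new Payload());
  // NoCopyPtr q = p;  // Would not compile: call to deleted constructor.
  p->n = 3;
  assert(p->n == 3);
  return p->n;
}
```

&lt;p&gt;As discussed above, none of this stops you from constructing two
&lt;code&gt;NoCopyPtr&lt;/code&gt;s from the same raw pointer; the type traits only check the
copy operations.&lt;/p&gt;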
&lt;h2 id=&quot;moving-smart-pointers&quot;&gt;Moving Smart Pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#moving-smart-pointers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;OK, so now we have a unique pointer, but this is pretty limited.
While we don&#39;t want to be able to make a copy
of a unique pointer (that&#39;s what makes it unique), sometimes
we want to &lt;em&gt;move&lt;/em&gt; an object from one unique pointer to another.
For example, suppose we have created an object but we want
to store it in a container, like a vector. This presents
two problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We want the vector to own it so our destructor doesn&#39;t
destroy it when it goes out of scope.&lt;/li&gt;
&lt;li&gt;The vector may need to move the object around when it
reallocates its own memory to grow or shrink.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key thing here is that we want to preserve the uniqueness
guarantee. After we do &lt;code&gt;a = b&lt;/code&gt;, we want &lt;code&gt;a&lt;/code&gt; to be holding
the pointer and &lt;code&gt;b&lt;/code&gt; not to be.&lt;/p&gt;
&lt;p&gt;Unsurprisingly, we&#39;re going to do this with the move assignment
operator, which we saw in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2#moving-on&quot;&gt;part II&lt;/a&gt;. It looks something like this:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  UniquePtrHasher&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtrHasher &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Check for self-assignment.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span 
class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;                             &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The move assignment operator does two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It sets the &lt;code&gt;ptr_&lt;/code&gt; field in the new smart pointer
(the one being moved to) to point to the object
that is being held.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It &lt;em&gt;invalidates&lt;/em&gt; the &lt;code&gt;ptr_&lt;/code&gt; field in the original
pointer (the one being moved away from) by setting
it to &lt;code&gt;nullptr&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Put together, these will prevent the object from
being destroyed when the source smart pointer is
destroyed. We also want to make sure that any use
of the old smart pointer fails cleanly, so we should
add a check in the &lt;code&gt;operator-&amp;gt;()&lt;/code&gt; implementation:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Hasher &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&#39;ll also need to implement the move constructor,
which is basically the same as the move assignment
operator.&lt;/p&gt;
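&lt;p&gt;Here is a sketch that puts the move constructor and the move assignment
operator side by side and then exercises both motivating cases: handing an
object from one smart pointer to another, and storing one in a vector.
&lt;code&gt;Item&lt;/code&gt;, &lt;code&gt;MovablePtr&lt;/code&gt;, &lt;code&gt;empty()&lt;/code&gt;, and &lt;code&gt;live_items&lt;/code&gt; are
hypothetical names. Note one thing this sketch does that the snippet above
does not: the move assignment operator deletes whatever the destination was
already holding before taking over the new pointer, which I&#39;m assuming is
what you want so that the old object isn&#39;t leaked.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical counter, used only to observe destruction.
int live_items = 0;

// Hypothetical stand-in for Hasher.
struct Item {
  int id;
  explicit Item(int i) : id(i) { ++live_items; }
  ~Item() { --live_items; }
};

class MovablePtr {
 private:
  Item *ptr_;

 public:
  explicit MovablePtr(Item *ptr) : ptr_(ptr) {}
  ~MovablePtr() { delete ptr_; }  // Deleting nullptr is a no-op.

  MovablePtr(const MovablePtr &other) = delete;
  MovablePtr &operator=(const MovablePtr &other) = delete;

  // Move constructor: take the pointer and invalidate the source.
  MovablePtr(MovablePtr &&other) noexcept : ptr_(other.ptr_) {
    other.ptr_ = nullptr;
  }

  // Move assignment: same idea, but we may already be holding something.
  MovablePtr &operator=(MovablePtr &&other) noexcept {
    if (this == &other) {
      return *this;
    }
    delete ptr_;  // Assumption: free the object we held before (see lead-in).
    ptr_ = other.ptr_;
    other.ptr_ = nullptr;
    return *this;
  }

  bool empty() const { return ptr_ == nullptr; }

  Item *operator->() {
    if (!ptr_) {
      std::abort();  // Fail cleanly on use of a moved-from pointer.
    }
    return ptr_;
  }
};

int move_demo() {
  MovablePtr a(new Item(1));
  MovablePtr b(new Item(2));

  a = std::move(b);  // Item 1 is destroyed; a now owns Item 2.
  assert(b.empty());
  assert(live_items == 1);

  std::vector<MovablePtr> v;
  v.push_back(std::move(a));  // The vector now owns Item 2.
  assert(a.empty());
  return v[0]->id;
}  // v is destroyed here and takes Item 2 with it.
```

&lt;p&gt;Because the copy operations are deleted, the vector can only hold
&lt;code&gt;MovablePtr&lt;/code&gt; by moving elements, including when it reallocates.&lt;/p&gt;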
&lt;h2 id=&quot;a-generic-smart-pointer&quot;&gt;A generic smart pointer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#a-generic-smart-pointer&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is all fine, but note that we haven&#39;t written a &lt;em&gt;generic&lt;/em&gt; unique
pointer class, but instead one that only works for &lt;code&gt;Hasher&lt;/code&gt;.
If we want one for a new class called &lt;code&gt;Smasher&lt;/code&gt; we need to
write it all again, or rather we need to take the &lt;code&gt;UniquePtrHasher&lt;/code&gt;
class and globally replace &lt;code&gt;Hasher&lt;/code&gt; with &lt;code&gt;Smasher&lt;/code&gt; (and
better hope you don&#39;t have anything called &lt;code&gt;NotHasher&lt;/code&gt; because
it will become &lt;code&gt;NotSmasher&lt;/code&gt;).
Fortunately, C++ offers a better way of doing this: &lt;a href=&quot;https://en.cppreference.com/w/cpp/language/templates&quot;&gt;templates&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;templates&quot;&gt;Templates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#templates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The idea with templates is to let the compiler do that search and replace for you,
which obviously works a lot better than text replacement.
We do this by making a version of the class with a placeholder typename
(conventionally &lt;code&gt;T&lt;/code&gt;) instead of the actual concrete typename, like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;UniquePtr&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  T&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;T &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;t&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;UniquePtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  T &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtr &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  UniquePtr&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtr &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Check for self-assignment.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Free the object we currently own.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Delete the copy constructor and copy assignment operator.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;UniquePtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;UniquePtr &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;u&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  UniquePtr&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; UniquePtr&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we&#39;re going to call this &lt;code&gt;UniquePtr&lt;/code&gt; rather than &lt;code&gt;unique_ptr&lt;/code&gt;
to avoid confusing it with the C++ standard library&#39;s implementation.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;developing-a-template-class&quot;&gt;Developing a Template Class &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#developing-a-template-class&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s true that one of the advantages of templates is that
they avoid all the grotty search and replace, but it&#39;s
actually fairly common to develop a concrete instance
of the template for one class and then do search and
replace of &lt;code&gt;Hasher&lt;/code&gt; (or whatever) to &lt;code&gt;T&lt;/code&gt;. It&#39;s typically
easier to implement classes in the specific case and
then generalize, especially with C++&#39;s &lt;a href=&quot;https://github.com/pranavkantgaur/STLfilt&quot;&gt;famously terrible template-related error messages&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The syntax &lt;code&gt;template&amp;lt;class T&amp;gt;&lt;/code&gt; tells C++ that this is a template and that the
name of the placeholder type is &lt;code&gt;T&lt;/code&gt; (as an aside, you can have
multiple placeholder types). When you actually go to use the template
class, you tell it what type you want to make a unique
pointer for and the compiler replaces the &lt;code&gt;T&lt;/code&gt;s with that
typename, producing a new version of the &lt;code&gt;UniquePtr&lt;/code&gt; class that is customized
just for that type (technical term: &lt;em&gt;instantiating&lt;/em&gt; the template).&lt;/p&gt;
&lt;p&gt;The syntax should be familiar by now:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  UniquePtr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Hasher&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get_hasher&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;sha-1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Importantly, &lt;code&gt;UniquePtr&amp;lt;Hasher&amp;gt;&lt;/code&gt; and &lt;code&gt;UniquePtr&amp;lt;Smasher&amp;gt;&lt;/code&gt; are totally
different classes, as you can see if you try to stuff a &lt;code&gt;Smasher *&lt;/code&gt;
into &lt;code&gt;UniquePtr&amp;lt;Hasher&amp;gt;&lt;/code&gt;, provoking the following super-helpful
compiler error message:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./smart.cpp:44:21: error: no matching constructor for initialization of &#39;UniquePtr&amp;lt;Hasher&amp;gt;&#39;
   44 |   UniquePtr&amp;lt;Hasher&amp;gt; h(new Smasher());
      |                     ^ ~~~~~~~~~~~~~
./smart.cpp:6:3: note: candidate constructor not viable: no known conversion from &#39;Smasher *&#39; to &#39;UniquePtr&amp;lt;Hasher&amp;gt; &amp;amp;&#39; for 1st argument
    6 |   UniquePtr(UniquePtr &amp;amp;t) = delete;
      |   ^         ~~~~~~~~~~~~
./smart.cpp:10:3: note: candidate constructor not viable: no known conversion from &#39;Smasher *&#39; to &#39;Hasher *&#39; for 1st argument
   10 |   UniquePtr(T *t) {
      |   ^         ~~~~
&lt;/code&gt;&lt;/pre&gt;
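&lt;p&gt;To make the &amp;quot;totally different classes&amp;quot; point concrete, here is a minimal
sketch. It uses a hypothetical &lt;code&gt;Holder&lt;/code&gt; template and empty stand-ins for
&lt;code&gt;Hasher&lt;/code&gt; and &lt;code&gt;Smasher&lt;/code&gt; (none of this is the code from this post),
and asks the compiler to confirm that the two instantiations are unrelated types:&lt;/p&gt;

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the post's classes; only the names are borrowed.
struct Hasher {};
struct Smasher {};

// A deliberately tiny template (hypothetical, not the post's UniquePtr) so
// the example stays focused on instantiation rather than ownership.
template <class T> class Holder {
  T *ptr_;

 public:
  explicit Holder(T *t) : ptr_(t) {}
  T *get() const { return ptr_; }
};

// Each instantiation is a separate class as far as the type system goes.
static_assert(!std::is_same<Holder<Hasher>, Holder<Smasher>>::value,
              "distinct instantiations are distinct types");

// Holder<Hasher> can be built from a Hasher*, but not from a Smasher*.
static_assert(std::is_constructible<Holder<Hasher>, Hasher *>::value,
              "Hasher* is accepted");
static_assert(!std::is_constructible<Holder<Hasher>, Smasher *>::value,
              "Smasher* is rejected");

// Runtime sanity check: the holder hands back the pointer it was given.
void check_round_trip() {
  Hasher h;
  Holder<Hasher> holder(&h);
  assert(holder.get() == &h);
}
```

&lt;p&gt;The &lt;code&gt;static_assert&lt;/code&gt; lines are checked at compile time, so if the two
instantiations were somehow interchangeable the file would simply not build.&lt;/p&gt;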
&lt;h3 id=&quot;thinking-outside-the-box&quot;&gt;Thinking Outside The Box &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#thinking-outside-the-box&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This isn&#39;t a complete implementation of a unique pointer, but it
illustrates the essential features.&lt;/p&gt;
&lt;p&gt;The good news is that if you only use smart pointers and not
regular pointers, then your programs will be a lot safer.
This isn&#39;t to say that there can&#39;t be any memory errors
because there are other ways to do unsafe stuff, but
to a first order you won&#39;t have to worry about memory
leaks or use after free. The problem here is that
nothing in C++ restricts you to just using smart pointers.&lt;/p&gt;
&lt;p&gt;For example, with unique pointers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You can create a raw pointer and then add it to a
smart pointer, but this doesn&#39;t invalidate the
original raw pointer; you just end up with both
the copy in the smart pointer (the &amp;quot;boxed&amp;quot; version) and the original one in
the raw pointer (the &amp;quot;unboxed&amp;quot; version).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can get an unboxed copy of the pointer just
by doing &lt;code&gt;.get()&lt;/code&gt;. This doesn&#39;t invalidate the
boxed version the way that &lt;code&gt;std::move()&lt;/code&gt; does.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both of these violate the uniqueness guarantee of
&lt;code&gt;unique_ptr&amp;lt;T&amp;gt;&lt;/code&gt;, so if you do either of them, then
you&#39;re back in the situation where you have to worry
about manually managing memory and C++ won&#39;t protect
you.&lt;/p&gt;
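&lt;p&gt;As a rough illustration of the two cases above, the following sketch uses
&lt;code&gt;std::unique_ptr&lt;/code&gt; with a hypothetical &lt;code&gt;Foo&lt;/code&gt; type (the function
names are made up for this example). It only compares addresses; it deliberately
never uses the unboxed pointer after the box is gone, because that would be exactly
the use-after-free we are trying to avoid:&lt;/p&gt;

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type for illustration.
struct Foo {
  int value = 42;
};

// Case 1: boxing a raw pointer does not invalidate the raw pointer.
bool raw_survives_boxing() {
  Foo *raw = new Foo();
  std::unique_ptr<Foo> boxed(raw);
  // Two names for one object; only `boxed` will delete it.
  return raw != nullptr && raw == boxed.get();
}

// Case 2: get() hands out an unboxed copy and leaves the box intact.
bool get_leaves_box_intact() {
  std::unique_ptr<Foo> boxed(new Foo());
  Foo *unboxed = boxed.get();
  return boxed != nullptr && unboxed == boxed.get();
}

// Contrast: std::move() transfers ownership and empties the source.
bool move_empties_source() {
  std::unique_ptr<Foo> a(new Foo());
  std::unique_ptr<Foo> b = std::move(a);
  return a == nullptr && b != nullptr;
}

// Convenience wrapper that checks all three behaviors.
void check_all() {
  assert(raw_survives_boxing());
  assert(get_leaves_box_intact());
  assert(move_empties_source());
}
```

&lt;p&gt;In the first two functions nothing goes wrong only because the unboxed pointer is
dropped before the &lt;code&gt;unique_ptr&lt;/code&gt; is destroyed; the compiler does nothing to enforce that.&lt;/p&gt;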
&lt;p&gt;The natural thing to say is &amp;quot;I&#39;ll just work with boxed
pointers&amp;quot;, and to some extent you can do that (though
again, C++ won&#39;t stop you from unboxing stuff, you just
have to not do it) but it&#39;s very common to have to work
with code that doesn&#39;t know about smart pointers, and
then you end up unboxing them, at which point you&#39;re
back in the soup.&lt;/p&gt;
&lt;h2 id=&quot;other-smart-pointers&quot;&gt;Other Smart Pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#other-smart-pointers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One common situation where you end up having to sort of
unbox unique pointers is when you want to pass them to
a function, as in:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;unique_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// do something&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;unique_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;x&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, this fails to compile because passing &lt;code&gt;x&lt;/code&gt; by value would require
making a copy of &lt;code&gt;x&lt;/code&gt;, which would violate the uniqueness invariant. We
could move &lt;code&gt;x&lt;/code&gt; but then we couldn&#39;t use it later, when what we really
want is to just let &lt;code&gt;doit()&lt;/code&gt; do something with &lt;code&gt;x&lt;/code&gt; temporarily
but keep ownership. One way around this would just be to pass a
pointer to &lt;code&gt;x&lt;/code&gt;, like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;unique_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// do something&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;unique_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span 
class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will work, but what it&#39;s really doing is working around
the uniqueness rule: we only have one object holding onto the
inner &lt;code&gt;Foo *&lt;/code&gt; but we&#39;ve got multiple variables pointing at the
&lt;code&gt;unique_ptr&amp;lt;Foo&amp;gt;&lt;/code&gt;, one in &lt;code&gt;f()&lt;/code&gt; and one passed as a pointer
to &lt;code&gt;doit()&lt;/code&gt;. So, technically we haven&#39;t unboxed the inner
pointer, but we&#39;ve unboxed &lt;code&gt;x&lt;/code&gt; which has all the same problems as before.
C++ also has a feature called &amp;quot;references&amp;quot;, where the
callee (in this case &lt;code&gt;doit()&lt;/code&gt;) can just ask for what&#39;s
effectively a pointer to the object without modifying the
call site, and you could ask for a reference to &lt;code&gt;x&lt;/code&gt; but
this still has the problem that you can copy the references
around, so it&#39;s possible to have a reference to &lt;code&gt;x&lt;/code&gt; outlive
&lt;code&gt;x&lt;/code&gt;. This isn&#39;t to say that it&#39;s not possible to safely
use references or pointers to a &lt;code&gt;unique_ptr&lt;/code&gt;, just that
you have to be careful to follow the rules because the
compiler won&#39;t help you out. (Aside: in production code
you should use references rather than pointers, but I&#39;m
trying to keep new syntax to a minimum).&lt;/p&gt;
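&lt;p&gt;For comparison, here is a minimal sketch of the reference-based variant. The
&lt;code&gt;calls&lt;/code&gt; counter on &lt;code&gt;Foo&lt;/code&gt; is a hypothetical addition so that there
is something observable, and &lt;code&gt;f()&lt;/code&gt; returns a value purely for illustration:&lt;/p&gt;

```cpp
#include <cassert>
#include <memory>

// Hypothetical Foo with a counter so the effect of doit() is visible.
struct Foo {
  int calls = 0;
};

// Borrow the unique_ptr by const reference: doit() can use the Foo, but it
// cannot move ownership away from the caller.
void doit(const std::unique_ptr<Foo> &foo) {
  foo->calls++;
}

int f() {
  std::unique_ptr<Foo> x(new Foo());

  doit(x);  // No & at the call site, unlike the pointer version.
  doit(x);

  assert(x != nullptr);  // x still owns the Foo after both calls.
  return x->calls;
}
```

&lt;p&gt;Here &lt;code&gt;f()&lt;/code&gt; returns 2, because both calls operate on the same &lt;code&gt;Foo&lt;/code&gt;.
The same caveat applies as with the pointer version: nothing stops &lt;code&gt;doit()&lt;/code&gt; from
stashing the reference somewhere that outlives &lt;code&gt;x&lt;/code&gt;.&lt;/p&gt;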
&lt;p&gt;It&#39;s also quite common to have situations where you actually &lt;em&gt;want&lt;/em&gt; to
have two pointers to the same underlying object. For example, suppose
that we&#39;re building an HR application and we want to keep track of
people&#39;s managers. It&#39;s normal for two people to have the same manager,
but we can&#39;t copy a &lt;code&gt;unique_ptr&lt;/code&gt;, so things get tricky. Fortunately,
&lt;code&gt;unique_ptr&lt;/code&gt; isn&#39;t the only type of smart pointer. For this task,
the tool we want is called &lt;code&gt;shared_ptr&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;shared-pointers&quot;&gt;Shared Pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#shared-pointers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unlike with &lt;code&gt;unique_ptr&lt;/code&gt;, multiple &lt;code&gt;shared_ptr&lt;/code&gt; instances can point to a
given object, just like with a regular pointer (or, if you&#39;re
used to a programming language like Python, JavaScript, or Go, just
like what happens all the time). &lt;code&gt;shared_ptr&lt;/code&gt; keeps track of how many instances there
are (the &amp;quot;reference count&amp;quot;). When you copy a &lt;code&gt;shared_ptr&lt;/code&gt; the
reference count increases by one. When a &lt;code&gt;shared_ptr&lt;/code&gt; instance
is destroyed the reference count decreases by one. If the reference
count reaches zero, that means that there are no &lt;code&gt;shared_ptr&lt;/code&gt;s to
the object and it&#39;s destroyed.&lt;/p&gt;
&lt;p&gt;For example, consider the following code:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Foo&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Destructor for Foo&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; argc&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token 
keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;argv&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;p1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Reference count %ld&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; p2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; p1&lt;span 
class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Reference count %ld&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Reference count %ld&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;br /&gt;  p2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;reset&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Forget about the pointer.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;Reference count %ld&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  p1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;reset&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When run, this produces the output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Reference count 1
Reference count 2
Reference count 2
Reference count 1
Destructor for Foo
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that while both &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; point to the object, they
share the same reference count (2). When we reset &lt;code&gt;p2&lt;/code&gt;, telling it to
forget about the object, the reference count drops to 1. When we then
reset &lt;code&gt;p1&lt;/code&gt;, the reference count drops to zero and the
object is destroyed.&lt;/p&gt;
&lt;p&gt;Once we have a shared pointer we can use it to pass an object around
without unboxing it:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// do something&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;p1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;doit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p1&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When we pass &lt;code&gt;p1&lt;/code&gt; to &lt;code&gt;doit()&lt;/code&gt;, it makes a copy of &lt;code&gt;p1&lt;/code&gt;, incrementing
the reference count. Then when &lt;code&gt;doit()&lt;/code&gt; returns, that copy is destroyed,
decrementing the reference count. The same thing happens when we want
to have multiple pointers to the same object, as in the case of
storing a pointer to an employee&#39;s manager.&lt;/p&gt;
&lt;p&gt;It&#39;s worth taking a quick look at how &lt;code&gt;shared_ptr&lt;/code&gt; works. The code
below shows the core of a homegrown implementation, focusing on
the new stuff.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;SharedPtr&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;detail&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    T&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    size_t ct_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  detail &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;SharedPtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span 
class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;SharedPtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;T &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;t&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;detail&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ct_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;SharedPtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;SharedPtr &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;u&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ptr_&lt;span class=&quot;token 
operator&quot;&gt;-&gt;&lt;/span&gt;ct_&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  SharedPtr&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; SharedPtr&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ct_&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SharedPtr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ct_&lt;span 
class=&quot;token operator&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ct_&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; ptr_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; ptr_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key intuition is that we need somewhere to store the reference
count, and that needs to be shared between the different instances of
&lt;code&gt;SharedPtr&lt;/code&gt; so that they have the same view of the reference count. This
means that it can&#39;t be stored in any one copy in case that copy goes
out of scope.  Instead, we allocate a new &lt;code&gt;detail&lt;/code&gt; object which stores
the reference count and every instance of &lt;code&gt;SharedPtr&lt;/code&gt; just points to
the single &lt;code&gt;detail&lt;/code&gt; instance, as shown here:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shared-ptr-structure.png&quot; alt=&quot;Shared pointer structure&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Two shared pointers to the same object
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this implementation, the &lt;code&gt;detail&lt;/code&gt; object also stores the pointer
to the owned object, though that could in principle also go in the
&lt;code&gt;SharedPtr&lt;/code&gt; class, with each one having its own copy of the pointer,
which is what &lt;a href=&quot;https://learn.microsoft.com/en-us/cpp/cpp/smart-pointers-modern-cpp?view=msvc-170#kinds-of-smart-pointers&quot;&gt;Microsoft&#39;s implementation seems to do&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;circular-references&quot;&gt;Circular References &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#circular-references&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Consider the following (potentially more complicated than necessary) code,
which models a trivial family with one parent and one child. We want
each parent to know its own child and each child to know its own
parent so that we can go from parent to child and vice versa.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Parent&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;~Parent&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token 
operator&quot;&gt;&amp;lt;&lt;/span&gt;Child&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; child_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Forward declaration.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;set_child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Child&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; child&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Child&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Parent&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; parent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    parent_ &lt;span 
class=&quot;token operator&quot;&gt;=&lt;/span&gt; parent&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;~Child&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Parent&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; parent_&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// Definition of set_child()&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Parent&lt;/span&gt;&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set_child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span 
class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Child&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; child&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  child_ &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; child&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;make_family&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Parent&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Child&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token 
function&quot;&gt;child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;parent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  parent&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set_child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;child&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When we run &lt;code&gt;make_family()&lt;/code&gt;, we would expect to get the following
output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~Parent
~Child
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, in practice we get nothing. The reason is simple: &lt;code&gt;parent&lt;/code&gt;
and &lt;code&gt;child&lt;/code&gt; aren&#39;t actually being destroyed. But why not?  To debug
this, let&#39;s add some instrumentation:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Parent&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Parent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Parent refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; parent&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Child&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span 
class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;parent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Parent refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; parent&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;; Child refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; child&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  parent&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set_child&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;child&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Parent refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; parent&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;; Child refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; child&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here&#39;s the output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Parent refct=1
Parent refct=2; Child refct=1
Parent refct=2; Child refct=2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#39;s a little tricky to print out the reference counts after &lt;code&gt;make_family()&lt;/code&gt; has completed,
because if we return &lt;code&gt;shared_ptr&amp;lt;Parent&amp;gt;&lt;/code&gt; then we&#39;ll have a reference to it in
the caller and don&#39;t expect it to be destroyed. What we want is to somehow
keep ahold of &lt;code&gt;parent&lt;/code&gt; without having a reference to it. We can do this
if we unbox &lt;code&gt;parent&lt;/code&gt; via &lt;code&gt;parent.get()&lt;/code&gt; and return it, allowing us to look
inside:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Parent&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; parent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;make_family&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Child refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; parent&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;child_&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Parent refct=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; parent&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;child_&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;parent_&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;use_count&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Child refct=1
Parent refct=1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You may have figured out what&#39;s going on here, but if not, it&#39;s helpful
to walk through things step by step and look at the structure, as shown
in the following diagram.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shared-ptr-stages.png&quot; alt=&quot;Shared pointer step-by-step&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Shared pointer step by step.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First, we allocate a new instance of &lt;code&gt;Parent&lt;/code&gt;, storing a pointer to
it in the shared pointer local variable &lt;code&gt;parent&lt;/code&gt;, with reference count=1.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We then allocate a new instance of &lt;code&gt;Child&lt;/code&gt;, passing it a shared pointer
to &lt;code&gt;parent&lt;/code&gt;. The constructor for &lt;code&gt;Child&lt;/code&gt; copies the pointer to
&lt;code&gt;parent&lt;/code&gt; to its own internal &lt;code&gt;parent_&lt;/code&gt; shared pointer, incrementing
the reference count to 2. We then assign the new &lt;code&gt;Child&lt;/code&gt; to the local
shared pointer &lt;code&gt;child&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We then tell our instance of &lt;code&gt;Parent&lt;/code&gt; about our new instance of &lt;code&gt;Child&lt;/code&gt;
with the &lt;code&gt;set_child()&lt;/code&gt; function, which copies the shared pointer into
its own internal &lt;code&gt;child_&lt;/code&gt; shared pointer, incrementing the child&#39;s reference
count to 2.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, when &lt;code&gt;make_family()&lt;/code&gt; returns, both local shared pointer instances
are destroyed, decrementing the corresponding reference counts, with
the result that each shared pointer now has reference count 1.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The reason that neither &lt;code&gt;Parent&lt;/code&gt; nor &lt;code&gt;Child&lt;/code&gt; is being destroyed is that
each is being owned by a shared pointer with reference count 1, held
by the other: &lt;code&gt;Parent&lt;/code&gt; holding a shared pointer to &lt;code&gt;Child&lt;/code&gt; and &lt;code&gt;Child&lt;/code&gt;
holding a shared pointer to &lt;code&gt;Parent&lt;/code&gt;. Nothing else is pointing to either,
except for the unboxed pointer to the &lt;code&gt;Parent&lt;/code&gt; that we leaked for debugging
purposes, which you can ignore as it has no effect on the reference
count (the original program, which doesn&#39;t return that pointer, behaves
the same way). Each object is keeping the other alive,
even though nothing else is using them.&lt;/p&gt;
&lt;p&gt;What we&#39;ve done here is reproduce the classic problem with reference
counting for memory management: &lt;em&gt;circular references&lt;/em&gt;. The basic assumption
behind a reference counting system like shared pointers is
that the shared pointers are laid out in what programmers call a &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Directed_acyclic_graph&amp;amp;oldid=1261432631&quot;&gt;directed acyclic graph&lt;/a&gt; (DAG)&lt;/em&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
which is to say that
there are no loops where if you follow the shared pointers from object &lt;strong&gt;A&lt;/strong&gt; you
eventually get back to &lt;strong&gt;A&lt;/strong&gt;. If there are, then you can end up with objects
which can&#39;t be freed even if all the references external to the loop are
destroyed. In this case, the memory will be inaccessible but can&#39;t be freed.&lt;/p&gt;
&lt;h3 id=&quot;assuring-destruction&quot;&gt;Assuring Destruction &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#assuring-destruction&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;OK, so we have some data which can&#39;t be freed. Is that such a big
deal?  It&#39;s important to recognize that when you are using RAII, this
kind of error is not just a matter of wasted resources but a
correctness issue because object destruction can have visible &lt;em&gt;side
effects.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As a simple example, consider the following trivial code:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;write_stuff&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;ofstream &lt;span class=&quot;token function&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;x.out&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  file &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Hello world&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This just writes &lt;code&gt;Hello world&lt;/code&gt; to the file &lt;code&gt;x.out&lt;/code&gt;. However, there&#39;s a
lot hiding under that &amp;quot;just&amp;quot;, because the program doesn&#39;t address the
disk hardware directly. Instead, it uses the &amp;quot;system call&amp;quot; &lt;code&gt;write()&lt;/code&gt;
to ask the operating system to write some stuff to the disk, which
sets off a long chain of other events which I won&#39;t go into here. Each
call to &lt;code&gt;write()&lt;/code&gt; is somewhat expensive, so programs typically will
buffer up individual writes internally and then &lt;em&gt;flush&lt;/em&gt; the buffer
when it gets full or, critically, when the file is closed. In this
code, that is hidden by the use of RAII, which just magically takes
care of things when the function returns, but what&#39;s really happening
is something like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;write_stuff&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;ofstream &lt;span class=&quot;token function&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;x.out&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  file &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Hello world&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// flush |file|&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// free file&#39;s memory&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If something interferes with &lt;code&gt;file&lt;/code&gt; being destroyed, then some data may
be left in the buffers, with the result that &lt;code&gt;x.out&lt;/code&gt; will be truncated.
We can reproduce this by calling &lt;code&gt;exit()&lt;/code&gt; at the end of &lt;code&gt;write_stuff()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;write_stuff&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;ofstream &lt;span class=&quot;token function&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;x.out&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  file &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Hello world&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;exit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the name suggests, &lt;code&gt;exit()&lt;/code&gt; causes the program to exit, and it never
returns, which means that &lt;code&gt;write_stuff()&lt;/code&gt; never returns, which means
that &lt;code&gt;file&lt;/code&gt; is never destroyed (or, rather, that the destructor never
runs, because of course all the program memory is freed), which means
that anything still in the buffers is never written to disk. On my
computer (a Mac) the result of this program is a file containing only the
letter &lt;code&gt;H&lt;/code&gt;, so presumably &lt;code&gt;ello world&lt;/code&gt; is still in the buffer, lost
to us forever. The same thing would happen if instead we had some
bug that prevented the object from being destroyed, such as its being
held by a circular reference as above.&lt;/p&gt;
&lt;p&gt;Note that the exact behavior you observe will depend on the precise
buffering strategy employed by your C++ implementation, as the
standard doesn&#39;t appear to prescribe one behavior. In fact, the astute
observer will note that I didn&#39;t end the write with a &lt;code&gt;std::endl&lt;/code&gt;,
which writes a newline and then flushes the stream; on my machine,
adding it gets the whole line written to the file.&lt;/p&gt;
&lt;p&gt;The key point here is that many nontrivial uses of RAII depend on the
destructors actually executing at the right time, so defects like
circular references, while not precisely a memory leak in the technical
sense (each object is being pointed to by &lt;em&gt;something&lt;/em&gt;), can lead to
serious correctness issues.&lt;/p&gt;
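&lt;p&gt;To make this concrete, here is a minimal sketch of a reference loop suppressing destructors. The &lt;code&gt;Node&lt;/code&gt; type and the counter it bumps in its destructor are hypothetical, invented just for this illustration; the counter stands in for any visible side effect such as flushing a file.&lt;/p&gt;

```cpp
#include <memory>

// Hypothetical node type, used only for this illustration.
struct Node {
  explicit Node(int* destroyed) : destroyed_(destroyed) {}
  ~Node() { ++*destroyed_; }    // Stand-in for a visible side effect.
  std::shared_ptr<Node> other;  // Owning reference to another node.
  int* destroyed_;
};

// Returns the number of Node destructors that ran once every shared
// pointer outside the loop had gone out of scope.
int destructors_run_with_cycle() {
  int destroyed = 0;
  Node* raw = nullptr;
  {
    auto a = std::make_shared<Node>(&destroyed);
    auto b = std::make_shared<Node>(&destroyed);
    a->other = b;
    b->other = a;  // a -> b -> a: a reference loop.
    raw = a.get();
  }
  // Both external references are gone, but each node still has a
  // reference count of 1 from the other node, so no destructor ran.
  int seen = destroyed;

  // Cleanup only, so that the sketch itself doesn't leak: break the
  // loop by hand through the raw pointer we deliberately kept.
  raw->other.reset();
  return seen;
}
```

&lt;p&gt;Under these assumptions &lt;code&gt;destructors_run_with_cycle()&lt;/code&gt; reports that no destructor ran when the scope ended. In a real program there is usually no leftover raw pointer to rescue you.&lt;/p&gt;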
&lt;h3 id=&quot;weak-pointers&quot;&gt;Weak Pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#weak-pointers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s totally reasonable to want to make data structures which
have reference loops, so if we can&#39;t just use shared pointers,
what do we do? One option would be to have one of the pointers
just be unboxed, but this undercuts the whole purpose of using
smart pointers in the first place. What we instead need is a
different kind of smart pointer called a &amp;quot;weak pointer&amp;quot;.&lt;/p&gt;
&lt;p&gt;Unlike other types of smart pointer, a weak pointer doesn&#39;t keep the
pointed-to object alive (in C++ jargon, it doesn&#39;t &amp;quot;own&amp;quot; it).  The
object might therefore be destroyed while you are holding the
weak pointer.  In exchange, you can have circular references as
long as the reference in one direction is a weak pointer, because that
breaks the cycle.&lt;/p&gt;
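&lt;p&gt;As a rough sketch of that arrangement, consider a parent that owns its child while the child only weakly refers back. The &lt;code&gt;Parent&lt;/code&gt; and &lt;code&gt;Child&lt;/code&gt; types and the destructor counter are assumptions made up for this example.&lt;/p&gt;

```cpp
#include <memory>

struct Parent;

// Hypothetical types for illustration only.
struct Child {
  explicit Child(int* destroyed) : destroyed_(destroyed) {}
  ~Child() { ++*destroyed_; }
  std::weak_ptr<Parent> parent;  // Non-owning back reference.
  int* destroyed_;
};

struct Parent {
  explicit Parent(int* destroyed) : destroyed_(destroyed) {}
  ~Parent() { ++*destroyed_; }
  std::shared_ptr<Child> child;  // Owning forward reference.
  int* destroyed_;
};

// Returns the number of destructors that ran after the external
// shared pointers went out of scope.
int destructors_run_with_weak_back_pointer() {
  int destroyed = 0;
  {
    auto p = std::make_shared<Parent>(&destroyed);
    auto c = std::make_shared<Child>(&destroyed);
    p->child = c;   // Parent keeps Child alive.
    c->parent = p;  // Child can find Parent but doesn't keep it alive.
  }
  // The weak back reference doesn't count towards Parent's lifetime,
  // so Parent is destroyed, which in turn releases Child.
  return destroyed;
}
```

&lt;p&gt;Both objects are destroyed here even though each refers to the other, because only one direction is an owning reference.&lt;/p&gt;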
&lt;p&gt;Because the object might be destroyed out from under you, to use a
weak pointer safely you need to temporarily convert it into a shared
pointer using the &lt;code&gt;lock()&lt;/code&gt; method.
This shared pointer keeps the object alive while you use it, and when it
goes out of scope, you&#39;re still holding the weak pointer. For example:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Temp&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Temp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;weak_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Temp&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;weak&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;temp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Temp&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; locked &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; weak&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;lock&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;locked&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    locked&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Object destroyed&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Importantly, if the underlying object has already been destroyed
(because the shared pointer reference count went to 0), the &lt;code&gt;lock()&lt;/code&gt;
method can fail, in which case the resulting shared pointer will have
the value &lt;code&gt;nullptr&lt;/code&gt; (pointing to nothing). This is an inherent
consequence of the fact that the weak pointer doesn&#39;t keep the object
alive.&lt;/p&gt;
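&lt;p&gt;The failure case can be exercised directly by dropping the last shared pointer before locking. This is only a sketch: &lt;code&gt;Temp&lt;/code&gt; is given a made-up &lt;code&gt;foo()&lt;/code&gt; that returns a string, and the helper names are hypothetical.&lt;/p&gt;

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the Temp class used above.
struct Temp {
  std::string foo() const { return "foo"; }
};

// Uses the object if it is still alive, otherwise reports that it is gone.
std::string use_if_alive(const std::weak_ptr<Temp>& weak) {
  if (std::shared_ptr<Temp> locked = weak.lock()) {
    return locked->foo();  // |locked| keeps the object alive in this scope.
  }
  return "Object destroyed";  // lock() returned an empty shared pointer.
}

// Returns true if lock() succeeds while a shared pointer exists and
// fails once the last shared pointer has been reset.
bool lock_fails_after_reset() {
  auto temp = std::make_shared<Temp>();
  std::weak_ptr<Temp> weak(temp);

  bool alive_before = (use_if_alive(weak) == "foo");

  temp.reset();  // Strong count drops to 0; the Temp is destroyed.

  bool dead_after =
      (use_if_alive(weak) == "Object destroyed") && weak.expired();
  return alive_before && dead_after;
}
```

&lt;p&gt;Checking the result of &lt;code&gt;lock()&lt;/code&gt; every time, rather than testing &lt;code&gt;expired()&lt;/code&gt; and then locking, avoids a window in which the object could disappear between the two calls.&lt;/p&gt;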
&lt;p&gt;While I&#39;m not going to go into all the details of how to implement
weak pointers here (there are a number of techniques), I do want to
note that one way to implement them is for the shared pointer &lt;code&gt;detail&lt;/code&gt;
object to maintain two reference counts, one strong, one weak. When
the strong reference count goes to zero, you destroy the object being
held. When both the strong and weak pointer counts are zero, you
destroy the &lt;code&gt;detail&lt;/code&gt; object itself; this allows the weak pointer to
continue to exist and point to valid memory—though just to the
&lt;code&gt;detail&lt;/code&gt; object—even if all the shared pointers have been
destroyed. This doesn&#39;t violate the RAII correctness guarantees
described above because the object is still destroyed, but it does
mean that there is &lt;em&gt;some&lt;/em&gt; overhead from a weak pointer hanging
around even if the shared pointers are all gone.&lt;/p&gt;
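&lt;p&gt;Here is a toy, single-threaded sketch of that two-count scheme. The names (&lt;code&gt;Detail&lt;/code&gt;, &lt;code&gt;drop_strong()&lt;/code&gt;, &lt;code&gt;drop_weak()&lt;/code&gt;, &lt;code&gt;Tracked&lt;/code&gt;) are hypothetical, and real standard library implementations differ in the details, for instance by using atomic counters.&lt;/p&gt;

```cpp
// Toy bookkeeping block shared by all strong and weak pointers to one object.
template <typename T>
struct Detail {
  T* object;   // The managed object, or nullptr once destroyed.
  int strong;  // Number of shared (owning) pointers.
  int weak;    // Number of weak (non-owning) pointers.
};

// Frees the Detail block once nobody refers to it. Returns true if freed.
template <typename T>
bool maybe_free(Detail<T>* d) {
  if (d->strong == 0 && d->weak == 0) {
    delete d;
    return true;
  }
  return false;
}

// Called when a shared pointer goes away. Returns true if |d| was freed.
template <typename T>
bool drop_strong(Detail<T>* d) {
  if (--d->strong == 0) {
    delete d->object;  // The object dies with the last strong reference.
    d->object = nullptr;
  }
  return maybe_free(d);
}

// Called when a weak pointer goes away. Returns true if |d| was freed.
template <typename T>
bool drop_weak(Detail<T>* d) {
  --d->weak;
  return maybe_free(d);
}

// Hypothetical object that records whether it is currently alive.
struct Tracked {
  explicit Tracked(bool* alive) : alive_(alive) { *alive_ = true; }
  ~Tracked() { *alive_ = false; }
  bool* alive_;
};

bool demo_two_counts() {
  bool alive = false;
  // One shared pointer and one weak pointer refer to the object.
  auto* d = new Detail<Tracked>{new Tracked(&alive), 1, 1};

  bool freed = drop_strong(d);  // The last shared pointer goes away.
  // The object is destroyed, but the Detail block is still valid memory
  // because the weak pointer still refers to it.
  bool ok = !alive && !freed && d->object == nullptr;

  freed = drop_weak(d);  // Now the Detail block itself is freed.
  return ok && freed;
}
```

&lt;p&gt;In this sketch a weak pointer&#39;s &lt;code&gt;lock()&lt;/code&gt; would amount to checking whether &lt;code&gt;strong&lt;/code&gt; is nonzero and, if so, incrementing it.&lt;/p&gt;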
&lt;h2 id=&quot;when-you-have-to-unbox&quot;&gt;When you have to unbox &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#when-you-have-to-unbox&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We now have &lt;code&gt;unique_ptr&lt;/code&gt;, &lt;code&gt;shared_ptr&lt;/code&gt;, and &lt;code&gt;weak_ptr&lt;/code&gt;, which means we&#39;re
all set, right? Well, maybe. If you&#39;re writing totally new code,
then these three smart pointers are basically all you need, but if
you have to deal with older code which doesn&#39;t use smart pointers,
then you can run into problems.&lt;/p&gt;
&lt;p&gt;Consider the following simple C-style API for timers.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;set_timer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; timeout&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;         &lt;span class=&quot;token comment&quot;&gt;// How long to wait&lt;/span&gt;&lt;br /&gt;               &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;callback&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;     &lt;span class=&quot;token comment&quot;&gt;// The callback to call&lt;/span&gt;&lt;br /&gt;               &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                      &lt;span class=&quot;token comment&quot;&gt;// An argument to pass&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a bit abstruse, due to a combination of C&#39;s limited semantics
and arcane syntax, but what it says is that you pass in three
arguments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A timeout&lt;/li&gt;
&lt;li&gt;A function to call when the timeout expires&lt;/li&gt;
&lt;li&gt;An argument to pass to the function&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a modern language, you would either pass a &lt;code&gt;Callback&lt;/code&gt; object that
encapsulated the callback and the context or, even better, a closure
that encapsulated all the relevant state, but neither of these is
available in C, so instead we have this.&lt;/p&gt;
&lt;p&gt;You use this API this way:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;print_string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Called with argument &#39;%s&#39;&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;set_timer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; print_string&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Hello!&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the timer expires, &lt;code&gt;print_string()&lt;/code&gt; gets called with a pointer
to the string provided as the third argument. Note that this can
actually be a pointer to any type of object (that&#39;s what &lt;code&gt;void *&lt;/code&gt;
means), and it&#39;s the job of the callback to know what type of pointer
it actually is and use it appropriately. The &lt;code&gt;(char *)ptr&lt;/code&gt; means
&amp;quot;treat this as if it points to a string&amp;quot;, which had better be true
or things can turn very ugly very fast.&lt;/p&gt;
&lt;p&gt;This is all fine, though a bit fiddly, but what happens if we want
to pass some dynamically allocated object to the callback? In
C, static strings are stored in the data segment so you don&#39;t need
to allocate or free them, but what if we had a dynamically
constructed string? In that case, we may need to free the
object &lt;em&gt;in the callback&lt;/em&gt;, like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;print_string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Called with argument &#39;%s&#39;&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&#39;re back to C-style memory management here, but what if we want
to work with an object which is owned by a smart pointer? We could
unbox the pointer, like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;set_timer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; do_something&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;foo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works as long as &lt;code&gt;foo&lt;/code&gt; outlives the timer, but there are
situations where you don&#39;t know the respective lifetimes. For
instance, consider what happens if you are making some
kind of network request and want to set a timer in case
the request takes too long. In this case, it could be either
the request error handler or the timer that is the last use
of the object, which is exactly the kind of problem that shared pointers
are designed to help you manage! What you actually want to do is
to pass the shared pointer to the callback handler, but this
impoverished API precludes that.&lt;/p&gt;
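&lt;p&gt;For contrast, here is a sketch of what the closure-based alternative mentioned earlier might look like if the timer API accepted a &lt;code&gt;std::function&lt;/code&gt;. &lt;code&gt;FakeTimers&lt;/code&gt; is a hypothetical stand-in that just stores callbacks and runs them on request; it is not a real timer facility.&lt;/p&gt;

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a timer facility. It ignores the timeout and
// simply runs every stored callback when fire_all() is called.
class FakeTimers {
 public:
  void set_timer(unsigned int timeout_ms, std::function<void()> callback) {
    (void)timeout_ms;  // Not simulated in this sketch.
    pending_.push_back(std::move(callback));
  }

  void fire_all() {
    std::vector<std::function<void()>> ready;
    ready.swap(pending_);
    for (auto& callback : ready) {
      callback();
    }
  }

 private:
  std::vector<std::function<void()>> pending_;
};

std::string closure_timer_demo() {
  FakeTimers timers;
  std::string log;
  {
    auto message = std::make_shared<std::string>("Hello!");
    // The closure captures a copy of the shared pointer, so the string
    // stays alive for as long as the timer holds the callback.
    timers.set_timer(1000, [message, &log] {
      log += "Called with argument '" + *message + "'";
    });
  }  // |message| is out of scope, but the closure's copy keeps the string.
  timers.fire_all();
  return log;
}
```

&lt;p&gt;With this shape of API no unboxing is needed: whichever of the caller and the timer finishes last releases the final reference.&lt;/p&gt;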
&lt;h3 id=&quot;internal-reference-counting&quot;&gt;Internal Reference Counting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#internal-reference-counting&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One way to manage this situation is to move the reference count from
outside the object (as in shared pointer) to inside the object. For instance,
we can require that any managed object expose a reference counting
interface like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;ManagedObject&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;AddRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Release&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Internally, the object has to maintain a reference counter which is
incremented whenever &lt;code&gt;AddRef()&lt;/code&gt; is called. When &lt;code&gt;Release()&lt;/code&gt; is called,
the reference counter is decremented. If the reference count reaches
0, &lt;code&gt;Release()&lt;/code&gt; will destroy the object using &lt;code&gt;delete this&lt;/code&gt;, which is
safe to do as long as you are
&lt;a href=&quot;https://isocpp.org/wiki/faq/freestore-mgmt#delete-this&quot;&gt;super-careful&lt;/a&gt;.
The implementation of the smart pointer itself looks sort of like
&lt;code&gt;SharedPtr&lt;/code&gt;, except that it calls the &lt;code&gt;AddRef()&lt;/code&gt; and &lt;code&gt;Release()&lt;/code&gt; functions
rather than directly incrementing and decrementing its own reference
count.&lt;/p&gt;
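&lt;p&gt;A minimal, single-threaded sketch of this arrangement might look like the following. It assumes that objects start with a reference count of 1 and that constructing a &lt;code&gt;RefCountedPtr&lt;/code&gt; from a raw pointer takes over that reference; the &lt;code&gt;ref_count()&lt;/code&gt; accessor and the &lt;code&gt;Foo&lt;/code&gt; class are hypothetical additions for the demonstration.&lt;/p&gt;

```cpp
#include <utility>

class ManagedObject {
 public:
  void AddRef() { ++ref_count_; }
  void Release() {
    if (--ref_count_ == 0) {
      delete this;  // Nothing may touch |this| after this point.
    }
  }
  int ref_count() const { return ref_count_; }  // For illustration only.

 protected:
  virtual ~ManagedObject() = default;

 private:
  int ref_count_ = 1;  // Assumption: objects are born with one reference.
};

template <typename T>
class RefCountedPtr {
 public:
  // Takes over the reference the caller already holds (no AddRef).
  explicit RefCountedPtr(T* ptr) : ptr_(ptr) {}
  // Copying creates a new reference, so it increments the count.
  RefCountedPtr(const RefCountedPtr& other) : ptr_(other.ptr_) {
    if (ptr_) ptr_->AddRef();
  }
  RefCountedPtr& operator=(RefCountedPtr other) {
    std::swap(ptr_, other.ptr_);  // |other| releases our old reference.
    return *this;
  }
  ~RefCountedPtr() {
    if (ptr_) ptr_->Release();
  }

  T* get() const { return ptr_; }
  T* operator->() const { return ptr_; }

 private:
  T* ptr_;
};

// Hypothetical managed class that records its own destruction.
class Foo : public ManagedObject {
 public:
  explicit Foo(bool* destroyed) : destroyed_(destroyed) {}
  ~Foo() override { *destroyed_ = true; }

 private:
  bool* destroyed_;
};

bool refcount_demo() {
  bool destroyed = false;
  {
    RefCountedPtr<Foo> a(new Foo(&destroyed));  // Reference count = 1
    {
      RefCountedPtr<Foo> b(a);                  // Reference count = 2
      if (a->ref_count() != 2) return false;
    }                                           // Reference count = 1
    if (a->ref_count() != 1 || destroyed) return false;
  }                                             // Count = 0, object deleted
  return destroyed;
}
```

&lt;p&gt;Because the count lives in the object, any code that has the raw pointer can reach it, which is what makes hand-offs through &lt;code&gt;void *&lt;/code&gt; workable.&lt;/p&gt;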
&lt;p&gt;If we adapt our program to use a reference counted pointer it looks
like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;outer_function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    Foo &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;f &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                      &lt;span class=&quot;token comment&quot;&gt;// Reference count = 1&lt;/span&gt;&lt;br /&gt;    RefCountedPtr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;               &lt;span class=&quot;token comment&quot;&gt;// Reference count = 1&lt;/span&gt;&lt;br /&gt;    foo&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;AddRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                           &lt;span class=&quot;token comment&quot;&gt;// Reference count = 2&lt;/span&gt;&lt;br /&gt;    &lt;span 
class=&quot;token function&quot;&gt;set_timer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; do_something&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;foo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// foo is out of scope. 
Reference count = 1&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Reference count = 1, because |outer_function()| exited.&lt;/span&gt;&lt;br /&gt;  RefCountedPtr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;Foo&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Foo &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  foo&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// object will be destroyed here.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that unlike shared pointers, this still requires some attention
to the reference count. In particular, before we unbox the pointer
and pass it to &lt;code&gt;set_timer()&lt;/code&gt; we need to manually increment the
reference counter. The reason for this is that the object is
essentially being owned by the timer infrastructure, and if we
didn&#39;t do that, then when &lt;code&gt;outer_function()&lt;/code&gt; returned, the object
would be destroyed as the smart pointer went out of scope.&lt;/p&gt;
&lt;p&gt;Perhaps less obviously, we have to manage what happens when the
&lt;code&gt;RefCountedPtr&lt;/code&gt; takes ownership of an object: does it increment
the reference count or not? You need an option to have it leave
the reference count alone so that when it takes ownership in the
&lt;code&gt;do_something()&lt;/code&gt; callback we don&#39;t end up with a reference count
of 2 rather than 1 (because it&#39;s being handed off from the timer
infrastructure to the callback). In this code I&#39;ve opted to only
have that variant and force objects to self-initialize with a
reference count of 1, but another alternative is to have a flag
of some kind that tells the &lt;code&gt;RefCountedPtr&lt;/code&gt; constructor
whether to increment or not.&lt;/p&gt;
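&lt;p&gt;The flag-based alternative could be sketched roughly as follows. The &lt;code&gt;RefMode&lt;/code&gt; enum and its values are hypothetical names, the timer is simulated by stashing the &lt;code&gt;void *&lt;/code&gt; and calling the callback directly, and copying is disabled to keep the sketch short.&lt;/p&gt;

```cpp
// Hypothetical internally reference counted class.
class Foo {
 public:
  explicit Foo(int* destroyed) : destroyed_(destroyed) {}
  void AddRef() { ++refs_; }
  void Release() {
    if (--refs_ == 0) delete this;
  }
  int refs() const { return refs_; }

 private:
  ~Foo() { ++*destroyed_; }  // Private: only Release() may destroy a Foo.
  int refs_ = 1;             // Objects start with a reference count of 1.
  int* destroyed_;
};

// Tells the constructor whether to take over an existing reference
// or to create a new one.
enum class RefMode { kAdopt, kAddRef };

template <typename T>
class RefCountedPtr {
 public:
  RefCountedPtr(T* ptr, RefMode mode) : ptr_(ptr) {
    if (ptr_ && mode == RefMode::kAddRef) ptr_->AddRef();
  }
  RefCountedPtr(const RefCountedPtr&) = delete;
  RefCountedPtr& operator=(const RefCountedPtr&) = delete;
  ~RefCountedPtr() {
    if (ptr_) ptr_->Release();
  }

  T* get() const { return ptr_; }
  T* operator->() const { return ptr_; }

 private:
  T* ptr_;
};

void do_something(void* ptr) {
  // The timer's reference is being handed to us, so adopt it as-is.
  RefCountedPtr<Foo> foo(static_cast<Foo*>(ptr), RefMode::kAdopt);
  (void)foo->refs();  // Use the object here.
}  // The adopted reference is released; the object is destroyed.

int handoff_demo() {
  int destroyed = 0;
  void* timer_arg = nullptr;  // Stands in for the timer's stored argument.
  {
    RefCountedPtr<Foo> foo(new Foo(&destroyed), RefMode::kAdopt);  // = 1
    RefCountedPtr<Foo> second(foo.get(), RefMode::kAddRef);        // = 2
    foo->AddRef();             // = 3: this reference belongs to the timer.
    timer_arg = foo.get();
  }  // Both smart pointers release their references: count = 1.
  if (destroyed != 0) return -1;  // Still alive, owned by the "timer".
  do_something(timer_arg);        // Simulate the timer firing.
  return destroyed;
}
```

&lt;p&gt;An explicit flag makes the ownership intent visible at each construction site, at the cost of one more thing to get wrong.&lt;/p&gt;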
&lt;h3 id=&quot;implementation-status&quot;&gt;Implementation Status &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#implementation-status&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;C++ doesn&#39;t have a standard implementation of this kind of reference
counted pointer, but the popular &lt;a href=&quot;https://www.boost.org/&quot;&gt;Boost&lt;/a&gt; C++
library project provides a version called
&lt;a href=&quot;https://www.boost.org/doc/libs/1_87_0/libs/smart_ptr/doc/html/smart_ptr.html#intrusive_ptr&quot;&gt;intrusive_ptr&lt;/a&gt;,
though it works a little differently than what I&#39;ve sketched above.&lt;/p&gt;
&lt;p&gt;Firefox makes very extensive use of internally reference counted
pointers using the &lt;a href=&quot;https://searchfox.org/mozilla-central/source/mfbt/RefPtr.h&quot;&gt;&lt;code&gt;RefPtr&lt;/code&gt;&lt;/a&gt;
template (the sketch above is sort of modeled on Firefox&#39;s
implementation). The decision to use this design is very
old (long predating my time at Mozilla) and dates from a time
when C++ didn&#39;t have good smart pointers. Once that changed
and good smart pointers were widely available, there were
a number of debates&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
about which type to use (I was on Team
&amp;quot;use the C++ standard&amp;quot;) and so now Firefox contains a mix of
both styles. I don&#39;t know what decisions people would have
made starting from scratch (though Chrome seems to use the
standard smart pointers a lot more, which is where I got used
to them).&lt;/p&gt;
&lt;h2 id=&quot;unboxing-(again)&quot;&gt;Unboxing (again) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#unboxing-(again)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be clear from the discussion above, unless you&#39;re writing
totally greenfield code, it&#39;s very hard to avoid having to unbox
pointers sometime. In my experience, engineers seem to have two
attitudes towards this reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discourage it and make you work if you want to unbox.&lt;/li&gt;
&lt;li&gt;Lean into it and make it as easy as possible to unbox.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One of the core loci of this debate is whether you should be
able to implicitly convert a smart pointer to a raw pointer, like so.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;shared_ptr&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;T&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;T&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; t2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that basically all smart pointers in C++ implement some unboxing
method like &lt;code&gt;.get()&lt;/code&gt; and &lt;code&gt;operator-&amp;gt;&lt;/code&gt; so that you can access methods
and properties; the question is whether you automatically convert to
&lt;code&gt;T*&lt;/code&gt; in other contexts. Ordinarily this wouldn&#39;t work in C or
C++ because &lt;code&gt;shared_ptr&amp;lt;T&amp;gt;&lt;/code&gt; and &lt;code&gt;T*&lt;/code&gt; are totally different types and
you can&#39;t just assign one to the other. However, a smart pointer class
can make it work by defining a
&lt;a href=&quot;https://en.cppreference.com/w/cpp/language/cast_operator&quot;&gt;conversion function&lt;/a&gt;
to &lt;code&gt;T*&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The argument for implementing automatic conversion
is that it makes things easy when you inevitably have to unbox; the
argument against it is that it&#39;s all too easy to unbox accidentally
and that you should have to do it explicitly. C++&#39;s smart pointers
force you to call &lt;code&gt;.get()&lt;/code&gt;. Firefox&#39;s &lt;a href=&quot;https://searchfox.org/mozilla-central/source/mfbt/RefPtr.h#317&quot;&gt;do not&lt;/a&gt;
and in fact &lt;a href=&quot;https://searchfox.org/mozilla-central/source/mfbt/RefPtr.h#309&quot;&gt;discourage calling &lt;code&gt;.get()&lt;/code&gt;&lt;/a&gt;.
I think this is the wrong answer but was not able to persuade enough
people to get it changed; it&#39;s easy to add affordances like this,
but much harder to remove them once people start to rely on them
and you need to change all the relying code.&lt;/p&gt;
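&lt;p&gt;To make the tradeoff concrete, here is a minimal sketch of a smart pointer that allows implicit unboxing. Everything in it (&lt;code&gt;ImplicitPtr&lt;/code&gt;, &lt;code&gt;Widget&lt;/code&gt;, &lt;code&gt;read_value()&lt;/code&gt;, &lt;code&gt;demo_implicit_unboxing()&lt;/code&gt;) is a hypothetical name invented for illustration; it is not Firefox&#39;s &lt;code&gt;RefPtr&lt;/code&gt; or any standard library type, and it leaves out reference counting entirely.&lt;/p&gt;

```cpp
#include <cassert>

// Hypothetical payload type used only for this illustration.
struct Widget {
  int value = 42;
};

// A deliberately tiny owning pointer (no reference counting, not copyable)
// that exists only to show the two unboxing styles side by side.
template <typename T>
class ImplicitPtr {
 public:
  explicit ImplicitPtr(T* p) : ptr_(p) {}
  ~ImplicitPtr() { delete ptr_; }
  ImplicitPtr(const ImplicitPtr&) = delete;
  ImplicitPtr& operator=(const ImplicitPtr&) = delete;

  T* get() const { return ptr_; }         // explicit unboxing
  T* operator->() const { return ptr_; }  // member access
  operator T*() const { return ptr_; }    // implicit unboxing

 private:
  T* ptr_;
};

// A legacy-style function that only understands raw pointers.
int read_value(const Widget* w) { return w->value; }

int demo_implicit_unboxing() {
  ImplicitPtr<Widget> w(new Widget());
  Widget* raw = w;              // compiles only because of operator T*()
  int a = read_value(w);        // converts silently at the call site
  int b = read_value(w.get());  // the spelling std::shared_ptr requires
  assert(raw == w.get());
  return a + b;
}
```

&lt;p&gt;The first two uses compile with no visible sign that a raw pointer has escaped, which is both the convenience and the hazard under debate.&lt;/p&gt;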
&lt;h2 id=&quot;this-is-all-baked-in&quot;&gt;This is all baked in &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#this-is-all-baked-in&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One important thing to realize is that smart pointers aren&#39;t
some new piece of C++ syntax; they&#39;re just a new combination
of a number of existing C++ features, namely:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Constructors and destructors to enable RAII&lt;/li&gt;
&lt;li&gt;Overloading the copy constructor, copy assignment operator, etc.
to provide the appropriate functionality for copying and assignment.&lt;/li&gt;
&lt;li&gt;Overloading &lt;code&gt;-&amp;gt;&lt;/code&gt; (and sometimes automatic conversion)
to make the smart pointer act like a regular
pointer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&#39;s why we&#39;re able to implement our own smart pointers
that do the same thing as the ones shipped with the C++ library.
This kind of thing is something you see a lot with powerful
languages like C++: people realize that they can put
together existing features in new ways to produce new
functionality that wasn&#39;t built into the language.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-rust&quot;&gt;Next Up: Rust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-3/#next-up%3A-rust&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The major reason this is all so messy is that smart pointers are layered
onto C++&#39;s previously existing unsafe memory management system. This
means that you can always opt out of smart pointers and use unboxed
pointers, at which point you&#39;ve given up all your safety guarantees.
This is actually something you have to do sometimes—especially
when you are working with legacy code—but the lack
of compiler enforcement encourages you to do that rather than figuring
out how to do things safely without unboxing. Next up, we&#39;ll be
looking at a language which was built to be safe from the ground up:
Rust.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s not obvious why this should work, because we want
to actually operate on &lt;code&gt;h-&amp;gt;update()&lt;/code&gt;, not get the internal
pointer, but the special
sauce in C++ is that it will keep applying
&lt;code&gt;-&amp;gt;&lt;/code&gt; until it gets something
that &lt;a href=&quot;https://en.cppreference.com/w/cpp/language/operator_member_access&quot;&gt;makes sense&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t ask why the &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; syntax means move constructor; you don&#39;t want to know.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that there are some odd shenanigans in this code to deal
with the fact that C++ requires that objects be declared before
they are used. Because &lt;code&gt;Child&lt;/code&gt; and &lt;code&gt;Parent&lt;/code&gt; both reference
each other, there is no order in which you can have the complete
code for &lt;code&gt;Parent&lt;/code&gt; before or after the complete code for &lt;code&gt;Child&lt;/code&gt;;
instead we have to break them up a bit so the relevant pieces
are available at the right times. Newer languages like Rust or
Go tend to be better about looking ahead so you don&#39;t need to
do this kind of thing. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Programmers love to say &amp;quot;DAG&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Centering primarily around alleged performance concerns
for incrementing and decrementing the reference count
for shared pointers.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-3/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 2: C++ and RAII</title>
		<link href="https://educatedguesswork.org/posts/memory-management-2/"/>
		<updated>2025-02-17T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-2/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/c++-cover.jpeg&quot; alt=&quot;Cover image&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This is the second post in my planned multipart&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
series on memory management. In part &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;I&lt;/a&gt;
we covered the basics of memory allocation and how it works in
C, where the programmer is responsible for manually allocating
and freeing memory. In this post, we&#39;ll start looking at memory
management in C++, which provides a number of much fancier
affordances.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-c%2B%2B&quot;&gt;Background: C++ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#background%3A-c%2B%2B&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As the name suggests,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=C%2B%2B&amp;amp;oldid=1264775453&quot;&gt;C++&lt;/a&gt;
is a derivative of C.  The original version of C++ was basically
an object oriented version of C (&amp;quot;C with classes&amp;quot;) but at this
point it has been around for 40-odd years and so has diverged very
significantly (though modern C is a lot more like original C than C++
is) and accreted a lot of features beyond what you&#39;d think of in an
object oriented language, such as generic programming via
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Template_(C%2B%2B)&amp;amp;oldid=1260515346&quot;&gt;templates&lt;/a&gt;
and closures
(&lt;a href=&quot;https://en.cppreference.com/w/cpp/language/lambda&quot;&gt;lambdas&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Despite this, C++ preserves a huge amount of C heritage and many C
programs will compile just fine with a C++ compiler;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;  in fact, C++
was originally implemented with a pre-processor called &amp;quot;cfront&amp;quot; which
compiled C++ code down into C code, though that&#39;s not how things work
now. This is actually a source of a lot of issues with C++, when
programmers do things the C way—or even the older C++
way—even though modern C++ has better methods. We&#39;ll see some examples
of this later in this post.&lt;/p&gt;
&lt;p&gt;The most obvious change in C++ is the introduction of the idea of
&lt;em&gt;objects&lt;/em&gt; and &lt;em&gt;classes&lt;/em&gt;. At a high level, an &lt;em&gt;object&lt;/em&gt; is a data
type that has both &lt;em&gt;data&lt;/em&gt; and &lt;em&gt;code&lt;/em&gt; associated with it, where
&lt;em&gt;code&lt;/em&gt; means &lt;em&gt;functions&lt;/em&gt;.
But let&#39;s start by looking at a type which just has data associated
with it, but where that data is somewhat complex.&lt;/p&gt;
&lt;h4 id=&quot;c-structs&quot;&gt;C Structs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#c-structs&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Complex data types are already a feature in C. For instance, consider the following
example type:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even if you don&#39;t know C, if you&#39;ve done any programming you can
probably figure out what this means: it&#39;s defining a new type that represents a
rectangle and has two values, the height and the width of the
rectangle, each of which is an integer (&lt;code&gt;int&lt;/code&gt; being one of the C
integer types). Obviously you could just have two variables,
&lt;code&gt;rectangle_width&lt;/code&gt; and &lt;code&gt;rectangle_height&lt;/code&gt;, but this lets you
group them together, like so:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rectangle r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;rectangle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Make a rectangle of width 10 and height 2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Area is %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, we&#39;ve defined a function called area that takes
a rectangle as an argument and returns the product of the width
and the height. Note that the notation for accessing one of the
values inside a C &lt;code&gt;struct&lt;/code&gt; is &lt;code&gt;a.b&lt;/code&gt;, where &lt;code&gt;a&lt;/code&gt; is the name
of the variable containing the struct and &lt;code&gt;b&lt;/code&gt; is the name of
the &lt;code&gt;field&lt;/code&gt; inside the struct (e.g., &lt;code&gt;width&lt;/code&gt;).&lt;/p&gt;
&lt;h4 id=&quot;call-by-value&quot;&gt;Call by Value &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#call-by-value&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;I&#39;ve actually done something new here that you might not have noticed,
which is that I&#39;ve passed our &lt;em&gt;struct&lt;/em&gt; to the function. All function
calls in C are what&#39;s called &amp;quot;call by value&amp;quot;, which is to say that C
makes a copy of the data element that is available to the function but
is disconnected from the original value. The called function can change its
arguments without affecting the caller. Consider, for instance, the
following example.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;shrink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rectangle r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Inner width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;rectangle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span 
class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Make a rectangle of width 10 and height 2&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;shrink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Outer width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, this prints out:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Inner width=5 height=1
Outer width=10 height=2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;because &lt;code&gt;shrink&lt;/code&gt; just modified its own copy of &lt;code&gt;r&lt;/code&gt;. Function calls are
just a special case of how assignments work in C in general: they make a copy of
whatever memory was associated with the source and stuff it into the
target.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
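&lt;p&gt;As a small sketch of the same copying behavior with plain assignment rather than a function call (written as C++, where this simple struct is copied the same way; the function name is made up for illustration):&lt;/p&gt;

```cpp
#include <cassert>

struct rectangle {
  int width;
  int height;
};

// Modifies a copy made by assignment and reports the original's width.
int width_after_modifying_copy() {
  rectangle a = {10, 2};
  rectangle b = a;  // copies both fields of a into b
  b.width = 5;      // changes only b
  assert(b.width == 5);
  return a.width;   // a is untouched
}
```

&lt;p&gt;Here &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; occupy separate memory after the assignment, just as the argument inside &lt;code&gt;shrink&lt;/code&gt; was separate from the caller&#39;s &lt;code&gt;r&lt;/code&gt;.&lt;/p&gt;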
&lt;p&gt;C does provide a way for the called function to modify memory associated
with the caller: the caller just passes a pointer to the callee rather
than the variable itself, as in the following code:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;shrink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rectangle&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; rp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;width&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;height&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Inner width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; rp&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;rectangle r 
&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Make a rectangle of width 10 and height 2&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;shrink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Outer width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note the new notation here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;amp;&lt;/code&gt; takes a pointer to a variable so &lt;code&gt;&amp;amp;r&lt;/code&gt; is a pointer to &lt;code&gt;r&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;a-&amp;gt;b&lt;/code&gt; accesses a variable in a struct when you have a pointer to
the struct. This is what is known as &amp;quot;syntactic sugar&amp;quot; because
you could just do &lt;code&gt;(*a).b&lt;/code&gt;, but it&#39;s used all the time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This snippet does what we expect, which is to say it modifies the
value in the outer function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Inner width=5 height=1
Outer width=5 height=1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#39;s important to realize, though, that C was still doing call-by-value;
it&#39;s just that the value we passed was a pointer to &lt;code&gt;r&lt;/code&gt; rather than
&lt;code&gt;r&lt;/code&gt; itself, which allowed the function to manipulate the memory that
the argument pointed to rather than its local copy of that variable.&lt;/p&gt;
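&lt;p&gt;One way to convince yourself that the pointer itself is passed by value is to reassign it inside the callee. The following is only a sketch, written as C++, and &lt;code&gt;repoint()&lt;/code&gt; and &lt;code&gt;demo_pointer_by_value()&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```cpp
#include <cassert>

struct rectangle {
  int width;
  int height;
};

// Reassigning the parameter changes only the callee's copy of the pointer.
void repoint(rectangle* rp, rectangle* other) {
  rp = other;     // the caller's pointer is unaffected
  rp->width = 0;  // writes through the new target, i.e. *other
}

bool demo_pointer_by_value() {
  rectangle r = {10, 2};
  rectangle other = {7, 7};
  rectangle* p = &r;
  repoint(p, &other);
  assert(p == &r);  // p still points at r
  return r.width == 10 && other.width == 0;
}
```

&lt;p&gt;The callee can change what the pointer points &lt;em&gt;to&lt;/em&gt;, but it cannot change which object the caller&#39;s pointer refers to.&lt;/p&gt;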
&lt;h3 id=&quot;objects-and-classes&quot;&gt;Objects and Classes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#objects-and-classes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Everything we&#39;ve seen here is still normal C, but often we want
to associate a function with a type. For instance, the area function
we have shown above only works with rectangles, but what if we
had circles as well? We&#39;d end up with two functions, one
called &lt;code&gt;area_rectangle&lt;/code&gt; and one called &lt;code&gt;area_circle&lt;/code&gt;. Objects
give us another option, which is to associate the function
with the type, so that we can do something like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;Rectangle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Make a rectangle of width 10 and height 2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Area is %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;             &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&#39;ve got some new syntax here, but it&#39;s basically an extension
of the old syntax. Instead of referring to a data element with
&lt;code&gt;r.height&lt;/code&gt; we are now referring to the function &lt;code&gt;area()&lt;/code&gt; with
the syntax &lt;code&gt;r.area()&lt;/code&gt;. Also we don&#39;t have to pass
the data values to &lt;code&gt;r.area()&lt;/code&gt; because it just gets them
as part of the function call, which is very convenient if we also
have circles, because then we can do:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;Circle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Area is %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;             &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the call to &lt;code&gt;area()&lt;/code&gt; is exactly the same in both cases.
This syntax hides what kind of object we are working with,
which lets us reason about the logic of the program without
worrying about what shape we are working with.
Which &lt;code&gt;area&lt;/code&gt; function gets called depends on the type of object
(&lt;code&gt;Rectangle&lt;/code&gt; or &lt;code&gt;Circle&lt;/code&gt;). This type of
function is called a &lt;em&gt;method&lt;/em&gt; or a &lt;em&gt;member function&lt;/em&gt; of the
type it&#39;s associated with.&lt;/p&gt;
&lt;p&gt;Of course, we still have to define &lt;code&gt;Rectangle&lt;/code&gt; and &lt;code&gt;Circle&lt;/code&gt;. The
definition of &lt;code&gt;Rectangle&lt;/code&gt; looks like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; width &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first part of this is basically the same as &lt;code&gt;struct rectangle&lt;/code&gt;,
except for the &lt;code&gt;public:&lt;/code&gt; line, which we can ignore for now.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Just
as before, we have &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt;. What&#39;s new here is the
&lt;code&gt;area()&lt;/code&gt; function. This is also almost exactly the same as before,
except for two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It&#39;s defined inside the class.&lt;/li&gt;
&lt;li&gt;We don&#39;t need to pass a copy of &lt;code&gt;Rectangle&lt;/code&gt; as an argument
because the &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; fields are automatically
available to any member function.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The definition of &lt;code&gt;Circle&lt;/code&gt; is similar, except with the standard
&lt;span&gt;π r&lt;sup&gt;2&lt;/sup&gt;&lt;/span&gt; area formula.&lt;/p&gt;
&lt;p&gt;To recap the terminology here: the &lt;strong&gt;class&lt;/strong&gt; is the type definition
and an &lt;strong&gt;object&lt;/strong&gt; is a given instance of the class.&lt;/p&gt;
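&lt;p&gt;As a minimal sketch of that distinction, the snippet below defines the &lt;code&gt;Rectangle&lt;/code&gt; class once and then creates two independent objects from it. The helper &lt;code&gt;total_area()&lt;/code&gt; is a hypothetical name introduced only for this illustration.&lt;/p&gt;

```cpp
#include <cassert>

// One class: the type definition.
class Rectangle {
 public:
  int width;
  int height;

  // Member function: width and height refer to the fields of
  // whichever object area() is called on.
  int area() {
    return width * height;
  }
};

// Two objects (instances) of that class, each with its own fields.
// total_area() is a hypothetical helper for this example.
int total_area() {
  Rectangle a;
  a.width = 10;
  a.height = 20;

  Rectangle b;
  b.width = 3;
  b.height = 4;

  assert(a.area() == 200);  // a's fields are untouched by b
  return a.area() + b.area();
}
```

&lt;p&gt;Each call to &lt;code&gt;area()&lt;/code&gt; sees only the fields of the object it was invoked on, which is why no &lt;code&gt;Rectangle&lt;/code&gt; argument is needed.&lt;/p&gt;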
&lt;h3 id=&quot;inheritance&quot;&gt;Inheritance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#inheritance&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We won&#39;t really need this in this post, but I&#39;d be remiss if I didn&#39;t
mention one of the most important features of classes, which is
&lt;em&gt;inheritance&lt;/em&gt;. The idea here is to say that a given class, say
&lt;code&gt;Rectangle&lt;/code&gt;, is itself &lt;em&gt;derived from&lt;/em&gt; a more general class, such as
&lt;code&gt;Shape&lt;/code&gt;. Anywhere you could use a pointer to &lt;code&gt;Shape&lt;/code&gt; you can use
a pointer to a &lt;code&gt;Rectangle&lt;/code&gt; instead. For example, we could define
a &lt;code&gt;Shape&lt;/code&gt; as having an &lt;code&gt;area()&lt;/code&gt; function like so:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that we haven&#39;t
provided a definition (body) for &lt;code&gt;area()&lt;/code&gt;; instead we have the &lt;code&gt;virtual&lt;/code&gt; keyword in front
and there is &lt;code&gt;= 0&lt;/code&gt; in place of the body. Together these mean that all classes derived
from &lt;code&gt;Shape&lt;/code&gt; have to define &lt;code&gt;area()&lt;/code&gt; for themselves.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
We then modify &lt;code&gt;Rectangle&lt;/code&gt; to indicate that it is derived from &lt;code&gt;Shape&lt;/code&gt; and we&#39;ll
need &lt;code&gt;virtual&lt;/code&gt; in front of &lt;code&gt;area&lt;/code&gt; here for some technical reasons which we
don&#39;t need to go into.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token base-clause&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;virtual&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; width &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result of all this is we can now write a function which can take
&lt;em&gt;any&lt;/em&gt; shape and do stuff, as in:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;print_area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Shape &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;s&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Area = %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; s&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we have a &lt;code&gt;Rectangle r&lt;/code&gt; then &lt;code&gt;print_area()&lt;/code&gt; can be called just
like you would expect:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token function&quot;&gt;print_area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you&#39;ve been paying attention, you&#39;ll have noticed that I said you
can use a &lt;strong&gt;pointer&lt;/strong&gt; to &lt;code&gt;Rectangle&lt;/code&gt; wherever you could have used a
pointer to &lt;code&gt;Shape&lt;/code&gt;. You cannot, however, use a &lt;code&gt;Rectangle&lt;/code&gt; wherever
you would have used a &lt;code&gt;Shape&lt;/code&gt;. If you try to assign a &lt;code&gt;Rectangle&lt;/code&gt;
to a &lt;code&gt;Shape&lt;/code&gt; you end up with something with the properties of
&lt;code&gt;Shape&lt;/code&gt; but not &lt;code&gt;Rectangle&lt;/code&gt;. This is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Object_slicing&amp;amp;oldid=1262121807&quot;&gt;object slicing&lt;/a&gt; and it&#39;s usually not what you want.&lt;/p&gt;
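&lt;p&gt;To make the pointer-versus-value distinction concrete, here is a small sketch. It assumes a variant of &lt;code&gt;Shape&lt;/code&gt; whose &lt;code&gt;area()&lt;/code&gt; has a default body returning 0 rather than the &lt;code&gt;= 0&lt;/code&gt; form, because a class with a pure virtual function cannot be instantiated at all and so cannot be sliced into. The names &lt;code&gt;area_via_pointer()&lt;/code&gt;, &lt;code&gt;area_via_copy()&lt;/code&gt;, and &lt;code&gt;slicing_demo()&lt;/code&gt; are hypothetical helpers for this illustration.&lt;/p&gt;

```cpp
#include <cassert>

// Assumption for this sketch: Shape::area() has a default body so that
// a plain Shape object can exist. The Shape in the text uses "= 0".
class Shape {
 public:
  virtual int area() { return 0; }
};

class Rectangle : public Shape {
 public:
  int width;
  int height;

  Rectangle(int w, int h) {
    width = w;
    height = h;
  }

  virtual int area() { return width * height; }
};

// Through a pointer, the call is dispatched to Rectangle::area().
int area_via_pointer(Shape *s) {
  return s->area();
}

// Passing by value copies only the Shape part of the argument (slicing),
// so this always calls Shape::area().
int area_via_copy(Shape s) {
  return s.area();
}

// Hypothetical demo tying the two together.
void slicing_demo() {
  Rectangle r(10, 20);
  assert(area_via_pointer(&r) == 200);  // Rectangle behavior preserved
  assert(area_via_copy(r) == 0);        // sliced down to a plain Shape
}
```

&lt;p&gt;The copy made for &lt;code&gt;area_via_copy()&lt;/code&gt; has no &lt;code&gt;width&lt;/code&gt; or &lt;code&gt;height&lt;/code&gt; at all, which is why the pointer form is the one you normally want.&lt;/p&gt;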
&lt;h3 id=&quot;constructors-and-destructors&quot;&gt;Constructors and Destructors &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#constructors-and-destructors&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There&#39;s one more C++ feature we need in order to understand basic
C++ memory management, and that&#39;s &lt;em&gt;constructors&lt;/em&gt; (often
abbreviated &lt;em&gt;ctor&lt;/em&gt;s) and &lt;em&gt;destructors&lt;/em&gt; (&lt;em&gt;dtor&lt;/em&gt;s).
So far we&#39;ve initialized stuff just by setting the fields, but
C++ lets us do more: a class can have a function that runs
whenever an object of that class is created. That&#39;s not really
that useful with this simple an object, but just as an example
suppose we wanted to print something out for debugging purposes
whenever someone created a &lt;code&gt;Rectangle&lt;/code&gt;. Then we could do:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token base-clause&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Constructor&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; w&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; w&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;     height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; h&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;     &lt;br /&gt;     &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;Rectangle created with width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The constructor also has to initialize
the fields in the object, as we&#39;ve done here.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
Then when you want to create a &lt;code&gt;Rectangle&lt;/code&gt; you could do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rectangle r(10, 20);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a &lt;code&gt;Rectangle&lt;/code&gt; on the stack. If you want to create a &lt;code&gt;Rectangle&lt;/code&gt;
on the heap, you don&#39;t use &lt;code&gt;malloc()&lt;/code&gt; but instead a new operator called &lt;code&gt;new&lt;/code&gt;,
as in:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rectangle *r = new Rectangle(10, 20);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;new&lt;/code&gt; tells the C++ compiler that this is an object and that it should run
the constructor (conceptually it&#39;s like calling &lt;code&gt;malloc()&lt;/code&gt; and then
calling the constructor). If you used &lt;code&gt;malloc()&lt;/code&gt; you would just get uninitialized
memory of the right size.&lt;/p&gt;
&lt;p&gt;C++ also supports &lt;em&gt;destructors&lt;/em&gt;, which are functions that run before
the object is destroyed. But when is an object destroyed, you might
ask. Remember how I said that in C freeing an object just means that
you release the memory for another use? C++, however, has a richer
concept of object lifecycle: whenever a C object would just have
its memory returned, C++ thinks of this as an object being destroyed.
This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the object is on the stack, when the object goes out of scope
(e.g., when the function returns).&lt;/li&gt;
&lt;li&gt;If the object is on the heap, when it is explicitly destroyed
with &lt;code&gt;delete&lt;/code&gt; (note: not &lt;code&gt;free()&lt;/code&gt;).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
If the only pointer to a heap object goes out of scope before you call
&lt;code&gt;delete&lt;/code&gt;, the object is never destroyed and you get a leak, just like in C.&lt;/li&gt;
&lt;/ul&gt;
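&lt;p&gt;The two cases in this list can be observed directly with a small sketch. The &lt;code&gt;Tracer&lt;/code&gt; class, the global &lt;code&gt;events&lt;/code&gt; log, and the functions &lt;code&gt;lifetimes()&lt;/code&gt; and &lt;code&gt;lifetimes_demo()&lt;/code&gt; are all hypothetical names made up for this illustration; the only thing they do is record when construction and destruction happen.&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of lifecycle events, used only for this illustration.
std::vector<std::string> events;

class Tracer {
 public:
  std::string name;

  Tracer(const std::string &n) {
    name = n;
    events.push_back("create " + name);
  }

  ~Tracer() {
    events.push_back("destroy " + name);
  }
};

void lifetimes() {
  Tracer on_stack("stack");              // destroyed when lifetimes() returns
  Tracer *on_heap = new Tracer("heap");

  delete on_heap;                        // destroyed right here, explicitly
  events.push_back("end of function body");
}                                        // on_stack's destructor runs here

// Runs the function and checks the order in which things happened.
void lifetimes_demo() {
  events.clear();
  lifetimes();
  assert(events.size() == 5);
  assert(events[2] == "destroy heap");
  assert(events[3] == "end of function body");
  assert(events[4] == "destroy stack");
}
```

&lt;p&gt;If the &lt;code&gt;delete on_heap;&lt;/code&gt; line were removed, the log would never contain a &lt;code&gt;destroy heap&lt;/code&gt; entry: the pointer would go away at the end of the function but the object it pointed to would not.&lt;/p&gt;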
&lt;p&gt;A destructor gets written like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token base-clause&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Shape&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Destructor&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle destroyed with width=%d height=%d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;~&lt;/code&gt; prefix indicates that it&#39;s a destructor. Note that the destructor
still has access to the member variables, which is why it&#39;s able to
print them out. As long as they&#39;re regular
variables and not pointers, it doesn&#39;t need to do anything with them,
as they&#39;ll just be destroyed when the object is finally destroyed.
If they&#39;re pointers, however, the destructor needs to call &lt;code&gt;delete&lt;/code&gt; or
there will likely be a memory leak (unless the data is referenced elsewhere).
In either case, the destructors of the member variables will themselves
be run as part of the destruction process.&lt;/p&gt;
&lt;p&gt;Putting it all together, if we have the following program:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;Rectangle &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;print_area&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;delete&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We would expect to see:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;Rectangle created with width&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; height&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;br /&gt;Area &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;200&lt;/span&gt;&lt;br /&gt;Rectangle destroyed with width&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; height&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&#39;ll notice that I&#39;m not checking for errors when I do &lt;code&gt;new&lt;/code&gt;, unlike with
C where we had to check that &lt;code&gt;malloc()&lt;/code&gt; hadn&#39;t failed. By default, if
&lt;code&gt;new&lt;/code&gt; isn&#39;t able to allocate memory it will crash the program&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
rather
than returning an error (or rather a null pointer). The technical term
for this is that &lt;code&gt;new&lt;/code&gt; is &amp;quot;infallible&amp;quot; whereas &lt;code&gt;malloc()&lt;/code&gt; is &amp;quot;fallible&amp;quot;,
thus forcing you to handle allocation failures. It&#39;s possible to
tell C++ that you want &lt;code&gt;new&lt;/code&gt; to be fallible using &lt;code&gt;std::nothrow&lt;/code&gt;,
in which case &lt;code&gt;new&lt;/code&gt; will return &lt;code&gt;nullptr&lt;/code&gt; (0) the way &lt;code&gt;malloc()&lt;/code&gt; does.
Infallible memory allocation is a pretty common pattern in
newer languages, many of which don&#39;t even really let you detect
memory failure; they just crash the program.
Whether this is good or bad is a matter of opinion.&lt;/p&gt;
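&lt;p&gt;For reference, the fallible form in standard C++ is spelled &lt;code&gt;new (std::nothrow)&lt;/code&gt;, with &lt;code&gt;std::nothrow&lt;/code&gt; declared in the &lt;code&gt;&amp;lt;new&amp;gt;&lt;/code&gt; header. The sketch below shows the resulting &lt;code&gt;malloc()&lt;/code&gt;-style null check; &lt;code&gt;make_buffer()&lt;/code&gt; and &lt;code&gt;make_buffer_demo()&lt;/code&gt; are hypothetical helpers introduced only for this example.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstddef>
#include <new>  // std::nothrow

// Hypothetical helper: allocate a zero-filled buffer, reporting failure
// through the return value the way malloc()-based code would.
char *make_buffer(std::size_t size) {
  // With std::nothrow, a failed allocation yields nullptr instead of
  // throwing std::bad_alloc.
  char *buf = new (std::nothrow) char[size];
  if (buf == nullptr) {
    return nullptr;  // the caller has to handle the failure
  }
  for (std::size_t i = 0; i < size; i++) {
    buf[i] = 0;
  }
  return buf;
}

void make_buffer_demo() {
  char *buf = make_buffer(16);
  assert(buf != nullptr);  // a 16-byte request should normally succeed
  assert(buf[15] == 0);
  delete[] buf;            // memory from new[] is released with delete[]
}
```

&lt;p&gt;The cost of this style is the same as in C: every call site has to remember the check.&lt;/p&gt;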
&lt;h3 id=&quot;raii&quot;&gt;RAII &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#raii&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We now have the pieces we need to significantly improve memory allocation.
Let&#39;s go back to our previous program and instead of just having a raw
pointer, we&#39;re going to define a class that holds the list of lines. It
looks like this:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Data&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  size_t num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    num_lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token operator&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;size_t i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the same data structure as before, except that we&#39;ve:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Moved the local variables into the class.&lt;/li&gt;
&lt;li&gt;Put the initialization logic in the constructor and the teardown logic
in the destructor.&lt;/li&gt;
&lt;/ol&gt;
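&lt;p&gt;The pattern described by these two changes can be sketched in a self-contained form. The version of &lt;code&gt;Data&lt;/code&gt; below adds a hypothetical &lt;code&gt;add()&lt;/code&gt; helper and a global counter, &lt;code&gt;data_destroyed&lt;/code&gt;, neither of which is part of the original program; they exist only so that we can watch the destructor run on both the normal path and an early-return path of the (also hypothetical) &lt;code&gt;store_lines()&lt;/code&gt; function.&lt;/p&gt;

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Counts how many times ~Data() has run; exists only for this illustration.
int data_destroyed = 0;

class Data {
 public:
  char **lines;
  size_t num_lines;

  Data() {
    lines = nullptr;
    num_lines = 0;
  }

  ~Data() {
    for (size_t i = 0; i < num_lines; i++) {
      free(lines[i]);
    }
    free(lines);
    data_destroyed++;
  }

  // Hypothetical helper: store a heap copy of text; returns false on failure.
  bool add(const char *text) {
    char **grown = (char **)realloc(lines, (num_lines + 1) * sizeof(char *));
    if (!grown) {
      return false;
    }
    lines = grown;
    char *copy = (char *)malloc(strlen(text) + 1);
    if (!copy) {
      return false;
    }
    strcpy(copy, text);
    lines[num_lines++] = copy;
    return true;
  }
};

// Returns -1 early on an empty line, 0 otherwise. There is no cleanup code
// on either path: ~Data() runs whenever data goes out of scope.
int store_lines(const char *const *input, size_t count) {
  Data data;
  for (size_t i = 0; i < count; i++) {
    if (input[i][0] == '\0') {
      return -1;  // early return; data is still cleaned up
    }
    if (!data.add(input[i])) {
      return -1;
    }
  }
  return 0;
}

void store_lines_demo() {
  const char *good[] = {"one", "two"};
  const char *bad[] = {"one", "", "three"};

  data_destroyed = 0;
  assert(store_lines(good, 2) == 0);
  assert(store_lines(bad, 3) == -1);
  assert(data_destroyed == 2);  // the destructor ran on both paths
}
```

&lt;p&gt;This sketch never copies a &lt;code&gt;Data&lt;/code&gt; object; copying one as written would leave two objects trying to free the same memory.&lt;/p&gt;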
&lt;p&gt;The rest of the program remains the same, except that we have to
access &lt;code&gt;lines&lt;/code&gt; and &lt;code&gt;num_lines&lt;/code&gt; via the &lt;code&gt;data&lt;/code&gt; object.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
Note that we never have to explicitly call the destructor;
it just runs automatically when we return from the function.
This may seem like a small improvement,
but let&#39;s go back to the case we looked at in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1#error-handling&quot;&gt;part I&lt;/a&gt;
where we had an error handling block. Recall that that code looked
like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; status &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; OK&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; 
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token char&quot;&gt;&#39;&#92;n&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      status &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; BAD_LINE_ERROR&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;goto&lt;/span&gt; error&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;error&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Clean up.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; 
i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; status&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We had to have the special-cased and error-prone &lt;code&gt;error:&lt;/code&gt; block that
did cleanup. Now let&#39;s look at (almost) the same code in C++:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Data data&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    
&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token char&quot;&gt;&#39;&#92;n&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;       &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; BAD_LINE_ERROR&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Clean up.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; OK&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using the destructor, we&#39;ve gotten rid of the potential memory leak entirely:
anything that causes &lt;code&gt;data&lt;/code&gt; to go out of scope automatically invokes
the destructor, and so the memory we&#39;ve allocated gets cleaned up.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;
We do, however, still have a leak: the file pointer &lt;code&gt;fp&lt;/code&gt;, which gets cleaned
up properly in the normal case but not in the error case. If we wanted,
we could address this by making a new class to wrap &lt;code&gt;fp&lt;/code&gt;,
but C++ has already done this for us with &lt;a href=&quot;https://cplusplus.com/doc/tutorial/files/&quot;&gt;&lt;code&gt;std::fstream&lt;/code&gt;&lt;/a&gt;, which is used like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream &lt;span class=&quot;token function&quot;&gt;fs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;in&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;is_open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we use &lt;code&gt;std::fstream&lt;/code&gt; we don&#39;t need to clean up the file at all
because it will just happen automatically, and the final block just
looks like:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; OK&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This style of memory management is often called &amp;quot;RAII&amp;quot;, which
stands for &amp;quot;Resource Acquisition is Initialization&amp;quot;. RAII is not exactly
winning any records for the clearest name, and mostly people just say
&amp;quot;RAII&amp;quot;. The idea here is that the process of creating the object
(e.g., &lt;code&gt;Data&lt;/code&gt; or &lt;code&gt;fstream&lt;/code&gt;) allocates its resources and the process of
destroying the object deallocates its resources, so as long as you
have a valid copy of the object, you know it&#39;s safe to use and once
the object goes out of scope, things will automatically get cleaned
up. As you can see, RAII really simplifies memory management and
is generally considered to be the most convenient way to do C++
memory management (though there are also vocal &lt;a href=&quot;https://kristoff.it/blog/raii-rust-linux/&quot;&gt;RAII opponents&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Note that what makes RAII work here is that the &lt;em&gt;object&lt;/em&gt; is on the stack
but it&#39;s holding resources on the heap. That way when the function
returns, the object is automatically destroyed. If instead you
were to allocate the object on the heap and store a pointer to it
on the stack, we would still have a problem. I&#39;ll be getting to how
to address this in a later post.&lt;/p&gt;
&lt;h2 id=&quot;containers&quot;&gt;Containers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#containers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Stuffing our list of stored lines into a class helps some, but
it&#39;s not really ideal. We&#39;ve had to make this new &lt;code&gt;Data&lt;/code&gt; class
and then we have to reach into the class to add new lines
and to sort the lines. We could of course add new interfaces
to &lt;code&gt;Data&lt;/code&gt; but C++ has already done the heavy lifting for us
by providing containers.
A container is basically just a fancy term for an object
whose purpose is to hold some number of other objects,
like a list, vector, or map. Remember &lt;code&gt;all_lines = []&lt;/code&gt; from our
original Python version? That&#39;s a container. Here&#39;s our new
program rewritten with some C++ containers.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string line&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream &lt;span class=&quot;token function&quot;&gt;fs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;fstream&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;in&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 1. 
Read in the file.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;is_open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;good&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 2. Sort the list.&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 3. 
Print out the result.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size_t i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;c_str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key line to look at here is the following:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;vector&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a &amp;quot;vector&amp;quot; called &lt;code&gt;lines&lt;/code&gt;, which is basically
a self-growing container that can be indexed like an array.  &lt;code&gt;lines&lt;/code&gt;
will contain an arbitrary number of objects of type &lt;code&gt;string&lt;/code&gt;, which,
unsurprisingly, is a C++ object that contains a string of characters.
This is loosely analogous to the Python code &lt;code&gt;all_lines = []&lt;/code&gt; except
that Python lists can contain mixed types of objects, as in:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;all_lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;abc&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which contains a string and an integer; this vector can only contain
strings.&lt;/p&gt;
&lt;p&gt;Containers massively simplify things because now when we want to add a line
that we read in to our list of lines it&#39;s a one-liner:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;good&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This replaces all the complicated machinery we had before where we
had to manually make room in &lt;code&gt;lines&lt;/code&gt; and then make a copy of the string
to add to lines, because C++ does all of that for us. Moreover, we
don&#39;t need to worry about the string being too big because &lt;code&gt;std::getline()&lt;/code&gt;
will automatically grow our buffer (&lt;code&gt;line&lt;/code&gt;) to whatever size is needed,
which eliminates a lot of the error cases. However, if we did have an
error for some reason, then RAII would of course clean up. For instance,
the following code returns an error if lines are more than 1024 characters
long.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fs&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;good&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;       &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; BAD_LINE_ERROR&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because we are using RAII this is totally fine and both the file
and the list of strings will be cleaned up properly.&lt;/p&gt;
&lt;p&gt;The sort is a one-liner too, though the syntax is kind of gross. You
can sort of see what&#39;s happening here, namely that we&#39;re providing
iterators marking the start of the vector and the position just past
its last item, and then &lt;code&gt;std::sort()&lt;/code&gt; figures it
out.  The actual details are sort of subtle and out of scope for this
post.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token comment&quot;&gt;// 2. Sort the list.&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This leaves us with the last clause, where we iterate over the
list of sorted lines and print them out. This code is the most similar
to the previous version, differing mostly in that we don&#39;t have
to remember how many lines there are because the &lt;code&gt;.size()&lt;/code&gt; function
lets you ask a vector how big it is:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token comment&quot;&gt;// 3. Print out the result.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size_t i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;c_str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The other change is that we have to use the &lt;code&gt;.c_str()&lt;/code&gt; method to get the
underlying &lt;code&gt;const char *&lt;/code&gt; to pass to &lt;code&gt;printf()&lt;/code&gt; because &lt;code&gt;printf()&lt;/code&gt;
doesn&#39;t know what to do with a C++ string.&lt;/p&gt;
&lt;p&gt;This isn&#39;t really that idiomatic C++ for several reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;C++ has its own functions to print stuff to the console and most programmers
prefer those. Those functions will also take a string directly rather than
needing &lt;code&gt;c_str()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In modern C++, you would use a range-based &lt;code&gt;for&lt;/code&gt; loop (&lt;code&gt;for (const auto&amp;amp; x : lines)&lt;/code&gt;) rather than indexing by hand.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The more modern code would look like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;auto&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;cout &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;endl&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&#39;ve written it the less idiomatic way for two reasons. First, it&#39;s more familiar and
I&#39;m trying not to introduce too many new things at once; understanding
the modern version requires a bunch of new concepts. Second, and
more importantly, it illustrates something important about C++, which is
that while the better modern techniques are available to you, you&#39;re
not required to use them, and in fact C++ lets you do all kinds of
unsafe stuff. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Array-style accesses to vector elements aren&#39;t bounds checked,
so if I did &lt;code&gt;lines[100000]&lt;/code&gt; after reading one line, anything
could happen, up to and including the compiler deciding to
delete all your files, start mining Bitcoin, or call 911
(the technical term here is &lt;a href=&quot;https://en.cppreference.com/w/c/language/behavior&quot;&gt;undefined behavior&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;c_str()&lt;/code&gt; returns a pointer to whatever internal storage the
string object is using to store its value (as of C++11), which
means that we have to worry about all the same lifetime issues
as before. For instance, if we were to return the value of &lt;code&gt;c_str()&lt;/code&gt;
from this function, that value would not be safe to use because
it would be pointing to storage that had been destroyed
when the string went out of scope.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key point is that C++ provides safe ways to work with these objects,
but it &lt;em&gt;also&lt;/em&gt; lets you do all the old unsafe C stuff.&lt;/p&gt;
&lt;h2 id=&quot;shallow-and-deep-copying&quot;&gt;Shallow and Deep Copying &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#shallow-and-deep-copying&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Recall that I said above that when you assign one variable to another,
C just copies the internal values.
This includes structs, so that,
for instance, when we do:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;rectangle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;rectangle r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;r2&lt;/code&gt; just becomes a copy of &lt;code&gt;r&lt;/code&gt;, and they&#39;re totally independent, so
in the following code:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;rectangle r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;rectangle r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// I&#39;m a square!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 2: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We would get the output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rectangle 1: 10 x 2
Rectangle 2: 10 x 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The situation is no different when one of the fields in a struct is a
pointer. For instance:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Wrap strdup so that we don&#39;t have to error check every&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// time we use it.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;from&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;retval &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;from&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;retval&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Out of memory&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; retval&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; rectangle&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;rectangle r1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;rectangle r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// I&#39;m a square!&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Square 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Attention:&lt;/strong&gt; I added &lt;code&gt;infallible_strdup()&lt;/code&gt; because I got tired of
writing out the error checking and I thought it distracted
from the main flow of the code. In a real program, you might do
better.&lt;/p&gt;
&lt;p&gt;This prints:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Square 1: 10 x 2
Square 1: 10 x 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Wait, what? The sizes are different but the name is the same. This happens because the assignment copied the pointer&#39;s
&lt;em&gt;value&lt;/em&gt;, not the string it points to (i.e., &lt;code&gt;r1.name == r2.name&lt;/code&gt;), so &lt;code&gt;r1.name&lt;/code&gt; and
&lt;code&gt;r2.name&lt;/code&gt; point at the same memory. &lt;code&gt;strcpy()&lt;/code&gt; just overwrites that
memory, with the result that both objects end up with &lt;code&gt;name = &amp;quot;Square 1&amp;quot;&lt;/code&gt;.
By contrast, because &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; are plain values,
&lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt; each have their own copies, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shallow-copy.png&quot; alt=&quot;Shallow Copy&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The result of a shallow copy
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is what&#39;s often called a &amp;quot;shallow
copy&amp;quot; as opposed to a &amp;quot;deep copy&amp;quot;, where there would be two different
strings in &lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt;. Doing a deep copy in this case obviously requires
allocating new memory for &lt;code&gt;r2.name&lt;/code&gt; and then copying the contents of the
string into it (presumably via some API like &lt;code&gt;infallible_strdup&lt;/code&gt;). If we want a
deep copy in C, we need to do it explicitly. For instance:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;copy_rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rectangle &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;to&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; rectangle &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;from&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Give |to| its own copy of the string.&lt;/span&gt;&lt;br /&gt;    to&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;from&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    to&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; from&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    to&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; from&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result looks like this:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/deep-copy.png&quot; alt=&quot;Deep Copy&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The result of a deep copy
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;copy-constructors&quot;&gt;Copy Constructors &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#copy-constructors&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;By default, C++ also does shallow copies, but it provides a facility
that lets you do better.
When you create one C++ object from another of the same type,
the compiler invokes what&#39;s called the
&amp;quot;copy constructor&amp;quot;, which is a special method of the new
object that takes the object you&#39;re copying from as an argument.
For instance, if we just wanted to do a shallow copy of &lt;code&gt;Rectangle&lt;/code&gt; it
would look like this (recall that the unqualified names of member
variables in methods just refer to the current object):&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;Rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Shallow copy: copies the pointer, not the string.&lt;/span&gt;&lt;br /&gt;    name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is just the same thing that happened above, but we&#39;ve done it
explicitly. If you don&#39;t supply your own copy constructor, C++ will
make one that does a shallow copy, which is to say basically this
code. But you can also provide a copy
constructor that will do anything you want.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;
For instance, here&#39;s a
deep copy:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  &lt;span class=&quot;token function&quot;&gt;Rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Deep copy of |name|&lt;/span&gt;&lt;br /&gt;    name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Just copy |width| and |height| because they are&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// numbers and don&#39;t point to other memory.&lt;/span&gt;&lt;br /&gt;    width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we don&#39;t need to do anything special for &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt;
because they aren&#39;t pointers to anything, just values. However, because
we&#39;ve replaced the copy constructor, we do need to copy them explicitly.
For &lt;code&gt;name&lt;/code&gt;, by contrast,
we want to allocate new memory and copy the string into it. Now let&#39;s
do the same thing as before, where we mess with the values in &lt;code&gt;r2&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  Rectangle r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// I&#39;m a square!  
&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Square 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s: %d x %d&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This has the result we want:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rectangle 1: 10 x 2
Square 1: 10 x 10
&lt;/code&gt;&lt;/pre&gt;
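&lt;p&gt;For anyone who wants to run this, here is a minimal self-contained sketch. Only the copy constructor follows the text directly; the three-argument constructor, the destructor, and the deleted assignment operator are assumptions added to make the fragment compile and clean up after itself.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;lt;cassert&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cstdlib&amp;gt;
#include &amp;lt;cstring&amp;gt;

// Same idea as the C helper: abort rather than return NULL.
char *infallible_strdup(const char *from) {
  char *retval = strdup(from);
  if (!retval) {
    abort();  // Out of memory
  }
  return retval;
}

class Rectangle {
 public:
  char *name;
  int width;
  int height;

  // Assumed constructor: the object owns a private copy of the name.
  Rectangle(const char *n, int w, int h)
      : name(infallible_strdup(n)), width(w), height(h) {}

  // Deep copy constructor, following the text.
  Rectangle(const Rectangle&amp;amp; other) {
    name = infallible_strdup(other.name);
    width = other.width;
    height = other.height;
  }

  // Assignment is not part of this sketch, so forbid it.
  Rectangle&amp;amp; operator=(const Rectangle&amp;amp;) = delete;

  // Assumed destructor: free the string this object owns.
  ~Rectangle() { free(name); }
};

int main() {
  Rectangle r1(&quot;Rectangle 1&quot;, 10, 2);
  Rectangle r2 = r1;  // Copy constructor
  r2.height = 10;     // I&#39;m a square!
  strcpy(r2.name, &quot;Square 1&quot;);
  assert(r1.name != r2.name);  // Two separate strings
  printf(&quot;%s: %d x %d&#92;n&quot;, r1.name, r1.width, r1.height);
  printf(&quot;%s: %d x %d&#92;n&quot;, r2.name, r2.width, r2.height);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;assert()&lt;/code&gt; checks the property we care about: after the copy, the two objects hold different pointers, so overwriting one name leaves the other alone.&lt;/p&gt;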
&lt;p&gt;At this point you could be forgiven for thinking that this is all
just syntactic sugar. After all, &lt;code&gt;copy_rectangle()&lt;/code&gt; and the copy
constructor are basically the same code and how hard is it to just write
&lt;code&gt;copy_rectangle(&amp;amp;r2, &amp;amp;r1)&lt;/code&gt; instead of &lt;code&gt;Rectangle r2 = r1&lt;/code&gt;? At some
level this is true of course, in the sense that all programming
languages are syntactic sugar on top of assembly, but this is
very useful syntactic sugar.&lt;/p&gt;
&lt;p&gt;Consider what happens if we have the following class:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;TwoRectangles&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt; &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;   Rectangle r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   Rectangle r2&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we now do:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;TwoRectangles t1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t2&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will just work because the &lt;em&gt;default&lt;/em&gt; copy constructor for
&lt;code&gt;TwoRectangles&lt;/code&gt; calls the copy constructors for &lt;code&gt;Rectangle&lt;/code&gt; when we try
to make &lt;code&gt;t1&lt;/code&gt; from &lt;code&gt;t2&lt;/code&gt;.  By contrast, without this feature we would
need to write &lt;code&gt;copy_two_rectangles()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;copy_two_rectangles&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;TwoRectangles &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;to&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; TwoRectangles &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;from&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;copy_rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;to&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;r1&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;from&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;r1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;copy_rectangle&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;to&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;from&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;r2&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Basically, as long as you are working with objects which contain
only other objects which have copying implemented correctly
then they will behave properly when you try to copy them
without you having to do anything special. This isn&#39;t that
big an issue in a small system but once things get large
it&#39;s pretty convenient not to have to think about writing
all the boilerplate to recursively copy everything. As with
our &lt;code&gt;area()&lt;/code&gt; method before, the idea is to free you to focus
on the program logic.&lt;/p&gt;
&lt;p&gt;However, this only works if the object contains &lt;em&gt;objects&lt;/em&gt;. If it
contains &lt;em&gt;pointers&lt;/em&gt; then those pointers will be copied directly
as usual without invoking the copy constructor. Fortunately,
C++ has an extensive set of container classes so that you
can often—though not always—get away without
having to store pointers in your objects. In this specific case,
if we just used the C++ &lt;code&gt;string&lt;/code&gt; class instead of C-style &lt;code&gt;char *&lt;/code&gt;,
as shown below, then the default copy constructor would work
fine and we wouldn&#39;t have to do anything (which is why
I showed the worse version that uses &lt;code&gt;char *&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Rectangle&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string name&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
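&lt;p&gt;As a quick check of that claim, here is a minimal self-contained sketch. It makes a few simplifying assumptions that aren&#39;t in the snippet above: the classes have no constructors of their own, and the objects are built with aggregate initialization rather than the constructor used earlier.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;lt;cassert&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;string&amp;gt;

class Rectangle {
 public:
  std::string name;
  int width;
  int height;
};

class TwoRectangles {
 public:
  Rectangle r1;
  Rectangle r2;
};

int main() {
  TwoRectangles t2{{&quot;Rectangle 1&quot;, 10, 2}, {&quot;Rectangle 2&quot;, 4, 4}};

  // No copy constructor is written anywhere: the compiler-generated
  // ones copy each member using that member&#39;s own copy constructor.
  TwoRectangles t1 = t2;

  t1.r1.name = &quot;Square 1&quot;;
  t1.r1.height = 10;

  assert(t2.r1.name == &quot;Rectangle 1&quot;);  // The original is untouched
  printf(&quot;%s: %d x %d&#92;n&quot;, t2.r1.name.c_str(), t2.r1.width, t2.r1.height);
  printf(&quot;%s: %d x %d&#92;n&quot;, t1.r1.name.c_str(), t1.r1.width, t1.r1.height);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;std::string&lt;/code&gt; copies its own characters, changing &lt;code&gt;t1&lt;/code&gt; leaves &lt;code&gt;t2&lt;/code&gt; alone even though neither class defines any copying code.&lt;/p&gt;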
&lt;h3 id=&quot;copy-assignment-constructors&quot;&gt;Copy Assignment Operators &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#copy-assignment-constructors&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Nerd sniping alert: this section is going to go a bit into some nitpicky
C++ detail. You can safely skip it without missing the main point.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let&#39;s go back to our above code where we use the &lt;code&gt;Rectangle&lt;/code&gt; copy
constructor:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  Rectangle r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What if we alter it slightly so that we construct &lt;code&gt;r2&lt;/code&gt; first and then
assign &lt;code&gt;r1&lt;/code&gt; to &lt;code&gt;r2&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is superficially similar to the previous code but actually does something quite
different. Instead of invoking the copy constructor, in this case it
invokes the &lt;a href=&quot;https://en.cppreference.com/w/cpp/language/copy_assignment&quot;&gt;copy assignment operator&lt;/a&gt;,
which is used whenever you assign one object to another. The reason
that the copy constructor was invoked in the first example is that &lt;code&gt;r2&lt;/code&gt;
&lt;em&gt;[Fixed from &lt;code&gt;r1&lt;/code&gt; -- 2025-05-26]&lt;/em&gt;
was still under construction, but in the second example, it&#39;s already
fully constructed and so we instead invoke the copy assignment operator.
In case all that&#39;s not clear, look at the following code:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Constructor&lt;/span&gt;&lt;br /&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                    &lt;span class=&quot;token comment&quot;&gt;// Copy constructor&lt;/span&gt;&lt;br /&gt;  Rectangle r3 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                   &lt;span class=&quot;token comment&quot;&gt;// Copy constructor (r3 is under construction)&lt;/span&gt;&lt;br /&gt;  r2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r1&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;                             &lt;span class=&quot;token comment&quot;&gt;// Copy assignment operator&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
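&lt;p&gt;One way to check this for yourself is to instrument a class so that each special member function records that it ran. The sketch below uses a hypothetical &lt;code&gt;Traced&lt;/code&gt; struct and a global &lt;code&gt;g_log&lt;/code&gt; string in place of &lt;code&gt;Rectangle&lt;/code&gt;; neither name appears elsewhere in this post.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;lt;cassert&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;string&amp;gt;

// Hypothetical log of which special member functions ran, in order.
static std::string g_log;

struct Traced {
  Traced() { g_log += &quot;ctor;&quot;; }
  Traced(const Traced&amp;amp;) { g_log += &quot;copy-ctor;&quot;; }
  Traced&amp;amp; operator=(const Traced&amp;amp;) {
    g_log += &quot;copy-assign;&quot;;
    return *this;
  }
};

std::string run_demo() {
  g_log.clear();
  Traced r1;       // Constructor
  Traced r2(r1);   // Copy constructor
  Traced r3 = r1;  // Copy constructor (r3 is under construction)
  r2 = r1;         // Copy assignment operator
  (void)r3;        // Silence unused-variable warnings
  return g_log;
}

int main() {
  std::string log = run_demo();
  assert(log == &quot;ctor;copy-ctor;copy-ctor;copy-assign;&quot;);
  printf(&quot;%s&#92;n&quot;, log.c_str());
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running it prints &lt;code&gt;ctor;copy-ctor;copy-ctor;copy-assign;&lt;/code&gt;, which matches the comments in the snippet above: only the last line, where &lt;code&gt;r2&lt;/code&gt; already exists, goes through the assignment operator.&lt;/p&gt;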
&lt;p&gt;As with the copy constructor, the copy assignment operator can in principle
do anything, but in practice what you usually want it to do is to clean up the
destination object (similar to what the destructor does)
and then copy the source object onto it, much as the copy constructor
would.&lt;/p&gt;
&lt;p&gt;Here&#39;s an example assignment operator implementation:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Clean up name&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Just copy the dimensions.&lt;/span&gt;&lt;br /&gt;    width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;width&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    height &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;height&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are two important things to notice about this code.&lt;/p&gt;
&lt;h4 id=&quot;self-assignment-checks&quot;&gt;Self-assignment checks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#self-assignment-checks&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;First, before
we do anything else, we check to see if we are assigning to ourself,
as in:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If so, we just return early without doing
anything else. This may seem like an optimization but it&#39;s actually critical
for correctness.
To see this, take the assignment operator code and fill in the
actual concrete values for a self-assignment of &lt;code&gt;r1&lt;/code&gt; to itself.
In this case, the lines where we copy over the name look like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;// Clean up name&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r1&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    r1&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;infallible_strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r1&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we&#39;ve just freed &lt;code&gt;r1-&amp;gt;name&lt;/code&gt; and then right away we try to
copy it. Holy &lt;strong&gt;use-after-free&lt;/strong&gt; Batman!&lt;/p&gt;
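&lt;p&gt;Here&#39;s a minimal, self-contained sketch of the check in action. The class is a cut-down stand-in for &lt;code&gt;Rectangle&lt;/code&gt; that holds only the name, and the &lt;code&gt;infallible_strdup()&lt;/code&gt; shown here is my assumption about what that helper does (copy the string, abort if allocation fails). &lt;code&gt;NameOnly&lt;/code&gt; and &lt;code&gt;self_assignment_keeps_name()&lt;/code&gt; are names made up for this example.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Stand-in for infallible_strdup(): assumed to abort on allocation failure.
static char* infallible_strdup(const char* s) {
  char* copy = static_cast<char*>(std::malloc(std::strlen(s) + 1));
  if (copy == nullptr) {
    std::abort();
  }
  std::strcpy(copy, s);
  return copy;
}

// Hypothetical cut-down Rectangle that only owns a name.
class NameOnly {
 public:
  char* name;

  explicit NameOnly(const char* n) : name(infallible_strdup(n)) {}
  NameOnly(const NameOnly& other) : name(infallible_strdup(other.name)) {}
  ~NameOnly() { std::free(name); }

  NameOnly& operator=(const NameOnly& other) {
    if (this == &other) {
      return *this;  // Without this check we would free name, then read it.
    }
    std::free(name);
    name = infallible_strdup(other.name);
    return *this;
  }
};

// Returns true if the name is intact after assigning an object to itself.
bool self_assignment_keeps_name() {
  NameOnly r1("Rectangle 1");
  NameOnly& alias = r1;  // Two names for the same object.
  r1 = alias;            // Self-assignment in disguise.
  assert(r1.name != nullptr);
  return std::strcmp(r1.name, "Rectangle 1") == 0;
}
```

&lt;p&gt;Note that the self-assignment goes through a reference. In real code you rarely write &lt;code&gt;r1 = r1&lt;/code&gt; directly; it tends to happen when two names turn out to refer to the same object.&lt;/p&gt;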
&lt;h4 id=&quot;cleaning-up&quot;&gt;Cleaning Up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#cleaning-up&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the copy constructor we just assigned &lt;code&gt;name = infallible_strdup(other.name)&lt;/code&gt;,
but here we have to free &lt;code&gt;this-&amp;gt;name&lt;/code&gt; first. Why?&lt;/p&gt;
&lt;p&gt;The reason is that in the copy constructor we know that the target object
is uninitialized, so &lt;code&gt;this-&amp;gt;name&lt;/code&gt; isn&#39;t holding onto any valid memory
(though it might be filled with a random pointer to nothing in particular). When we are doing copy assignment, however, the target object already exists,
which means that it might have something in &lt;code&gt;this-&amp;gt;name&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;,
so we need to free it first to prevent a memory leak
(the opposite of the use-after-free in the previous section).&lt;/p&gt;
&lt;h4 id=&quot;operator-overloading&quot;&gt;Operator Overloading &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#operator-overloading&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;We just glossed over the odd syntax declaring this function:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;  Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;operator&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; Rectangle&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; other&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What&#39;s going on here is that C++ allows for what&#39;s called
&lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Operator_overloading&amp;amp;oldid=1258409060&quot;&gt;operator overloading&lt;/a&gt;&lt;/em&gt;,
which means that you can supply new implementations for existing
&amp;quot;operators&amp;quot; like &lt;code&gt;+&lt;/code&gt; or &lt;code&gt;=&lt;/code&gt;. This is very useful because it
allows for idiomatic code in some situations that would otherwise
be confusing.&lt;/p&gt;
&lt;p&gt;A common example here is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Complex_number&amp;amp;oldid=1273241588&quot;&gt;complex
numbers&lt;/a&gt;.
These aren&#39;t built into C++, which means that it doesn&#39;t know how
to add or subtract them. You can use operator overloading to provide
implementations for &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; so you can write:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;c3 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; c1 &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; c2&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;rather than:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;c3 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c1&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; c2&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which is what you would do in C.&lt;/p&gt;
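&lt;p&gt;For concreteness, here&#39;s roughly what providing those implementations looks like. This is a toy sketch: the &lt;code&gt;Complex&lt;/code&gt; struct and &lt;code&gt;demo_sum()&lt;/code&gt; are made up for illustration, and in real code you would probably reach for &lt;code&gt;std::complex&lt;/code&gt; from the standard library, which already provides these operators.&lt;/p&gt;

```cpp
#include <cassert>

// Hypothetical minimal complex number type, for illustration only.
struct Complex {
  double re;
  double im;
};

// Called whenever the compiler sees `a + b` with two Complex operands.
Complex operator+(const Complex& a, const Complex& b) {
  return Complex{a.re + b.re, a.im + b.im};
}

// Called whenever the compiler sees `a - b` with two Complex operands.
Complex operator-(const Complex& a, const Complex& b) {
  return Complex{a.re - b.re, a.im - b.im};
}

// Adds (1 + 2i) and (3 + 4i) using the overloaded operator.
Complex demo_sum() {
  Complex c1{1.0, 2.0};
  Complex c2{3.0, 4.0};
  Complex c3 = c1 + c2;
  // Subtracting c2 again should give back c1.
  Complex back = c3 - c2;
  assert(back.re == c1.re && back.im == c1.im);
  return c3;
}
```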
&lt;p&gt;In this case we are replacing the default copy assignment operator,
which would do a fieldwise copy just like the default copy constructor.&lt;/p&gt;
&lt;h4 id=&quot;why-do-i-need-this-anyway%3F&quot;&gt;Why do I need this anyway? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#why-do-i-need-this-anyway%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;One natural question to ask is why we need to overload the &lt;code&gt;=&lt;/code&gt;
operator.  The obvious alternative is to have the compiler run the
target&#39;s destructor and then the copy constructor (after checking
for self-assignment, of course).&lt;/p&gt;
&lt;p&gt;To be honest, I don&#39;t really have a clear picture of whether this
is actually infeasible or whether instead it&#39;s just a matter
of maintaining maximum programmer flexibility. I&#39;ve spent a bunch
of time searching online and had a number of somewhat frustrating
conversations with ChatGPT and the overall impression I am getting
is that it would violate some pre-existing commitments in C++
(ChatGPT gave me a bunch of stuff about &amp;quot;object identity&amp;quot; and
performance),&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;
but it&#39;s not clear to me how serious these issues are.
It&#39;s certainly true that C++ has so much history that any new
feature needs to exist within a complex web of existing constraints,
so it&#39;s possible that this approach would violate one, and it
often takes a lot of analysis to determine if that&#39;s true.
If someone has a better answer, email me!&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-rule-of-three-(or-five)&quot;&gt;The rule of three (or five) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#the-rule-of-three-(or-five)&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If you have an object for which you need to implement your own copy
constructor, then you probably need to &lt;em&gt;also&lt;/em&gt; implement your own
destructor and copy assignment operator. &lt;code&gt;Rectangle&lt;/code&gt; provides
a good example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We need to implement our own destructor to free &lt;code&gt;name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We need to implement our own copy constructor to make
a deep copy of &lt;code&gt;name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We need to implement the copy assignment operator to
free &lt;code&gt;name&lt;/code&gt; in the target and then make a deep copy
from the source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In C++ circles, people talk about the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Rule_of_three_(C%2B%2B_programming)&amp;amp;oldid=1270656304&quot;&gt;rule of three&lt;/a&gt;, which says
that if you define any one of these then you probably
should define all three. In modern C++, people talk
about the &amp;quot;rule of five&amp;quot; which also includes the
move constructor and the move assignment operator.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;moving-on&quot;&gt;Moving On &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#moving-on&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; The feature I am about to describe was introduced
in C++ comparatively late (by which I mean in the past 15 years)
and I haven&#39;t really worked with it,
so I&#39;m writing based on what I&#39;ve read online. Don&#39;t write code
based on this section (or really, on the rest of this post either).&lt;/p&gt;
&lt;p&gt;C++11 introduced the concept of &lt;em&gt;moving&lt;/em&gt; on assignment rather
than copying. Consider the following somewhat contrived code.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  Rectangle &lt;span class=&quot;token function&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Rectangle 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  TwoRectangles &lt;span class=&quot;token function&quot;&gt;two&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r1&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r2&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// r1 and r2 aren&#39;t used after this point.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  two&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;do_stuff&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  two&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;do_other_stuff&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// r1, r2, and two are all destroyed here.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under normal circumstances, transferring &lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt; into &lt;code&gt;two&lt;/code&gt;
would involve calling the &lt;code&gt;Rectangle&lt;/code&gt; copy constructor to copy
them into &lt;code&gt;two&lt;/code&gt;. &lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt; aren&#39;t used after this point
but just hang around until they go out of scope at the end
of the function, where they are destroyed, at the same time
as &lt;code&gt;two&lt;/code&gt;. This isn&#39;t a correctness issue because we eventually
clean up, but is wasteful because we copy them unnecessarily
(including allocating new memory to copy &lt;code&gt;name&lt;/code&gt;)
even though they&#39;re only used via &lt;code&gt;two&lt;/code&gt; thereafter.&lt;/p&gt;
&lt;p&gt;In modern C++ you can instead &lt;em&gt;move&lt;/em&gt; &lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt; into &lt;code&gt;two&lt;/code&gt;.
The details are complicated, but the high-level idea is that
the source of the move isn&#39;t required to continue to be usable
and so you can make move more efficient than copying, in
this case by just copying the pointer to &lt;code&gt;name&lt;/code&gt; rather than
allocating new memory; you just copy &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt;
as usual. The source is left in a &amp;quot;valid but unspecified
state&amp;quot;, which seems to leave a lot of room for implementation
discretion.&lt;/p&gt;
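&lt;p&gt;To make this concrete, here&#39;s a sketch of what a move constructor for &lt;code&gt;Rectangle&lt;/code&gt; might look like. This is my guess at a reasonable implementation rather than battle-tested code: &lt;code&gt;infallible_strdup()&lt;/code&gt; is a stand-in for the helper used earlier, and &lt;code&gt;move_takes_over_name()&lt;/code&gt; is a made-up function that checks the result. The one subtlety beyond copying the pointer is that the source has to give up its own pointer, or else both objects would free the same memory when they are destroyed.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Stand-in for infallible_strdup(): assumed to abort on allocation failure.
static char* infallible_strdup(const char* s) {
  char* copy = static_cast<char*>(std::malloc(std::strlen(s) + 1));
  if (copy == nullptr) {
    std::abort();
  }
  std::strcpy(copy, s);
  return copy;
}

// Sketch of Rectangle; the assignment operators are omitted for brevity.
class Rectangle {
 public:
  char* name;
  int width;
  int height;

  Rectangle(const char* n, int w, int h)
      : name(infallible_strdup(n)), width(w), height(h) {}

  // Copy constructor: allocates a fresh copy of the name.
  Rectangle(const Rectangle& other)
      : name(infallible_strdup(other.name)),
        width(other.width),
        height(other.height) {}

  // Move constructor: takes over the source's name buffer instead.
  Rectangle(Rectangle&& other) noexcept
      : name(other.name), width(other.width), height(other.height) {
    other.name = nullptr;  // The source must not free the buffer we now own.
  }

  ~Rectangle() { std::free(name); }  // free(nullptr) does nothing.
};

// Returns true if the move handed over the buffer without reallocating.
bool move_takes_over_name() {
  Rectangle r1("Rectangle 1", 10, 2);
  const char* original_buffer = r1.name;
  Rectangle r2(std::move(r1));
  assert(r2.width == 10 && r2.height == 2);
  return r2.name == original_buffer && r1.name == nullptr;
}
```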
&lt;p&gt;For obvious reasons you can&#39;t just go moving stuff around
any time someone assigns one variable to another: before
move was introduced in C++11 those assignments were copies, and it would be very surprising
to have the source variable suddenly become unusable. There
are some &lt;a href=&quot;https://stackoverflow.com/questions/9779079/why-does-c11-have-implicit-moves-for-value-parameters-but-not-for-rvalue-para&quot;&gt;specific circumstances&lt;/a&gt;
where the compiler will do a move automatically, but otherwise you have to
tell it you want a move by wrapping the source in a
&lt;code&gt;std::move()&lt;/code&gt; wrapper, like so:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fn17&quot; id=&quot;fnref17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;foo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;move&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bar&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Importantly, nothing stops you from using the source object after
moving it, so in this case you could use &lt;code&gt;bar&lt;/code&gt;, but with unpredictable
results. You probably don&#39;t want to do this, because, as noted above,
it is left in a &amp;quot;valid but unspecified state&amp;quot;, but the compiler assumes
you know what you&#39;re doing (in a future post we&#39;ll look at Rust, where
using a value after a move is explicitly forbidden and the
compiler will stop you).&lt;/p&gt;
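&lt;p&gt;Here&#39;s a small sketch of what that means in practice, using &lt;code&gt;std::string&lt;/code&gt; from the standard library rather than our own class. The function name &lt;code&gt;move_and_reset()&lt;/code&gt; is made up for this example. After the move you shouldn&#39;t rely on whatever is left in the source, but it is fine to give it a new value and carry on.&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <utility>

// Moves bar into a new string, then gives bar a fresh value.
std::string move_and_reset(std::string& bar) {
  std::string foo = std::move(bar);
  // bar is now valid but unspecified: don't depend on its contents here.
  bar = "reset";  // Assigning a new value makes it predictable again.
  assert(bar == "reset");
  return foo;
}
```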
&lt;h3 id=&quot;internal-references&quot;&gt;Internal References &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#internal-references&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In many cases you can implement move with a shallow copy by just
copying the fields, because we don&#39;t need the original version to be
valid.  A shallow copy is obviously more efficient, but there are some
situations where it doesn&#39;t work. One common example is when the
object contains an internal reference. Consider the following
example:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Internal&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; ap&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;Internal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    ap &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;a&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Point at our own field, not the argument.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now &lt;code&gt;ap&lt;/code&gt; is a pointer to the internal field &lt;code&gt;a&lt;/code&gt;.
This is obviously a contrived example, but there are real situations
where it makes sense.&lt;/p&gt;
&lt;p&gt;The result is that if you were to just assign
the fields of one &lt;code&gt;Internal&lt;/code&gt; to another, then &lt;code&gt;ap&lt;/code&gt; would end up
pointing to the field &lt;code&gt;a&lt;/code&gt; in the &lt;em&gt;original&lt;/em&gt; object, not the &lt;em&gt;new&lt;/em&gt; one.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/after-move.png&quot; alt=&quot;Incorrect Move&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A shallow copy of an object with an internal pointer
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If the original is destroyed, &lt;code&gt;ap&lt;/code&gt; points to freed memory, which
brings us back to use-after-free problems. Obviously, if you are
using this kind of class you will need to provide a smarter move
assignment implementation; the point is just that it&#39;s up to you to
know that you need one.&lt;/p&gt;
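&lt;p&gt;One way to handle this, sketched here under my own assumptions, is to write the copy and move operations so that &lt;code&gt;ap&lt;/code&gt; is always pointed at the new object&#39;s own &lt;code&gt;a&lt;/code&gt; rather than copied from the source. I&#39;ve made the members public and added a made-up &lt;code&gt;pointer_follows_object()&lt;/code&gt; function purely so the example can check itself.&lt;/p&gt;

```cpp
#include <cassert>
#include <utility>

class Internal {
 public:
  int a;
  int* ap;

  explicit Internal(int i) : a(i), ap(&a) {}

  // Copy and move constructors: copy the value, but point ap at OUR a.
  Internal(const Internal& other) : a(other.a), ap(&a) {}
  Internal(Internal&& other) noexcept : a(other.a), ap(&a) {}

  // Assignment: ap already points at our own a, so only the value changes.
  Internal& operator=(const Internal& other) {
    a = other.a;
    return *this;
  }
  Internal& operator=(Internal&& other) noexcept {
    a = other.a;
    return *this;
  }
};

// Returns true if the internal pointer refers to the new object after a move.
bool pointer_follows_object() {
  Internal first(42);
  Internal second(std::move(first));
  assert(second.ap != &first.a);  // Not left pointing into the source.
  return second.ap == &second.a && *second.ap == 42;
}
```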
&lt;p&gt;C++ is full of this kind of situation, where the compiler
allows things that are unwise or even dangerous and you&#39;re just
supposed to know to not do them. To a great extent this is a
result of the way C++ developed: it used to be that these
were the only way to do things and so they&#39;re allowed even
though we have better ways now. When we get to Rust we&#39;ll
see that it just doesn&#39;t let you do dangerous stuff—unless
you ask it very nicely—because
it was designed from the ground up to be safe.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-smart-pointers&quot;&gt;Next Up: Smart Pointers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-2/#next-up%3A-smart-pointers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;RAII is a powerful technique but what we&#39;ve seen so far is only
a partial solution. Things are (mostly) fine when working with
objects but if we want to work with pointers, as in our &lt;code&gt;Rectangle&lt;/code&gt;
example, then we need to implement custom copy constructors,
copy assignment operators, etc. if we want them to be safe.
This is true even if we want to store an object on the heap
but have a pointer on the stack. In the next post I&#39;ll
be covering a technique called &amp;quot;smart pointers&amp;quot; that helps
address these problems.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve hopefully learned my lesson about not committing ahead of
time to the length. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; In fact, the C
program we showed in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;part I&lt;/a&gt; will almost
compile, except that in C, you can implicitly cast from &lt;code&gt;void *&lt;/code&gt; to
any pointer type &lt;code&gt;T *&lt;/code&gt;, whereas in C++ you cannot, so you would need
to cast the return value of &lt;code&gt;malloc()&lt;/code&gt; and &lt;code&gt;realloc()&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At least as long as the assignments are of the same type.
If you try to assign two values of different types, such
as a signed to an unsigned integer, then
C may try to convert them, but they still will end up
as discrete values. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
What&#39;s going on here is in C++ data and methods are
&amp;quot;private&amp;quot; by default, which means they
can&#39;t be accessed from outside the
class. The &lt;code&gt;public:&lt;/code&gt; line says to allow access. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that in some languages, such as Python or Rust,
you explicitly have to reference member variables
with something like &lt;code&gt;self.width&lt;/code&gt;, but that&#39;s not
how C++ works. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s possible to provide a default implementation that derived classes
can override. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
A virtual function is one which is associated with the
type of the object rather than with the type of the pointer pointing to
it. This is what allows us to have a &lt;code&gt;Shape *&lt;/code&gt; where
&lt;code&gt;Rectangle&lt;/code&gt; and &lt;code&gt;Circle&lt;/code&gt; have different behavior.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that I had to name the function arguments &lt;code&gt;w&lt;/code&gt; and &lt;code&gt;h&lt;/code&gt; because
in C++ the bare &lt;code&gt;width&lt;/code&gt; means &amp;quot;the member of the object with the name &lt;code&gt;width&lt;/code&gt;&amp;quot;. This is one reason why some other languages explicitly require you to
specify &lt;code&gt;self.&lt;/code&gt; or &lt;code&gt;this-&amp;gt;&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Do not attempt to mix &lt;code&gt;malloc()/free()&lt;/code&gt; with &lt;code&gt;new/delete&lt;/code&gt;.
Who knows what will happen, but it&#39;s probably not good. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically, it raises an exception which you could catch, but if
you don&#39;t the program crashes. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t hate me for not using initialization syntax. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Common practice in working with classes would actually
be to make these fields &amp;quot;private&amp;quot; so they couldn&#39;t
be accessed by the rest of the code, but that&#39;s not
necessary for the point I&#39;m trying to make here.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Incidentally, my original code was in &lt;code&gt;main()&lt;/code&gt; and used &lt;code&gt;exit()&lt;/code&gt;, but
&lt;code&gt;exit()&lt;/code&gt; turns out not to fire the destructor, because it never
returns; the program just terminates. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are, however, some rules about what&#39;s safe to do. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The alternative is that &lt;code&gt;this-&amp;gt;name&lt;/code&gt; is set to &lt;code&gt;nullptr&lt;/code&gt;,
meaning that there is nothing there, but &lt;code&gt;free()&lt;/code&gt; handles this
case correctly. We don&#39;t need to handle the case because
it can&#39;t happen in a correctly constructed object. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;After I convinced it that I wanted the compiler
to do it rather than do it myself. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn17&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t ask what this does;
you&#39;re better off &lt;a href=&quot;https://en.cppreference.com/w/cpp/language/value_category#rvalue&quot;&gt;not knowing&lt;/a&gt; about &amp;quot;rvalues&amp;quot;, &amp;quot;lvalues&amp;quot;, and &amp;quot;xvalues&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-2/#fnref17&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding Memory Management, Part 1: C</title>
		<link href="https://educatedguesswork.org/posts/memory-management-1/"/>
		<updated>2025-01-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-management-1/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;UPDATED: 2025-02-15: Fixed some bugs in the examples and
pointed out that you don&#39;t usually just want to panic
on memory allocation failure.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/userust.jpg&quot; alt=&quot;Cover image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&#39;ve been writing a lot of &lt;a href=&quot;https://www.rust-lang.org/&quot;&gt;Rust&lt;/a&gt;
recently, and as anyone who has learned Rust can tell you, a huge part
of the process of learning Rust is learning to work within its
restrictive memory model, which forbids many operations that would be
perfectly legal in either a systems programming language like C/C++ or
a more dynamic language like Python or JavaScript. That got me thinking
about what was really happening and what invariants Rust was
trying to enforce.&lt;/p&gt;
&lt;p&gt;In this series, I&#39;ll be walking through the logic of memory management
in software systems, starting with the simple memory management in C and then
working up to more complicated systems. This series isn&#39;t intended
to be a tutorial on how to write C, Rust, or any other language; rather
the idea is to look at how things actually work under the hood
at a level that we usually ignore when all we are doing is trying
to write code.&lt;/p&gt;
&lt;h2 id=&quot;how-programs-use-memory&quot;&gt;How Programs Use Memory &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#how-programs-use-memory&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the following program, written in Python:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;all_lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;f &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; f&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;br /&gt;    all_lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;append&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;strip&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;all_lines&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sort&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; all_lines&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This program does something very simple: it reads input
from the file one line at a time, then sorts the lines, and
prints out the lines in sorted order. So if &lt;code&gt;input.txt&lt;/code&gt; has
the following contents:&lt;/p&gt;
&lt;pre class=&quot;language-txt&quot;&gt;&lt;code class=&quot;language-txt&quot;&gt;jim&lt;br /&gt;bob&lt;br /&gt;deb&lt;br /&gt;carol&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output will be:&lt;/p&gt;
&lt;pre class=&quot;language-txt&quot;&gt;&lt;code class=&quot;language-txt&quot;&gt;bob&lt;br /&gt;carol&lt;br /&gt;deb&lt;br /&gt;jim&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, there is something
complicated hiding under the hood: because we don&#39;t know the
sort order of the lines in advance, we have to store all of the
lines we&#39;ve read until we know that we&#39;ve seen all of them, and
only then can we write them in sorted order.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
If we&#39;re going to store all the lines, they have to go somewhere,
which is in the computer&#39;s memory.&lt;/p&gt;
&lt;h3 id=&quot;storing-a-list&quot;&gt;Storing a List &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#storing-a-list&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Conceptually, a computer&#39;s memory is just a giant table of values,
with each value having an address. For convenience, let&#39;s assume
that entries are numbered from 0 and each entry can hold a single
character. Thus, if we want to store the string &amp;quot;computation&amp;quot;, we end
up with something like:&lt;/p&gt;
&lt;div style=&quot;  width: 300px;&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Memory Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;c&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;o&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;m&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;p&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;u&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;t&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;7&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;t&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;i&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;o&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;n&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Or, in a more compact notation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Starting Address
0           | c | o | m | p | u | t | a | t | i | o |
10          | n |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The way to read this is that each cell is a memory location and on
the left we have the starting address for each row, so the &lt;code&gt;p&lt;/code&gt; is at
address 3 (the fourth column in the first row).&lt;/p&gt;
&lt;p&gt;With this in mind, how do we store the data from this file in
memory? The obvious way is something like this, which immediately
reveals that we have a problem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0           | j | i | m | b | o | b | d | e | b | c |
10          | a | r | o | l |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we just concatenate the values in memory, how do we know where one
line ends and the next begins? For instance, maybe the first two
names are &amp;quot;jim&amp;quot; and &amp;quot;bob&amp;quot; or maybe it&#39;s one person named &amp;quot;jimbob&amp;quot;,
or even two people named &amp;quot;jimbo&amp;quot; and &amp;quot;b&amp;quot;. Obviously, we need some
way to keep track of the memory regions associated with individual values.&lt;/p&gt;
&lt;p&gt;There are a number of alternatives here, but let&#39;s just do something
obvious, which is to prefix every value with its length, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0           | 3 | j | i | m | 3 | b | o | b | 3 | d |
10          | e | b | 5 | c | a | r | o | l |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you know that an entry starts at address X, then you can print out
that entry in the obvious way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;address = X
length = *X
address = address + 1
while length &amp;gt; 0 {
    print *address
    address = address + 1
    length = length - 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For non-C programmers, the notation &lt;code&gt;*X&lt;/code&gt; means &amp;quot;take the value at memory address X&amp;quot; (technical
term: &lt;em&gt;dereferencing&lt;/em&gt; &lt;code&gt;X&lt;/code&gt;) so the second
line is setting &lt;code&gt;length&lt;/code&gt; to be whatever is in X and the 5th line is
printing whatever is currently stored at &lt;code&gt;address&lt;/code&gt;. So, what happens here
is that we first read the length of the current line out of &lt;code&gt;address&lt;/code&gt;, then count
down one character at a time.&lt;/p&gt;
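&lt;p&gt;The same walk can be sketched in runnable Python by modelling memory as a list
with one value per address. This is only a toy model: the names &lt;code&gt;memory&lt;/code&gt; and
&lt;code&gt;read_entry&lt;/code&gt; are made up for illustration, and real memory holds bytes rather
than Python objects.&lt;/p&gt;

```python
# Toy model: "memory" is a Python list with one value per address.
# The names memory and read_entry are hypothetical, chosen for this sketch.
memory = [3, "j", "i", "m", 3, "b", "o", "b", 3, "d", "e", "b",
          5, "c", "a", "r", "o", "l"]


def read_entry(memory, x):
    """Return the string whose length prefix is stored at address x."""
    length = memory[x]      # like `length = *X`
    address = x + 1         # the first character follows the prefix
    chars = []
    while length > 0:
        chars.append(memory[address])   # like `print *address`
        address = address + 1
        length = length - 1
    return "".join(chars)


print(read_entry(memory, 0))    # prints "jim"
print(read_entry(memory, 12))   # prints "carol"
```

&lt;p&gt;Note that the caller has to supply an address that really is the start of an
entry. Starting at address 1, for example, would treat the character there as a length.&lt;/p&gt;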
&lt;p&gt;So far so good, but what if we want to print out the whole list? The obvious
thing to do is just to repeat the process above, but now we have a new
problem, which is knowing when to stop. Remember that the memory is a giant
table and we&#39;re just showing the relevant portion of it. In reality,
we have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0           | 3 | j | i | m | 3 | b | o | b | 3 | d |
10          | e | b | 5 | c | a | r | o | l | O | T |
20          | H | E | R |   |   | S | T | U | F | F |
30          | . | . | . |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is just a new version of the same problem: we don&#39;t
want to read off the end of the list. This requires knowing where our list
ends and the other stuff in memory begins. One obvious thing to do is to prefix
the list with the amount of memory that its entries consume (not counting the prefix itself), like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0           | 18| 3 | j | i | m | 3 | b | o | b | 3 |
10          | d | e | b | 5 | c | a | r | o | l |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can write a program to go over the whole list, like so:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;total_length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;X&lt;br /&gt;address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; X &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; total_length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;address&lt;br /&gt;    address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; address &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    total_length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; total_length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        print &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;address&lt;br /&gt;        address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; address &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that this is a &lt;em&gt;self-contained&lt;/em&gt; object. As long as you know where it
starts and what type it is (i.e., a list of strings), then you can read
it out knowing just the starting address &lt;code&gt;X&lt;/code&gt;. I.e., we can have a
subroutine/function, like so:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;function &lt;span class=&quot;token function&quot;&gt;print_list&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;X&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    total_length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;X&lt;br /&gt;    address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; X &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; total_length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;address&lt;br /&gt;        address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; address &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        total_length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; total_length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; length &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            print &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;address&lt;br /&gt;            address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; address &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;            length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; length &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
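&lt;p&gt;To make the self-contained property concrete, here is a minimal Python sketch that
builds the encoded list and reads it back from nothing but a starting address. It assumes
the total-length prefix counts only the cells that follow it, and the helper names
&lt;code&gt;encode_list&lt;/code&gt; and &lt;code&gt;read_list&lt;/code&gt; are made up for this illustration.&lt;/p&gt;

```python
def encode_list(strings):
    """Lay out strings as length-prefixed entries behind a total-length cell."""
    body = []
    for s in strings:
        body.append(len(s))     # per-entry length prefix
        body.extend(s)          # one character per cell
    # Assumption: the total counts the cells after the prefix, not the prefix.
    return [len(body)] + body


def read_list(memory, x):
    """Walk the list that starts at address x and return its strings."""
    total_length = memory[x]
    address = x + 1
    result = []
    while total_length > 0:
        length = memory[address]
        address = address + 1
        total_length = total_length - (length + 1)
        result.append("".join(memory[address:address + length]))
        address = address + length
    return result


# Surround the list with unrelated values to show that the starting
# address alone is enough to read it back without running off the end.
memory = ["?", "?"] + encode_list(["jim", "bob", "deb", "carol"]) + list("OTHER")
print(read_list(memory, 2))     # prints ['jim', 'bob', 'deb', 'carol']
```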
&lt;p&gt;However, we &lt;em&gt;do&lt;/em&gt; have to remember where the object starts, as it&#39;s
not going to always start at &lt;code&gt;0&lt;/code&gt; (what if we have two lists, or a
list and something else?). So how do we do that?&lt;/p&gt;
&lt;h3 id=&quot;the-stack&quot;&gt;The Stack &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#the-stack&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Up till now I&#39;ve been acting like memory is just an undifferentiated
table, but the reality is much more complicated.
Although from a hardware perspective the memory is largely undifferentiated,
there is a conventional way to lay things out, as shown in this
diagram I borrowed from GeeksforGeeks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdncontribute.geeksforgeeks.org/wp-content/uploads/memoryLayoutC.jpg&quot; alt=&quot;C memory architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To orient yourself, address zero is at the bottom of the diagram
and higher addresses are at the top. The program is actually
split up into two pieces: the program itself (&amp;quot;the &lt;em&gt;text&lt;/em&gt; segment&amp;quot;)
and its statically allocated data (the initialized and uninitialized data segments).
There are also two different parts of memory where the program&#39;s
data is stored, called the &amp;quot;stack&amp;quot; and the &amp;quot;heap&amp;quot;. Very roughly speaking,
they are used like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;stack&lt;/strong&gt; is used to store fixed-size data that is part of the
local context of the function.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;heap&lt;/strong&gt; is used to store arbitrary-sized data or data that
survives past the lifetime of a function.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, in our &lt;code&gt;print_list()&lt;/code&gt; function above, &lt;code&gt;total_length&lt;/code&gt;, &lt;code&gt;address&lt;/code&gt;,
and &lt;code&gt;length&lt;/code&gt; are fixed-size values (effectively integers big enough to hold
a memory address), so they can be allocated on the stack. By contrast,
the list of strings is arbitrarily sized and in fact of a size that&#39;s
dependent on the file we are reading in, and so is allocated on the heap.&lt;/p&gt;
&lt;p&gt;When we call a function in a compiled language like C (or Rust), the
compiler makes sure there is enough space on the stack to store all
the variables the function needs and makes room for them in memory. This space is called
a &amp;quot;stack frame&amp;quot;. So, &lt;code&gt;print_list()&lt;/code&gt; would have a stack frame big
enough to store all three of these values just laid out end to end,
like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| total_length  |   address     |   length      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Importantly, the layout here is fixed (and dictated by the compiler)
and so we don&#39;t need to have any metadata telling us how long things
are; the compiler just knows.&lt;/p&gt;
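&lt;p&gt;One way to see what a fixed layout buys you is Python&#39;s &lt;code&gt;struct&lt;/code&gt; module,
which packs values at fixed offsets in much the same spirit. The sketch below assumes each of
the three values is an 8-byte unsigned integer with no padding, as on a typical 64-bit machine.
A real compiler picks its own sizes and ordering, so treat the numbers as illustrative.&lt;/p&gt;

```python
import struct

# Assumed layout: three unsigned 64-bit values, no padding (the "=" prefix).
FRAME_FORMAT = "=QQQ"
FIELDS = ("total_length", "address", "length")
FIELD_SIZE = struct.calcsize("=Q")      # 8 bytes per field

frame_size = struct.calcsize(FRAME_FORMAT)
offsets = {name: i * FIELD_SIZE for i, name in enumerate(FIELDS)}

# Pack example values into a frame-sized buffer, then read one field back
# using nothing but its fixed offset: no per-field length metadata needed.
frame = struct.pack(FRAME_FORMAT, 18, 1, 3)
(address_value,) = struct.unpack_from("=Q", frame, offsets["address"])

print(frame_size)       # prints 24
print(offsets)          # prints {'total_length': 0, 'address': 8, 'length': 16}
print(address_value)    # prints 1
```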
&lt;p&gt;The stack is laid out &lt;em&gt;contiguously&lt;/em&gt; in memory, with each function
call extending the stack by enough room for the stack frame associated
with that function, which depends on which function it is.
For technical reasons, the stack grows &amp;quot;downward&amp;quot; in memory towards
smaller addresses, so the callee has a lower address than the caller.
The following figure shows a simple example.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/function-call-stack.png&quot; alt=&quot;The stack for a simple function call&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The stack before, during, and after a simple function call
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At the left of the figure, we see the situation where we are in
the function &lt;code&gt;f()&lt;/code&gt;. The stack just consists of the stack frame
for &lt;code&gt;f()&lt;/code&gt;. If &lt;code&gt;f()&lt;/code&gt; calls &lt;code&gt;g()&lt;/code&gt; then we add a new stack frame
for &lt;code&gt;g()&lt;/code&gt; (technical term: &lt;em&gt;pushing&lt;/em&gt; onto the stack), shown in the middle of the figure. Then when &lt;code&gt;g()&lt;/code&gt;
returns, the stack shrinks (technical term: &lt;em&gt;popping&lt;/em&gt; the stack), leaving
us back where we were before. Note that if &lt;code&gt;f()&lt;/code&gt; called a different
function &lt;code&gt;h()&lt;/code&gt;, it would end up where the &lt;code&gt;g()&lt;/code&gt; frame was before,
but might be of a different size, depending on how many local
variables it had.&lt;/p&gt;
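&lt;p&gt;The push and pop behavior can be sketched by tracking nothing but a stack pointer.
This is a toy model, not how any real calling convention works in detail: the class
&lt;code&gt;ToyStack&lt;/code&gt;, the starting address 1000, and the frame sizes are all invented for
the example.&lt;/p&gt;

```python
class ToyStack:
    """Tracks only the stack pointer; frame sizes are in bytes."""

    def __init__(self, top_address):
        self.stack_pointer = top_address
        self.frames = []        # (function name, frame size), newest last

    def push(self, name, frame_size):
        # The stack grows downward, so a new frame gets a lower address.
        self.stack_pointer -= frame_size
        self.frames.append((name, frame_size))
        return self.stack_pointer

    def pop(self):
        # Returning from a function gives its frame's space back.
        name, frame_size = self.frames.pop()
        self.stack_pointer += frame_size
        return name


stack = ToyStack(top_address=1000)
f_frame = stack.push("f", 24)   # f() is running
g_frame = stack.push("g", 16)   # f() calls g()
stack.pop()                     # g() returns
h_frame = stack.push("h", 40)   # f() calls h() instead
print(f_frame, g_frame, h_frame)    # prints 976 960 936
```

&lt;p&gt;Both the &lt;code&gt;g()&lt;/code&gt; and &lt;code&gt;h()&lt;/code&gt; frames end right below the frame for
&lt;code&gt;f()&lt;/code&gt; at address 976, but the larger &lt;code&gt;h()&lt;/code&gt; frame reaches further down.&lt;/p&gt;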
&lt;p&gt;You should now be able to see why the stack isn&#39;t suitable for
variable-sized objects: we need to allocate the stack frame when
the function is called, and we can&#39;t do that if we don&#39;t know how
big the variables in the stack will be. It&#39;s possible to grow stack
frames but not convenient, so instead we need to
allocate variable-sized objects somewhere else, which is what the heap is used for.&lt;/p&gt;
&lt;h3 id=&quot;the-heap&quot;&gt;The Heap &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#the-heap&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Conceptually the heap is just a big pile of memory that we can allocate
space out of. In many languages (e.g., Python or JavaScript)
this is done automatically when you make an object, but in C,
you have to do memory management by hand. This is done with the &lt;code&gt;malloc()&lt;/code&gt; API, which is
used like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;space &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works exactly like you would expect, namely reserving a block of
100 bytes on the heap that the caller can then use however they want.
The return value from &lt;code&gt;malloc()&lt;/code&gt; is the memory address of the allocated
region. Internally, of course, &lt;code&gt;malloc()&lt;/code&gt; has to do some bookkeeping
to know which memory is in use and which is not. There are a large number
of different data structures that can be used here, but essentially any
technique will involve using some of the heap for that bookkeeping,
leaving the rest available for allocation.&lt;/p&gt;
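&lt;p&gt;As a rough illustration of that bookkeeping cost, here is a deliberately naive
allocator sketched in Python. It is not how any real &lt;code&gt;malloc()&lt;/code&gt; works: it only
ever hands out the next unused cells and never frees anything, and the names
&lt;code&gt;ToyHeap&lt;/code&gt; and &lt;code&gt;HEADER_SIZE&lt;/code&gt; are made up. The point is just that every
allocation spends some of the heap on a header.&lt;/p&gt;

```python
HEADER_SIZE = 1     # assumption: one cell of bookkeeping per allocation


class ToyHeap:
    def __init__(self, size):
        self.cells = [None] * size
        self.next_free = 0      # everything from here on is unused

    def malloc(self, nbytes):
        """Reserve nbytes cells and return the address of the first one."""
        needed = HEADER_SIZE + nbytes
        if self.next_free + needed > len(self.cells):
            return None         # out of memory, like malloc() returning NULL
        self.cells[self.next_free] = nbytes     # header records the block size
        address = self.next_free + HEADER_SIZE
        self.next_free += needed
        return address


heap = ToyHeap(32)
a = heap.malloc(10)
b = heap.malloc(10)
c = heap.malloc(10)     # does not fit: 22 of the 32 cells are already used
print(a, b, c)          # prints 1 12 None
```

&lt;p&gt;Three 10-cell requests add up to only 30 cells, but the third one fails because
the headers for the first two have already consumed part of the 32-cell heap.&lt;/p&gt;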
&lt;h2 id=&quot;memory-management-in-c&quot;&gt;Memory Management in C &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#memory-management-in-c&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With that background, let&#39;s try rewriting our program in C, where
we have to do the memory management by hand. This gets a lot more
complicated, so let&#39;s take it in pieces.&lt;/p&gt;
&lt;h3 id=&quot;read-in-the-file.&quot;&gt;Read in the file. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#read-in-the-file.&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First, we have to read in the file.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  FILE &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;fp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;r&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; num_lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    
&lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 1. Read in the file.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span 
class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;// Make room in the list of lines.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;realloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;num_lines &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span 
class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// We are out of memory so panic.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;copy &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;copy&lt;span 
class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// We are out of memory so panic.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; copy&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    num_lines&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We start by opening the input file, in this case &lt;code&gt;input.txt&lt;/code&gt;. Then,
as before, we&#39;re going to iterate over the lines in the file and
add them to our list of stored lines. This is accomplished by our
&lt;code&gt;while&lt;/code&gt; loop.&lt;/p&gt;
&lt;p&gt;We can read the line in using the &lt;code&gt;fgets()&lt;/code&gt; function, which
reads a line (defined by ending in a &lt;code&gt;&#92;n&lt;/code&gt; newline character)
out of the file &lt;code&gt;fp&lt;/code&gt; into the buffer (memory region) associated with &lt;code&gt;line&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; line&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 1. Read in the file.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;what&#39;s-a-buffer%3F&quot;&gt;What&#39;s a buffer? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#what&#39;s-a-buffer%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For those of you who haven&#39;t heard the term before, a &lt;em&gt;buffer&lt;/em&gt; is just
programmer jargon for some piece of storage used to hold data
temporarily, as in this case where we&#39;re reading in a line
of data and then quickly doing something with it. It&#39;s also
the name for this doodad which is responsible for stopping trains
which don&#39;t stop on their own at the end of the track.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Airtrain_Domestic_stn_end_of_railway.jpg&quot; alt=&quot;A buffer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;From Wikipedia by &lt;a href=&quot;https://commons.wikimedia.org/wiki/User:Orderinchaos&quot; title=&quot;User:Orderinchaos&quot;&gt;User:Orderinchaos&lt;/a&gt; - &lt;span class=&quot;int-own-work&quot; lang=&quot;en&quot;&gt;Own work&lt;/span&gt;, &lt;a href=&quot;https://creativecommons.org/licenses/by-sa/3.0&quot; title=&quot;Creative Commons Attribution-Share Alike 3.0&quot;&gt;CC BY-SA 3.0&lt;/a&gt;, &lt;a href=&quot;https://commons.wikimedia.org/w/index.php?curid=34243713&quot;&gt;Link&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There&#39;s already something sus here, though. Did you notice the line &lt;code&gt;char line[1024]&lt;/code&gt;?
This is C notation for &amp;quot;make a buffer called line which is long enough to hold 1024 characters&amp;quot;.
As noted before, &lt;code&gt;line&lt;/code&gt; has to be fixed size and 1024 is just an arbitrary
number that&#39;s hopefully large enough to hold any line in the file.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
But what if
one of the lines is longer? In that case, &lt;code&gt;fgets()&lt;/code&gt; will break up the line into
two or more pieces, causing the program to be incorrect. The right way to do this would
actually be to keep reading until we had a full line, but this would make
the program even more complicated, so we&#39;ll just live with the defect, seeing
as it&#39;s an example program.&lt;/p&gt;
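&lt;p&gt;For the curious, here&#39;s a rough sketch of what &amp;quot;keep reading until we had a full line&amp;quot;
could look like. This is not what the example program does: &lt;code&gt;read_full_line()&lt;/code&gt; is a
hypothetical helper I&#39;m introducing purely for illustration, and the 16-byte chunk size is
deliberately tiny so that long lines need several reads. It leans on &lt;code&gt;realloc()&lt;/code&gt;,
which is explained further down.&lt;/p&gt;

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of the example program): read one complete
// line from fp, however long it is. Returns a malloc()ed string that the
// caller must free(), or NULL if nothing could be read (end of file, error,
// or out of memory).
char *read_full_line(FILE *fp) {
  assert(fp != NULL);

  char chunk[16];       // Deliberately small so long lines take several reads.
  char *result = NULL;  // The line assembled so far.
  size_t len = 0;       // Length of result, not counting the final '\0'.

  while (fgets(chunk, sizeof(chunk), fp)) {
    size_t chunk_len = strlen(chunk);

    // Grow the result to fit the new chunk plus the terminator.
    char *bigger = realloc(result, len + chunk_len + 1);
    if (!bigger) {
      free(result);
      return NULL;
    }
    result = bigger;
    memcpy(result + len, chunk, chunk_len + 1);
    len += chunk_len;

    // A newline at the end means we have the whole line.
    if (len > 0 && result[len - 1] == '\n') {
      break;
    }
  }
  return result;  // Still NULL if fgets() failed before reading anything.
}
```

&lt;p&gt;Note that the last line of a file may come back without a trailing newline, and that the
caller now owns the returned memory, which is exactly the kind of bookkeeping this post is about.&lt;/p&gt;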
&lt;p&gt;&lt;code&gt;fgets()&lt;/code&gt; returns a pointer to the input buffer if successful and a zero value
(&lt;code&gt;NULL&lt;/code&gt;) at the end of the file (or any error, actually), so when we test
for &lt;code&gt;l&lt;/code&gt;, we are actually testing for the end of the file, at which point the
loop exits.&lt;/p&gt;
&lt;p&gt;At this point, we have the next line of the file in &lt;code&gt;line&lt;/code&gt;, but we overwrite
that buffer every time we read a new line from the file, so we need to store it
somewhere. We use the &lt;code&gt;lines&lt;/code&gt; variable for this, which is defined as:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; num_lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This notation can be a bit hard to read for non-C programmers, but briefly &lt;code&gt;*&lt;/code&gt; means
that something is a &lt;em&gt;pointer&lt;/em&gt;, which is to say that it&#39;s something that holds a
memory address. &lt;code&gt;**&lt;/code&gt; means that it&#39;s a pointer to a pointer, which is to say that
&lt;code&gt;lines&lt;/code&gt; holds the address of a block of memory that is itself full of values
that themselves are memory addresses, in this case the individual stored lines.
&lt;code&gt;num_lines&lt;/code&gt; stores the number of lines that we have in memory.&lt;/p&gt;
&lt;p&gt;We can see this in action by looking at the next block of code:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;// Make room in the list of lines.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;realloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;num_lines &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token 
keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// We are out of memory so panic.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;copy &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;copy&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span 
class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// We are out of memory so panic.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; copy&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    num_lines&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Storing a copy of &lt;code&gt;line&lt;/code&gt; is a two-part process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make a copy of the line itself.&lt;/li&gt;
&lt;li&gt;Store the pointer to that line in &lt;code&gt;lines&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, in order to store that pointer, we first need to make room in
&lt;code&gt;lines&lt;/code&gt;, which means allocating some memory. This happens on the two lines
at the start of this snippet. There are actually two cases here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lines is empty (nothing is stored), which happens at the start.&lt;/li&gt;
&lt;li&gt;Lines is non-empty but doesn&#39;t have enough room.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We distinguish these by looking at &lt;code&gt;num_lines&lt;/code&gt; which starts at &lt;code&gt;0&lt;/code&gt;.
In the former case, we allocate enough memory for a single line,
like so:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;      lines &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This says &amp;quot;make enough room to hold the address of a single string&amp;quot;,
and is nothing we haven&#39;t seen before.&lt;/p&gt;
&lt;p&gt;The latter case is more complicated, however, because we already
have something in &lt;code&gt;lines&lt;/code&gt;, it&#39;s just that there&#39;s not (necessarily)
enough room in memory to add another value. This means we (may) need to&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Allocate enough memory to hold the new number of values.&lt;/li&gt;
&lt;li&gt;Copy over the current contents of &lt;code&gt;lines&lt;/code&gt; into the new
memory region.&lt;/li&gt;
&lt;li&gt;De-allocate the original memory.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What are all the parentheticals doing here? The answer is that
the block of memory pointed to by &lt;code&gt;lines&lt;/code&gt; may already be big
enough. When you call &lt;code&gt;malloc(size)&lt;/code&gt; the system guarantees that
the block it returns a pointer to is &lt;em&gt;at least&lt;/em&gt; big enough to hold an object
of size &lt;code&gt;size&lt;/code&gt;—assuming that the allocation succeeds—but it&#39;s
allowed to be larger. This could happen for a number of reasons
(see &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#how-malloc-works&quot;&gt;How Malloc Works&lt;/a&gt; below for some more
background), but one of them is to facilitate exactly this
case: if people want to resize an object frequently, as we are doing
here, then it&#39;s not efficient to have to copy the contents of
the object over and over again. Instead, you can allocate more
space than the programmer asked for and then when they ask for
more, just say &amp;quot;ok&amp;quot; without taking any other action.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
All of this is handled automatically by the &lt;code&gt;realloc()&lt;/code&gt; function call.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
At the end of this process &lt;code&gt;lines&lt;/code&gt; may or may not have the same
value as what you passed into &lt;code&gt;realloc()&lt;/code&gt;. What
matters, though, is that the memory that &lt;code&gt;lines&lt;/code&gt; points to has
the same contents as before, and that&#39;s what &lt;code&gt;realloc()&lt;/code&gt; guarantees.&lt;/p&gt;
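&lt;p&gt;To make that guarantee concrete, here&#39;s a small sketch that grows an array of &lt;code&gt;int&lt;/code&gt;s
rather than our list of lines. &lt;code&gt;grow_int_array()&lt;/code&gt; is a made-up helper, not part of the
example program. One difference from the program above: it keeps the result of
&lt;code&gt;realloc()&lt;/code&gt; in a temporary, so that if the allocation fails the caller still has the
original pointer. Our program can get away without this only because it aborts on failure.&lt;/p&gt;

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical helper: grow a malloc()ed array of ints from old_count to
// new_count elements. Returns the (possibly moved) array, or NULL on
// failure, in which case the original array is untouched and still owned
// by the caller.
int *grow_int_array(int *values, size_t old_count, size_t new_count) {
  assert(new_count > old_count);

  int *bigger = realloc(values, new_count * sizeof(int));
  if (!bigger) {
    return NULL;
  }

  // realloc() preserved elements [0, old_count). The new tail is
  // uninitialized, so give it a defined value.
  for (size_t i = old_count; i < new_count; i++) {
    bigger[i] = 0;
  }
  return bigger;
}
```

&lt;p&gt;Whether the returned pointer equals the one passed in is up to the allocator; the only thing
you can rely on is that the old elements are still there.&lt;/p&gt;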
&lt;p&gt;It&#39;s also possible that the memory allocation will fail, for instance
if the computer is out of memory. In that case, &lt;code&gt;lines&lt;/code&gt; will be set to
&lt;code&gt;NULL&lt;/code&gt; and we need to abort:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    if (!lines) {
      abort(); // We are out of memory so panic.
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&#39;m just calling &lt;code&gt;abort()&lt;/code&gt; which makes the program crash, but
you could do something more sophisticated here, such as having
the function fail and let some higher level handle it, either
via an orderly shutdown of the program or actually trying
to recover enough memory to let the program survive. The
right thing to do here was fairly hotly contested on the
Hacker News thread for this post, but in my experience most
programs crash. &lt;em&gt;[2025-02-15 -- added]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now that we have room in &lt;code&gt;lines&lt;/code&gt; we can store a copy of the actual
line we&#39;ve read in, but first we have to make a new buffer to store
it in (remember that &lt;code&gt;line&lt;/code&gt; will be overwritten). We can do that with
the &lt;code&gt;strdup()&lt;/code&gt; function call, which makes a copy of a string,
allocating new memory as needed:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;copy &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;copy&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// We are out of memory so panic.&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here too, we can run out of memory, so we need to check for &lt;code&gt;copy&lt;/code&gt; being
&lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Finally, we can append the copied line to the end of &lt;code&gt;lines&lt;/code&gt; and increment
the number of stored lines:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;    lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; copy&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    num_lines&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The diagram below should help provide an understanding of the data structures
here:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/lines-buffer-unsorted.png&quot; alt=&quot;Stored lines in C&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Data structure for stored lines.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;On the left we have the &lt;code&gt;lines&lt;/code&gt; variable itself, which is stored somewhere on the
stack. It contains the address of the memory we have allocated to store the
list of lines, namely address &lt;code&gt;1024&lt;/code&gt;. That memory region is shown in the middle
of the diagram. Finally, on the right we have the individual regions for each
stored line. The memory region for &lt;code&gt;lines&lt;/code&gt; stores their addresses, each laid
out one after the other. Note that there&#39;s no variable on the stack which
points to these regions, they&#39;re just pointed to by the addresses stored in
the region pointed to by &lt;code&gt;lines&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;sorting-the-lines&quot;&gt;Sorting the Lines &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#sorting-the-lines&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next thing we do is to sort the lines. This is done by the &lt;code&gt;qsort()&lt;/code&gt; library
function.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token comment&quot;&gt;// 2. Sort the lines.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;qsort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; num_lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; compare_string&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;qsort()&lt;/code&gt; is kind of hard to use because it&#39;s designed to sort a list of
any kind of object of any size. This means that you have to pass:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The pointer (address) to the first object in the list (in this case &lt;code&gt;lines&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The number of objects in the list (&lt;code&gt;num_lines&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;size&lt;/em&gt; of the objects &lt;code&gt;sizeof(char *)&lt;/code&gt;. In this case, that&#39;s the size
of a pointer to a string.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;A comparison function which tells &lt;code&gt;qsort()&lt;/code&gt; the sort order for two objects.
I&#39;m going to skip over the details here.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;qsort()&lt;/code&gt; sorts its arguments in place, so this means that after it&#39;s
done, &lt;code&gt;lines&lt;/code&gt; is sorted, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/lines-buffer-sorted.png&quot; alt=&quot;Stored lines in C (sorted)&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Data structure for sorted stored lines.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Importantly, the only thing that&#39;s changed here is the values in
the middle memory region, referenced by &lt;code&gt;lines&lt;/code&gt;. The actual strings
are unchanged and are in the same memory regions; we&#39;ve just
rearranged the pointers in &lt;code&gt;lines&lt;/code&gt; to point to the strings in
the right order. Note that that order is not given by the
numeric order of the addresses but rather by the lexical order of the strings.
That&#39;s not just convenient, but actually required, because
&lt;code&gt;qsort()&lt;/code&gt; doesn&#39;t know anything about the
objects it&#39;s sorting; it just knows how to pass them to the comparison
function, so all it can do is manipulate its own data as if the objects
were numbers, whatever their actual semantics.&lt;/p&gt;
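&lt;p&gt;I skipped the comparison function above, but for completeness here is one common way to
write it. Treat this as a sketch rather than the exact code from the example program. The subtle
part is that &lt;code&gt;qsort()&lt;/code&gt; passes pointers to the &lt;em&gt;elements&lt;/em&gt; of the list, and each
element is itself a &lt;code&gt;char *&lt;/code&gt;, so the function receives pointers to pointers.
&lt;code&gt;sort_strings()&lt;/code&gt; is a hypothetical wrapper I&#39;ve added just to show the call.&lt;/p&gt;

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// One possible comparison function for an array of char * strings.
// qsort() hands us pointers to two elements of the array, and each element
// is itself a char *, so we dereference once to get at the strings.
int compare_string(const void *a, const void *b) {
  const char *const *left = a;
  const char *const *right = b;
  return strcmp(*left, *right);
}

// Hypothetical wrapper showing the qsort() call with all four arguments.
void sort_strings(char **strings, size_t count) {
  assert(strings != NULL);
  qsort(strings, count, sizeof(char *), compare_string);
}
```

&lt;p&gt;&lt;code&gt;strcmp()&lt;/code&gt; compares byte by byte, so &amp;quot;lexical order&amp;quot; here means
byte order: uppercase ASCII letters sort before lowercase ones, for instance.&lt;/p&gt;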
&lt;h3 id=&quot;printing-the-results&quot;&gt;Printing the Results &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#printing-the-results&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;After all this, we&#39;re ready to print the results. This is comparatively
simple, just iterating over the entries in &lt;code&gt;lines&lt;/code&gt; and printing the
corresponding strings:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token comment&quot;&gt;// 3. Print the lines.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;fputs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;stdout&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One thing you may be wondering about here is how we know how long each
line is, as we haven&#39;t stored any length information. The convention in C is actually to have each string end with
a byte with the value of &lt;code&gt;0&lt;/code&gt;, usually written as &lt;code&gt;&#92;0&lt;/code&gt;. This is
pretty universally agreed to have been a bad idea, but we&#39;re taking
advantage of it here because it means that the strings are self-contained.&lt;/p&gt;
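&lt;p&gt;If you want to see what that convention means in practice, here is roughly what a function
like &lt;code&gt;strlen()&lt;/code&gt; has to do to find the length of a string. &lt;code&gt;count_chars()&lt;/code&gt;
is a made-up name for illustration; real implementations are more heavily optimized.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for strlen(): walk the string one byte at a time
// until we hit the terminating zero byte. The terminator is not counted.
size_t count_chars(const char *s) {
  assert(s != NULL);

  size_t n = 0;
  while (s[n] != '\0') {
    n++;
  }
  return n;
}
```

&lt;p&gt;If the terminating zero is missing, this loop just keeps reading whatever memory happens to
come next, which is a big part of why the convention is so unloved.&lt;/p&gt;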
&lt;h3 id=&quot;cleaning-up&quot;&gt;Cleaning Up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#cleaning-up&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Finally, we want to clean up. In this simple program, that&#39;s not really required
because when a program terminates the operating system automatically reclaims
its resources, including memory, but this function might be used by
some bigger program, in which case we&#39;d want to reclaim the memory
used by the list of strings, as well as close the open file (remember
&lt;code&gt;input.txt&lt;/code&gt;?).&lt;/p&gt;
&lt;p&gt;If this function were to return without cleaning up, it would create
what&#39;s called a &amp;quot;memory leak&amp;quot;. Remember that the only variable in
our program that knows about any of this memory is &lt;code&gt;lines&lt;/code&gt;, which
points to the list of pointers for the individual stored lines. &lt;code&gt;lines&lt;/code&gt;
is on the stack and will be lost when the function returns,
so if the function returns without cleaning up, then there is no
program variable pointing to any of this memory and it&#39;s just lost.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The result of a memory leak is that the leaked memory isn&#39;t available
for new allocations but also can&#39;t be used because there&#39;s nothing
pointing to it. If the program runs
long enough and has a big enough leak, you can eventually accumulate
enough leaked memory to affect the program function or even cause it
to run out of memory, so you want to clean up. This is one reason why
it often works to restart a program that seems stalled.&lt;/p&gt;
&lt;p&gt;In C, memory is freed using the &lt;code&gt;free()&lt;/code&gt; function, which takes the
pointer to be freed. Here&#39;s what the cleanup looks like:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token comment&quot;&gt;// Clean up.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we need to free the stored lines before we free &lt;code&gt;lines&lt;/code&gt;
because once we&#39;ve freed &lt;code&gt;lines&lt;/code&gt; nothing points to the stored
lines and so there&#39;s no way to free them (remember, we need
their addresses). This means we need to iterate through &lt;code&gt;lines&lt;/code&gt;
freeing each individual allocation and then, only when we&#39;re
done, freeing &lt;code&gt;lines&lt;/code&gt; itself. Recall that all the local variables
will just be deallocated when the function returns. However,
this doesn&#39;t mean that the things they point to are deallocated,
just that the storage used by the variable itself is reclaimed.&lt;/p&gt;
&lt;h3 id=&quot;error-handling&quot;&gt;Error Handling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#error-handling&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because this is demonstration code, I&#39;ve chosen to ignore the
case where a line is longer than 1024 characters, but what if
we wanted to handle that instead? You can detect this case with
&lt;code&gt;fgets()&lt;/code&gt; by checking to see if you have a newline at the
end of the buffer:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token char&quot;&gt;&#39;&#92;n&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; BAD_LINE_ERROR&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem with this code is that it has a memory leak:
if we have already read in some of the lines, then we&#39;ll
leak &lt;code&gt;lines&lt;/code&gt; and whatever lines we read in. In order to
avoid the leak, we need to run our cleanup routines.
One common way to handle this is to have an &lt;code&gt;error&lt;/code&gt; block
that we execute. For instance:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;  &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; status &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; OK&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;l &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;line&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// End of file (hopefully).&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; 
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;l&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token char&quot;&gt;&#39;&#92;n&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      status &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; BAD_LINE_ERROR&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;goto&lt;/span&gt; error&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;error&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Clean up.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; 
i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; status&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;goto&lt;/code&gt; instruction here just says to go to the line labelled
&lt;code&gt;error&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This works but is error-prone: you need to note every
case where the function might return and jump to the right
error block. Moreover, the error block needs to be able to
clean up after any kind of error, so, for instance, it
needs to be able to handle the case where &lt;code&gt;lines&lt;/code&gt; is &lt;code&gt;NULL&lt;/code&gt; (fortunately,
&lt;code&gt;free()&lt;/code&gt; handles this case automatically). Finally, if
you forget to set the &lt;code&gt;status&lt;/code&gt; value, then you are incorrectly
returning an &lt;code&gt;OK&lt;/code&gt; status even if there was an error.&lt;/p&gt;
&lt;h3 id=&quot;locally-scoped-allocations&quot;&gt;Locally Scoped Allocations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#locally-scoped-allocations&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You might ask why you can&#39;t just free &lt;em&gt;all&lt;/em&gt; the memory that was
allocated by a function when the function returns rather than just
the memory on the stack? Then we wouldn&#39;t have to do all of
this stuff where we explicitly free everything.&lt;/p&gt;
&lt;p&gt;There&#39;s an obvious answer to this question: some functions
&lt;em&gt;intentionally&lt;/em&gt; allocate memory and don&#39;t clean it up. A good
example here is the &lt;code&gt;strdup()&lt;/code&gt; function we used above. Internally,
&lt;code&gt;strdup()&lt;/code&gt; does something like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;strdup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;str&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; len &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;str&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;retval &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;len &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// +1 for the terminating NUL.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;retval&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;retval&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; str&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; retval&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we automatically freed all memory that was allocated by the
function, then we would free &lt;code&gt;retval&lt;/code&gt; before returning, at which point
the caller would be left with a pointer to memory that has been freed,
which is clearly a problem. In fact, it&#39;s the source of a common
security vulnerability called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Dangling_pointer&amp;amp;oldid=1248879275#use_after_free&quot;&gt;use after
free&lt;/a&gt;.
Clearly we would need something more sophisticated than just
freeing everything that was allocated in the function when
it exits.&lt;/p&gt;
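&lt;p&gt;To make the ownership handoff concrete, here is a minimal sketch of a
function in the same style as &lt;code&gt;strdup()&lt;/code&gt;. The name &lt;code&gt;join_path()&lt;/code&gt;
is hypothetical and only for illustration: it allocates a buffer, fills it in,
and returns it, so the allocation has to outlive the function and it becomes
the caller&#39;s job to &lt;code&gt;free()&lt;/code&gt; it.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical helper (not from the code above): returns a newly
 * allocated &quot;dir/file&quot; string. Like strdup(), it hands ownership of
 * the allocation to the caller. */
char *join_path(const char *dir, const char *file) {
  /* One extra byte for the slash and one for the terminating NUL. */
  size_t len = strlen(dir) + 1 + strlen(file) + 1;
  char *path = malloc(len);
  if (!path) {
    return NULL;
  }
  snprintf(path, len, &quot;%s/%s&quot;, dir, file);
  return path;  /* Must outlive this function; the caller frees it. */
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A caller would do something like &lt;code&gt;char *p = join_path(&quot;tmp&quot;, &quot;input.txt&quot;);&lt;/code&gt;
and eventually &lt;code&gt;free(p);&lt;/code&gt;. Nothing in the function&#39;s signature records
that obligation, which is exactly the information the compiler is missing.&lt;/p&gt;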
&lt;p&gt;What you actually want is for the compiler to know when objects are
intended to outlive the function and when they are not, but actually
distinguishing these cases is very difficult in C, at least without
help from the programmer, as we will see in the rest of the series. In
fact, making this kind of analysis possible is one of the main
motivating design choices for much of the Rust memory model.&lt;/p&gt;
&lt;h2 id=&quot;how-malloc()-works&quot;&gt;How &lt;code&gt;malloc()&lt;/code&gt; works &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#how-malloc()-works&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far we&#39;ve just been treating &lt;code&gt;malloc()&lt;/code&gt; as a kind of black box,
and that&#39;s generally fine for most programming tasks, but it&#39;s
helpful to have some sense of what&#39;s going on internally. The first
thing to realize is that &lt;code&gt;malloc()&lt;/code&gt; isn&#39;t magic. In fact, you can
write your own memory allocator in C (Firefox, for instance, uses
a custom allocator).&lt;/p&gt;
&lt;p&gt;At a very high level, you should think of &lt;code&gt;malloc()&lt;/code&gt; as having
access to one or more large contiguous blocks of memory, which it
then dispenses on demand. On a very simple computer, &lt;code&gt;malloc()&lt;/code&gt; would
just have access to the entire memory of the machine, but on a
modern multiprocess operating system, it gets chunks of memory
from the operating system. For our purposes, it&#39;s easiest to
think of it as having a big contiguous chunk of memory to work
with. As I said, we usually wouldn&#39;t start at memory location 0,
so we&#39;ll just assume the block starts at 1000.&lt;/p&gt;
&lt;p&gt;The figure below shows the situation after a single allocation
of size 200, with the allocation being red and the unallocated
space being blue. What&#39;s happened here is that &lt;code&gt;malloc(200)&lt;/code&gt;
just picked the first available memory region, which is
at the start of the block because no memory has been allocated.&lt;/p&gt;
&lt;style&gt;
*,
*:before,
*:after {
  box-sizing: border-box;
}

.container {
  width: 700px;
  border: 1px solid black;
}
.row {
  display: flex;
}
.item {
  border: 1px solid black;
  border-bottom: 0;
  border-right: 0;
  padding: 10px;
  flex-shrink: 0;
  background-color: lightblue
}
.row:first-child .item {
  border-top: 0;
}
.row .item:first-child {
  border-left: 0;
}
.used {
  background-color: red;
}

.header {
  background-color: blue;
}

.newsletter-warning {
  display: none;
}
&lt;/style&gt;
&lt;div class=&quot;newsletter-warning&quot;&gt;
&lt;p&gt;&lt;em&gt;Note: If the following diagram doesn&#39;t render properly, it&#39;s
probably because your mail reader doesn&#39;t allow inline styles.
Try reading the &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1&quot;&gt;Web version&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1000-1199&lt;/div&gt;
    &lt;div style=&quot;width:80%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:100%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;The allocation starts at address 1000 and goes to address 1199,
so &lt;code&gt;malloc()&lt;/code&gt; just returns the address &lt;code&gt;1000&lt;/code&gt;, which points
to the start of the allocated region.&lt;/p&gt;
&lt;p&gt;The next figure shows the situation with two more allocations, one of
size 400 and one of size 200. Again, this is what you&#39;d expect: the
allocator just picks the lowest available region.  As noted above, a
real allocator would probably leave some extra space to facilitate growing
the allocation but we&#39;re trying to keep things simple for the purpose
of examples. Designing fast memory allocators is a whole (complicated)
topic all on its own.&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1000-1199&lt;/div&gt;
    &lt;div style=&quot;width:40%;&quot; class=&quot;item used&quot;&gt;2: 1200-1599&lt;/div&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;3: 1600-1799&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:100%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;So far so good, but now what happens when we free allocation #2?
The result is shown below.&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1000-1199&lt;/div&gt;
   &lt;div style=&quot;width:40%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;3: 1600-1799&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:100%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;We now have a 400-byte
hole of free memory. If we try to do another 200-byte
allocation, it will work fine, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1000-1199&lt;/div&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;4: 1200-1399&lt;/div&gt;    
   &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;3: 1600-1799&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:100%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;But if we now try to allocate another 400 bytes, it won&#39;t
fit in the remaining 200-byte hole, so we need to go into higher memory.&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1000-1199&lt;/div&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;4: 1200-1399&lt;/div&gt;    
   &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;3: 1600-1799&lt;/div&gt;    
    &lt;div style=&quot;width:20%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:40%;&quot; class=&quot;item used&quot;&gt;4: 2000-2399&lt;/div&gt;      
    &lt;div style=&quot;width:60%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;As the program runs longer and memory is allocated and freed,
you tend to get lots of small holes that can&#39;t be filled with big
allocations, and so you have to allocate higher and higher
memory regions. This is called &lt;em&gt;fragmentation&lt;/em&gt;.
In the extreme, you can get to the point where
you can&#39;t allocate new memory even though there&#39;s actually
plenty of free space; it&#39;s just not in a convenient form.
There are techniques for avoiding this kind of
fragmentation as well as for allocating memory more efficiently,
but they&#39;re too advanced to cover here.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h5 id=&quot;how-does-malloc()-get-its-memory%3F&quot;&gt;How does &lt;code&gt;malloc()&lt;/code&gt; get its memory? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#how-does-malloc()-get-its-memory%3F&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;As I said above, &lt;code&gt;malloc()&lt;/code&gt; gets chunks of memory from the operating
system. Remember that your program has to share the computer, including
its memory, with other programs and the operating system is responsible
for arbitrating which program has which chunk of memory. Conceptually,
the operating system&#39;s job here is somewhat like &lt;code&gt;malloc()&lt;/code&gt;&#39;s, except that &lt;code&gt;malloc()&lt;/code&gt; itself
calls some system API (for instance, &lt;a href=&quot;https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/mmap.2.html&quot;&gt;&lt;code&gt;mmap()&lt;/code&gt;&lt;/a&gt;) to request memory from the operating system.&lt;/p&gt;
&lt;p&gt;Note that modern systems all have &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Virtual_memory&amp;amp;oldid=1263034421&quot;&gt;virtual
memory&lt;/a&gt;,
systems in which the system automatically moves (&amp;quot;swaps&amp;quot;) data in and
out of the physical memory and onto disk so that programs can
allocate more space than is in the hardware of the system. In order
to do this, the operating system may have to move stuff around
in physical memory, so it maintains a mapping from the address
the programs use to the actual physical location in memory.
In a future post, I may cover virtual memory in more detail, but
no promises.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A natural question to ask is why we can&#39;t just move things around
to accommodate the holes? The reason is that the pointers stored
in the program literally just point to the memory addresses
where the allocations are stored. So if (for instance) we were
to slide allocation #3 over to the right to make room for
the last 400-byte allocation, then whatever pointer was returned from the initial
&lt;code&gt;malloc()&lt;/code&gt; for #3 would now point somewhere in the middle of
that new allocation, which is obviously a problem. The compiler doesn&#39;t
keep track of the variables holding the pointers, so it has no
way to go back and readjust them.&lt;/p&gt;
&lt;p&gt;The key thing to realize here is that all that &lt;code&gt;malloc()&lt;/code&gt; and &lt;code&gt;free()&lt;/code&gt;
are doing is &lt;strong&gt;bookkeeping&lt;/strong&gt;: the allocator remembers which memory is
currently in use and then hands out pointers to regions that aren&#39;t
currently in use as needed. This naturally raises the
question of how the allocator does the bookkeeping. How does it
remember which regions are in use and which aren&#39;t? The obvious answer
is the right one: the allocator reserves some of the memory it
has to work with for this kind of bookkeeping metadata, at minimum:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The size of each allocation (so it can be freed)&lt;/li&gt;
&lt;li&gt;The regions that are currently free, for instance the
top of the highest allocation and the addresses of
the freed holes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When you call &lt;code&gt;malloc()&lt;/code&gt; the allocator finds a suitable region
and allocates it. When you call &lt;code&gt;free()&lt;/code&gt; it adds it to the list
of holes (or adjusts the highest allocation value if it&#39;s the
highest allocation).&lt;/p&gt;
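&lt;p&gt;The bookkeeping described above can be sketched as a toy allocator that
tracks only addresses, not real memory. Everything here is an assumption made
for illustration: the names (&lt;code&gt;toy_heap_t&lt;/code&gt;, &lt;code&gt;toy_malloc()&lt;/code&gt;,
&lt;code&gt;toy_free()&lt;/code&gt;), the fixed-size hole table, the first-fit policy, and the
fact that &lt;code&gt;toy_free()&lt;/code&gt; is told the size by the caller rather than
looking it up. It also never merges adjacent holes and has no upper limit on
memory, both of which a real allocator would have to deal with.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stddef.h&amp;gt;

#define MAX_HOLES 16

typedef struct {
  size_t start;  /* First address of the hole. */
  size_t size;   /* Number of free bytes in the hole. */
} hole_t;

typedef struct {
  size_t top;               /* First address above the highest allocation. */
  hole_t holes[MAX_HOLES];  /* Freed regions below top. */
  size_t num_holes;
} toy_heap_t;

void toy_init(toy_heap_t *h, size_t base) {
  h-&amp;gt;top = base;
  h-&amp;gt;num_holes = 0;
}

/* First fit: use the first hole in the table that is big enough,
 * otherwise hand out fresh space at the top. */
size_t toy_malloc(toy_heap_t *h, size_t size) {
  for (size_t i = 0; i &amp;lt; h-&amp;gt;num_holes; i++) {
    if (h-&amp;gt;holes[i].size &amp;gt;= size) {
      size_t addr = h-&amp;gt;holes[i].start;
      h-&amp;gt;holes[i].start += size;
      h-&amp;gt;holes[i].size -= size;
      if (h-&amp;gt;holes[i].size == 0) {
        /* Hole fully used: replace it with the last table entry. */
        h-&amp;gt;holes[i] = h-&amp;gt;holes[--h-&amp;gt;num_holes];
      }
      return addr;
    }
  }
  size_t addr = h-&amp;gt;top;
  h-&amp;gt;top += size;
  return addr;
}

/* Returns 0 on success, -1 if the hole table is full. */
int toy_free(toy_heap_t *h, size_t addr, size_t size) {
  if (addr + size == h-&amp;gt;top) {
    h-&amp;gt;top = addr;  /* Freed the highest allocation: just lower top. */
    return 0;
  }
  if (h-&amp;gt;num_holes == MAX_HOLES) {
    return -1;
  }
  h-&amp;gt;holes[h-&amp;gt;num_holes++] = (hole_t){addr, size};
  return 0;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Starting from &lt;code&gt;toy_init(&amp;amp;h, 1000)&lt;/code&gt;, requests for 200, 400, and 200
bytes return 1000, 1200, and 1600, as in the figures. After
&lt;code&gt;toy_free(&amp;amp;h, 1200, 400)&lt;/code&gt;, another 200-byte request reuses the hole at
1200, while a following 400-byte request no longer fits in any hole and has to
be placed above the highest allocation.&lt;/p&gt;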
&lt;p&gt;Interestingly, it&#39;s not always necessary to store a list
of every chunk of allocated memory. You can do this,
but that means you need some data structure that lets you
look up the allocations from their addresses. A common thing
people do instead is to store the per-allocation metadata
as a header right before the allocated region. The header
contains the size of the allocation and maybe some other stuff.
For instance, the first allocation above might look like this:&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;container&quot;&gt;
   &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:10%;&quot; class=&quot;item header&quot;&gt;Header&lt;/div&gt;
    &lt;div style=&quot;width:20%;&quot; class=&quot;item used&quot;&gt;1: 1100-1299&lt;/div&gt;
    &lt;div style=&quot;width:70%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div style=&quot;width:100%;&quot; class=&quot;item free&quot;&gt;Unallocated&lt;/div&gt;    
   &lt;/div&gt;
&lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Instead of returning &lt;code&gt;1000&lt;/code&gt;, in this case &lt;code&gt;malloc()&lt;/code&gt; would return
&lt;code&gt;1100&lt;/code&gt;. (I&#39;ve drawn the header as 100 bytes to make the figure more readable,
but hopefully you&#39;re not wasting 100 bytes of overhead on every allocation.)
Then when you call &lt;code&gt;free(1100)&lt;/code&gt; the allocator would subtract
the size of the header and deallocate the whole region from
&lt;code&gt;1000-1299&lt;/code&gt;. The reason this works is that &lt;code&gt;free()&lt;/code&gt; requires
knowing the memory address anyway, so there&#39;s no need to
store it. If you call &lt;code&gt;free()&lt;/code&gt; on some region of
memory that wasn&#39;t returned from &lt;code&gt;malloc()&lt;/code&gt; the results are
likely to be disastrous, because &lt;code&gt;free()&lt;/code&gt; has no way of knowing
that this is a mistake and will just treat whatever is right
before the pointer you passed in as the header. If that
data is attacker controlled, it can easily lead to a vulnerability.&lt;/p&gt;
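&lt;p&gt;Here is a minimal sketch of the header trick, under some loudly stated
assumptions: the names &lt;code&gt;header_t&lt;/code&gt;, &lt;code&gt;hdr_alloc()&lt;/code&gt;, and
&lt;code&gt;hdr_size()&lt;/code&gt; are made up, memory comes from a small static arena
rather than the operating system, and nothing is ever freed. The only point is
to show the pointer arithmetic: the allocator writes the size just before the
region it returns, and later steps back from the caller&#39;s pointer to find it.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stddef.h&amp;gt;

/* Per-allocation header. The union with max_align_t (C11) makes the
 * header as large and as aligned as any ordinary object needs. */
typedef union {
  size_t size;  /* Size requested by the caller, in bytes. */
  max_align_t align;
} header_t;

#define ARENA_UNITS 256

/* The arena is carved up in header-sized units. */
static header_t arena[ARENA_UNITS];
static size_t units_used = 0;

void *hdr_alloc(size_t size) {
  size_t unit = sizeof(header_t);
  if (size &amp;gt; ARENA_UNITS * unit) {
    return NULL;  /* Too big; also keeps the rounding below from overflowing. */
  }
  size_t payload_units = (size + unit - 1) / unit;  /* Round up. */
  if (payload_units + 1 &amp;gt; ARENA_UNITS - units_used) {
    return NULL;  /* Not enough room for header plus payload. */
  }
  header_t *h = arena + units_used;
  h-&amp;gt;size = size;
  units_used += 1 + payload_units;
  return h + 1;  /* The caller&#39;s memory starts right after the header. */
}

/* Only valid for pointers that came from hdr_alloc(): anything else
 * makes this read whatever happens to sit before the pointer. */
size_t hdr_size(const void *p) {
  const header_t *h = (const header_t *)p - 1;
  return h-&amp;gt;size;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A real &lt;code&gt;free()&lt;/code&gt; built this way would do the same step backwards that
&lt;code&gt;hdr_size()&lt;/code&gt; does, which is why handing it a pointer that did not come
from the allocator goes wrong so badly.&lt;/p&gt;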
&lt;h2 id=&quot;multiple-references-and-uaf&quot;&gt;Multiple References and UAF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#multiple-references-and-uaf&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s consider a slight modification of the function we&#39;ve been
looking at, in which, instead of printing out all the lines,
we return the last line in sort order.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
With
a lot of trimming, the function might look like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;find_largest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;filename&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// 2. Sort the lines.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;qsort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; num_lines&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; compare_string&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  largest &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;num_lines &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span 
class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;br /&gt;  &lt;br /&gt;  &lt;span class=&quot;token comment&quot;&gt;// Clean up.&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;num_lines&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token 
function&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;lines&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; largest&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function gets called like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;largest &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;find_largest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; largest&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The experienced C programmer will immediately note that this
code has a serious bug, because we are trying to use the
memory pointed to by &lt;code&gt;largest&lt;/code&gt; after we have &lt;code&gt;free()&lt;/code&gt;d it.
When the calling function tries to use &lt;code&gt;largest&lt;/code&gt;, there are
&lt;a href=&quot;https://en.cppreference.com/w/c/language/behavior&quot;&gt;&lt;strong&gt;no guarantees at all&lt;/strong&gt; about what will happen.&lt;/a&gt;
This is called a &lt;em&gt;use after free (UAF)&lt;/em&gt;
bug. For example,
the allocator might have reallocated the memory in response
to some other call to &lt;code&gt;malloc()&lt;/code&gt;, in which case it is now
full of some other data.  Of course, it&#39;s also quite likely
that the region is still unused and has the same contents
as before; it&#39;s just that the allocator added it to the list
of holes. In this case, the program may work fine under
test but then fail unpredictably later when some change
to your code causes allocations to happen differently and
suddenly &lt;code&gt;largest&lt;/code&gt; points to some memory region being used
for something else.&lt;/p&gt;
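&lt;p&gt;One way to repair the function is to give the caller its own copy of the
string before the cleanup runs. The sketch below is a hypothetical stand-in for
the tail end of &lt;code&gt;find_largest()&lt;/code&gt; (the name
&lt;code&gt;copy_last_and_free()&lt;/code&gt; and its signature are made up so that the
example is self-contained): it copies the last line, frees everything it was
given, and returns the copy.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical helper: takes ownership of lines (already sorted),
 * frees all of it, and returns a fresh copy of the last entry.
 * Returns NULL if there are no lines or the copy cannot be allocated.
 * The caller must free() the result. */
char *copy_last_and_free(char **lines, size_t num_lines) {
  char *largest = NULL;

  if (num_lines &amp;gt; 0) {
    size_t len = strlen(lines[num_lines - 1]) + 1;  /* Include the NUL. */
    largest = malloc(len);
    if (largest) {
      memcpy(largest, lines[num_lines - 1], len);
    }
  }

  /* Clean up. The copy in largest is a separate allocation, so it
   * is not affected by any of these calls. */
  for (size_t i = 0; i &amp;lt; num_lines; i++) {
    free(lines[i]);
  }
  free(lines);

  return largest;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An alternative with no extra allocation is to skip freeing that one element
and hand it to the caller directly. Either way the caller now owns a string
and is responsible for calling &lt;code&gt;free()&lt;/code&gt; on it.&lt;/p&gt;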
&lt;p&gt;The reason this is all possible is that in C pointers are
just values that hold the memory address; effectively they&#39;re
just numbers and they behave like numbers. So if you assign
a pointer value to another variable, now you have two variables
that point to the same thing (i.e., they have the same value).
When we call &lt;code&gt;free()&lt;/code&gt; on the first copy of the variable, that
doesn&#39;t have any effect at all on the other copy (or on the
first one, for that matter). It just changes the state of
the memory region addressed by the variable. Once you&#39;ve
called &lt;code&gt;free(x)&lt;/code&gt; you&#39;re still left with whatever is in &lt;code&gt;x&lt;/code&gt;,
and nothing in C stops you from using it; it&#39;s just illegal
to do so, and it&#39;s your job not to, or else.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-c%2B%2B&quot;&gt;Next up: C++ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-management-1/#next-up%3A-c%2B%2B&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you have probably gathered by now, managing memory yourself is
a huge amount of work, which is one reason why C programs
have so many memory issues. In the next post in this series,
we&#39;ll be taking a look at C++, which has some features that
make things a bit better, at least some of the time.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This code actually stores the lines in unsorted order, then
sorts them, and then finally writes the output, but you
could also store them in a sorted data structure. Either
way, you need to store all of the lines. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
C99 supports a feature called &amp;quot;variable length arrays&amp;quot;,
but they don&#39;t automatically grow the way a Python array
does, so that doesn&#39;t help us much. There seems to
be a lot of sentiment that they are a &lt;a href=&quot;https://lkml.org/lkml/2018/3/7/621&quot;&gt;misfeature&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If you know you&#39;re going to be doing a lot of
reallocation like this, many people will themselves overallocate,
for instance by doubling the size of the buffer every time they
are asked for more space than is available, thus reducing the
number of times they need to actually reallocate. I&#39;ve avoided
this kind of trickery to keep this example simple. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In reality, you can pass a &lt;code&gt;NULL&lt;/code&gt; to &lt;code&gt;realloc()&lt;/code&gt; for the existing
memory and it will just allocate new memory, but I&#39;m handling the
cases separately for pedagogical reasons.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In C, pointers to different kinds of objects can be different sizes. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This isn&#39;t technically true because the memory allocator has
kept track of what memory has been allocated, but the allocator
doesn&#39;t know that we should have cleaned up (for instance, we
might have stored the value in &lt;code&gt;lines&lt;/code&gt; somewhere) and so it
can&#39;t clean up for us. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For the nitpickers out there, this will also catch the case
where the last line in the file doesn&#39;t end in a newline. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Obviously you don&#39;t need to sort the lines in order to
find the largest one, but this is necessarily an artificial
example.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-management-1/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why it&#39;s hard to trust software, but you mostly have to anyway</title>
		<link href="https://educatedguesswork.org/posts/ensuring-software-provenance/"/>
		<updated>2024-12-28T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ensuring-software-provenance/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;[Edited to change the title and subtitle -- 2024-12-28]&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/two-kids-under-a-trench-coat.jpeg&quot; width=&quot;400&quot; /&gt;
&lt;figcaption&gt;
Two children under a trenchcoat. Image from ChatGPT.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;My long-time collaborator &lt;a href=&quot;https://datatracker.ietf.org/person/rlb@ipv.sx&quot;&gt;Richard
Barnes&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
used to say
that &lt;em&gt;&amp;quot;in security, trust is a four letter word&amp;quot;&lt;/em&gt;, and yet the
dominant experience of using any software-based system—which is,
you know, pretty much anything electronic—is trusting the
manufacturer. Not only is there no meaningful way to &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software&quot;&gt;determine what
software&lt;/a&gt; is running on a given device
without trusting the device, but even when you download the software
yourself, verifying that it&#39;s not malicious is extraordinarily
difficult in practice and mostly you just end up trusting the vendor
anyway.
Obviously, most vendors are honest, but what if they&#39;re not?&lt;/p&gt;
&lt;p&gt;A good motivating case here is secure messaging apps like iMessage,
WhatsApp, or Signal. People use these apps because they want to be
able to communicate securely and they are willing to trust them
with really sensitive information. In fact, a large
part of the value proposition of a secure messenger is that not
even the vendor can see your communications. For instance, here&#39;s
what Apple &lt;a href=&quot;https://www.apple.com/privacy/features/&quot;&gt;has to say about iMessage and FaceTime&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;End-to-end encryption protects your iMessage and FaceTime
conversations across all your devices. With watchOS, iOS, and
iPadOS, your messages are encrypted on your device so they can’t be
accessed without your passcode. iMessage and FaceTime are designed
so that there’s no way for Apple to read your messages when they’re
in transit between devices. You can choose to automatically delete
your messages from your device after 30 days or a year or keep them
on your device indefinitely. Messages sent via satellite also use
end-to-end encryption to protect your privacy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This security guarantee critically depends on the app behaving
as advertised, which brings us right back to trusting the vendor.&lt;/p&gt;
&lt;p&gt;&amp;quot;But what about open source software?&amp;quot; I hear you say. &amp;quot;I&#39;ll just
review the source code and determine whether it&#39;s malicious&amp;quot;.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/one-does-not-simply-review.jpg&quot; alt=&quot;One does not simply review...&quot; /&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;I would make several points in response to this. The first is: &amp;quot;LOL&amp;quot;.
Any nontrivial program consists of hundreds of thousands to millions of
&lt;em&gt;[2024-12-28 -- fixed typo]&lt;/em&gt;
lines of code, and reviewing any fraction of that in a reasonable period
of time is simply impractical. The way you can tell this is that people
are constantly finding vulnerabilities in programs, and if it were
straightforward to find those vulnerabilities, then we would have
found them all. You&#39;re certainly not going to review every program
you run yourself, at least not in any way that&#39;s effective.
And that&#39;s just the first step: the supply chain from &amp;quot;source code available&amp;quot; to &amp;quot;I actually trust
this code&amp;quot; is very long and leaky. Even if you did review the source, most software—even open source software—is
actually delivered in binary form (when was the last time you compiled
Firefox for yourself?) so what makes you think the binary you&#39;re getting
was compiled from the source code you reviewed?&lt;/p&gt;
&lt;p&gt;Obviously, this is a bad situation if you&#39;re using software
to do sensitive stuff—which, again, pretty much everyone is—and
there&#39;s been quite a bit of work on the general problem of being able
to give people more confidence in the software they&#39;re running.
It&#39;s far from a solved problem, so what I&#39;d like to do here is give you
a sense of the problem, how hard it is, the solution space that&#39;s been explored,
and how far we are from a real solution.&lt;/p&gt;
&lt;h2 id=&quot;checking-software-provenance&quot;&gt;Checking Software Provenance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#checking-software-provenance&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As a warm-up, let&#39;s look at the problem of verifying downloaded
software (e.g., via your Web browser). This
is a much easier problem because we&#39;re trusting the publisher
not to provide malicious software; we&#39;re just trying to ensure
that the software we got is what the publisher intended.&lt;/p&gt;
&lt;h3 id=&quot;the-basic-supply-chain&quot;&gt;The Basic Supply Chain &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#the-basic-supply-chain&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For reference, here&#39;s an example of a relatively simple software
supply chain with just a single code author.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/SoftwareSupplyChain.png&quot; alt=&quot;Example software supply chain&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A simple software supply chain
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The process starts with the vendor&#39;s engineers developing the
code. Typically this is done on their desktop (or laptop) machines,
with the engineers collaborating via some code repository site
(usually &lt;a href=&quot;https://github.com/&quot;&gt;GitHub&lt;/a&gt;). When engineer A makes
a change to the code, they publish it on GitHub and then engineers
B, C, etc. update their local copy.&lt;/p&gt;
&lt;p&gt;When it&#39;s time to build a release, a number of things can happen,
including:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Some engineer builds it on their local machine (step 2(a) above)&lt;/li&gt;
&lt;li&gt;The engineers tag the release on GitHub, prompting it to build
a release (step 2(b) above).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The release then gets uploaded to the vendor&#39;s website, which is
probably hosted on some cloud service like Amazon or Netlify.
You can also host the binaries on GitHub. In principle, users
could just download the binaries directly from your site (or GitHub) but it&#39;s
common to instead use a content distribution network (CDN) like Cloudflare
or Fastly which retrieves a copy of the binary once, caches it, and
then gives out copies to each user. CDNs are designed for massive
scaling, thus saving both load on your servers and cost.&lt;/p&gt;
&lt;p&gt;One thing you should notice right away is how many third parties are
involved in this process. Each of these is an opportunity for
corruption of the code on its way from the developers to the
user.&lt;/p&gt;
&lt;h3 id=&quot;code-signing&quot;&gt;Code Signing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#code-signing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The obvious &amp;quot;right thing&amp;quot; approach to software authentication
that everyone comes up with is to just digitally sign the package. This provides both
integrity (ensuring things weren&#39;t changed) and data origin
authentication (telling you who the package is from). Moreover,
signed objects are self-contained, so, for instance, you
can sign your package and then put it up for download on
someone else&#39;s site (or, in the diagram above, a CDN) and users will still be able to verify
it&#39;s from you.
This is a pretty good-sounding set of properties
and unsurprisingly, both &lt;a href=&quot;https://developer.apple.com/developer-id/&quot;&gt;MacOS&lt;/a&gt;
and &lt;a href=&quot;https://learn.microsoft.com/en-us/windows-hardware/drivers/install/authenticode&quot;&gt;Windows&lt;/a&gt;
support signed applications.&lt;/p&gt;
&lt;p&gt;The basic idea here is that when you install a piece of software
you get a dialog like this (on Windows):&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/code-signing-sectigo.png&quot; alt=&quot;With and without code signing&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Windows code signing dialog. Image from &lt;a href=&quot;https://sectigostore.com/page/microsoft-authenticode-code-signing-certificates/&quot;&gt;Sectigo&lt;/a&gt;.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Or alternately, maybe you just get a warning:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/mac-unsigned-babkin.png&quot; alt=&quot;MacOS without code signing&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Mac unsigned binary dialog. Image from &lt;a href=&quot;https://dennisbabkin.com/blog/?t=how-to-get-certificate-code-sign-notarize-macos-binaries-outside-apple-app-store&quot;&gt;Dennis Babkin&lt;/a&gt;.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here&#39;s what Apple&#39;s dialog looks like for a signed binary.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/macos-valid-software.png&quot; alt=&quot;MacOS with code signing&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Mac signed binary dialog.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The way that Microsoft&#39;s version of code signing (Authenticode) works is
the software author gets a code signing certificate issued by a public
certificate authority. They then use their private key to sign the
binary. Apple&#39;s system is similar, except that instead of using a
public CA you need to be an Apple registered developer (surprise!).
When you download a binary and try to run it the first time, the
operating system checks the signature and then pops up the appropriate
dialog box, telling you who signed the code, or warning you that it&#39;s
unsigned, with the exact details
depending on the operating system version.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Modern versions of MacOS have become increasingly aggressive about
not letting you run unsigned code, but as of this writing, it&#39;s
&lt;a href=&quot;https://dennisbabkin.com/blog/?t=how-to-get-certificate-code-sign-notarize-macos-binaries-outside-apple-app-store#run_unsigned&quot;&gt;still possible&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Mobile operating systems are even more locked down, where almost
all software is installed from some app store (typically operated
by the vendor).
Historically, Apple has &lt;strong&gt;only&lt;/strong&gt; let you install
apps from the iOS app store, whereas on Android you could install
third party apps if you were willing to work a bit. In response to
the EU Digital Markets Act, Apple is allowing &lt;a href=&quot;https://developer.apple.com/support/dma-and-apps-in-the-eu/&quot;&gt;alternative app
installation inside Europe&lt;/a&gt;,
but even then it&#39;s much less convenient, and not available in the
US at all.
Apps installed through the app store are also signed and the
mobile OS automatically verifies the provenance of the app.&lt;/p&gt;
&lt;p&gt;The basic problem with code signing systems is that they rely heavily
on user diligence, because the OS only verifies that the code &lt;em&gt;was
signed&lt;/em&gt; but doesn&#39;t know who was supposed to sign it. In the simplest
case, consider what happens if you are lured to the attacker&#39;s web
site and persuaded to download an app. As long as the attacker has a
code signing certificate—or is an approved Apple
developer—then they can send you a malicious binary, which will
run fine. In principle users are supposed to check the publisher
name—assuming, that is, that the OS even shows a dialog
box—but we know from long experience that users don&#39;t check this
kind of thing. If it becomes well-known that a given publisher is signing
malicious binaries, the OS vendor might blocklist the publisher, but this
takes time and leaves the vendor constantly chasing bad behavior.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Moreover, &amp;quot;well-known&amp;quot; is doing a lot of work here, as it&#39;s not exactly
unheard of for malicious apps to &lt;a href=&quot;https://www.darkreading.com/cyberattacks-data-breaches/malicious-apps-millions-downloads-apple-google-app-stores&quot;&gt;make it into various app stores&lt;/a&gt;.&lt;/p&gt;
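&lt;p&gt;To make the gap concrete, here is a minimal Python sketch of the difference between what the OS checks and what a diligent user would have to check. Everything in it is hypothetical: the publisher names, the &lt;code&gt;SignedBinary&lt;/code&gt; type, and the &lt;code&gt;signature_valid&lt;/code&gt; flag (which merely stands in for real cryptographic verification) are illustrative assumptions, not any actual OS API.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical set of publishers holding valid code signing credentials.
REGISTERED_PUBLISHERS = {"Example Corp", "Totally Legit Software LLC"}


@dataclass(frozen=True)
class SignedBinary:
    name: str
    signer: str            # publisher named in the signing certificate
    signature_valid: bool  # stand-in for the real cryptographic check


def os_accepts(binary):
    """What the OS checks: a valid signature from any registered publisher."""
    return binary.signature_valid and binary.signer in REGISTERED_PUBLISHERS


def diligent_user_accepts(binary, expected_signer):
    """What a careful user adds: is this the publisher I expected?"""
    return os_accepts(binary) and binary.signer == expected_signer


genuine = SignedBinary("messenger", "Example Corp", True)
lookalike = SignedBinary("messenger", "Totally Legit Software LLC", True)

print(os_accepts(genuine), os_accepts(lookalike))        # True True
print(diligent_user_accepts(lookalike, "Example Corp"))  # False
```

&lt;p&gt;Both binaries pass the OS-style check; only the comparison against the expected publisher, which the OS has no way to know, rejects the lookalike.&lt;/p&gt;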
&lt;p&gt;If you&#39;re on a mobile operating system, you&#39;ll of course be downloading
programs via the app store. The situation is a little better here
because the app store operates the directory, and so offers
(sort of) unambiguous naming and at least in principle the
app store operator can do something about &lt;a href=&quot;https://support.apple.com/guide/adguide/unacceptable-or-prohibited-content-guidelines-apd527d891a8/icloud&quot;&gt;copycat software with
confusing
names&lt;/a&gt;,
so it&#39;s harder to trick you into installing the wrong package,
though of course you&#39;re trusting the app store vendor to provide
the right package. It&#39;s still signed but the device vendor
controls the signing key authentication system, so they can impersonate anyone they want.&lt;/p&gt;
&lt;p&gt;Whether you are downloading software directly or via an app
store, it&#39;s usually
necessary to download software over a secure transport &lt;em&gt;even if
there is code signing&lt;/em&gt;. If you don&#39;t download software using
secure transport then a network attacker can substitute their own
code—signed with their own valid certificate—during the
download process; unless you check the publisher&#39;s identity,
you&#39;ll end up running the attacker&#39;s code.&lt;/p&gt;
&lt;p&gt;Of course, if you have to download software over secure transport anyway,
this raises the natural question of why bother to sign the code
at all? Why not &lt;em&gt;just&lt;/em&gt; have all downloads happen over secure
transport? One reason is that signing allows for third
party hosting. The big technical difference between code signing and transport security
is that the signed object is a self-contained package that can be
distributed by anyone. This is a big asset in any scenario
where the publisher doesn&#39;t want to—or isn&#39;t allowed to—distribute
the software directly.&lt;/p&gt;
&lt;h4 id=&quot;blocklisting&quot;&gt;Blocklisting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#blocklisting&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Another reason for signing is to make blocklisting easier.
As I mentioned above, if the OS vendor determines that a publisher
is misbehaving, they can revoke permissions for that publisher,
thus preventing software signed with their certificate from being
installed. This is a highly imperfect mechanism for two obvious
reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;An attacker can register as a different publisher and continue
to sign as that publisher until they get caught.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An attacker can just distribute unsigned software.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that you could operate a blocklist where you just listed
malicious software—for instance by publishing a hash—but
that would be much easier to evade, as the attacker could just
change their software until it evaded detection; this is a common
problem with antivirus software. If you require software to be signed
by some key that chains back to some non-free credential, then this
allows you to increase the level of friction to distribute all
software, but especially malicious software.  If you want to register as a different publisher,
you have to establish that identity and then get a certificate, join
the developer program, etc. None of this is free, though we&#39;re probably
talking hundreds of dollars, not thousands,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
so it makes it somewhat more expensive to distribute malware.&lt;/p&gt;
&lt;p&gt;Of course, the attacker could just distribute unsigned software, but then there&#39;s
some additional friction in the install experience, so you might
not manage to infect quite as many victims.&lt;/p&gt;
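&lt;p&gt;The difference between the two kinds of blocklist can be sketched in a few lines of Python. This is only an illustration under assumed names (&lt;code&gt;blocked_hashes&lt;/code&gt;, &lt;code&gt;blocked_publishers&lt;/code&gt;, and the publisher string are all made up), not a description of how any OS vendor actually implements blocklisting.&lt;/p&gt;

```python
import hashlib

# Hypothetical blocklists maintained by an OS vendor.
malware_v1 = b"malicious payload"
blocked_hashes = {hashlib.sha256(malware_v1).hexdigest()}
blocked_publishers = {"Evil Publisher Inc"}


def blocked_by_hash(binary):
    """Hash blocklist: matches only the exact bytes that were reported."""
    return hashlib.sha256(binary).hexdigest() in blocked_hashes


def blocked_by_publisher(signer):
    """Publisher blocklist: matches anything signed by a revoked publisher."""
    return signer in blocked_publishers


# Appending a single byte yields a different hash, so the hash list misses it.
malware_v2 = malware_v1 + b" "

print(blocked_by_hash(malware_v1))                 # True
print(blocked_by_hash(malware_v2))                 # False
print(blocked_by_publisher("Evil Publisher Inc"))  # True
```

&lt;p&gt;The trivially modified binary slips past the hash list, while the publisher list keeps working until the attacker pays to establish a new identity.&lt;/p&gt;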
&lt;h4 id=&quot;automatic-updates&quot;&gt;Automatic Updates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#automatic-updates&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;A lot of modern software has some sort of self-updating feature.
Unlike the initial install, however, the software updater is
written—or at least distributed—by the publisher, who
knows precisely who should be signing the update, and so signatures
work just fine as a security measure.  Conceptually, this is similar
to the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Trust_on_first_use&amp;amp;oldid=1264102677&quot;&gt;trust on first use
(TOFU)&lt;/a&gt;
mechanisms used by SSH: as long as you get the right
package the first time, you&#39;re safe in the future because
the publisher can directly authenticate the code.&lt;/p&gt;
&lt;h4 id=&quot;package-managers&quot;&gt;Package managers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#package-managers&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the open source world, it&#39;s common to have a package manager which
lets you install software from the command line. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Linux and FreeBSD come with a variety of package managers
(&lt;a href=&quot;https://ubuntu.com/server/docs/package-management&quot;&gt;apt&lt;/a&gt;,
&lt;a href=&quot;https://www.debian.org/doc/manuals/debian-faq/pkgtools.en.html&quot;&gt;dpkg&lt;/a&gt;,
etc.). On MacOS it&#39;s possible to install third party software via
&lt;a href=&quot;https://brew.sh/&quot;&gt;Homebrew&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Many modern  programming languages have some kind of
package manager, such as &lt;a href=&quot;https://www.npmjs.com/&quot;&gt;npm (JavaScript)&lt;/a&gt;,
&lt;a href=&quot;https://crates.io/&quot;&gt;Cargo/Crates (Rust)&lt;/a&gt;, &lt;a href=&quot;https://pypi.org/&quot;&gt;PyPI
(Python)&lt;/a&gt;, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The way these systems typically work is
that people publish their packages onto the package manager site and
people download the packages from there using some local program provided with
the language or the operating system. This obviously makes the
distribution site a &lt;a href=&quot;https://www.computerweekly.com/news/366609663/PyPI-loophole-puts-thousands-of-packages-at-risk-of-compromise&quot;&gt;single point of
vulnerability&lt;/a&gt;,
in at least two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The package author&#39;s account on the package repository might
be compromised (e.g., if they don&#39;t use &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Multi-factor_authentication&amp;amp;oldid=1249536145&quot;&gt;MFA&lt;/a&gt;),
and the attacker uploads a malicious version. A huge amount
of the energy in securing open source supply chains has
gone into preventing this kind of attack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The package repository itself is compromised, and the attacker
uses their access to upload a malicious package.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using secure transport to the package repository is standard
practice, but it doesn&#39;t help against either of these threats
because the problem is that the data on the package manager itself
is compromised. In theory it seems like signatures offer a way
out of this: if packages are signed then even if the attacker
compromises the repository they won&#39;t be able to replace the
package with their own.&lt;/p&gt;
&lt;p&gt;Unfortunately package signing isn&#39;t a complete solution for the same
kind of identity reasons as before.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Attackers can submit malicious &lt;a href=&quot;https://www.sonatype.com/blog/open-source-attacks-on-the-rise-top-8-malicious-packages-found-in-npm&quot;&gt;copycat packages&lt;/a&gt;
to the package repository with similar names to legitimate
packages. It&#39;s easy to be fooled by this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the package repository is malicious, then it can point you
to the wrong package. When you first decide to use a package, you probably go to
the package manager site and do some kind of search (e.g.,
&amp;quot;give me a package for task &lt;code&gt;example&lt;/code&gt;&amp;quot;). If the package manager
site is under the control of the attacker, then they can just tell you to
install package &lt;code&gt;example-attacker&lt;/code&gt; (hopefully with a less
obvious name) instead of package &lt;code&gt;example&lt;/code&gt;. The package
will be signed, just by the attacker.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even if you know the right package name, a malicious package repository
can still attack you because you don&#39;t know what key should be
signing the packages, so once again you have to worry about identity
substitution. What&#39;s needed here is some way to issue credentials
that are tied unambiguously to the package name. One could imagine
a number of ways to do this, including (1) having the package
manager repo run its own CA or (2) tying package names to domain
names the way Java does (e.g., &lt;code&gt;com.example.package-name&lt;/code&gt;)
and using the WebPKI, which already attests to domain names.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with the case of software updating, however, the problem is easier
once a package has been downloaded, because you could store the
package signing key along with the package (e.g., in the
&lt;code&gt;package.json&lt;/code&gt; file), and then generate an alert if packages
aren&#39;t signed with that key (TOFU again).
Better yet, this would also work when &lt;em&gt;other people&lt;/em&gt; go to use
your package: if they get your list of keys then all the dependencies
would be protected.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
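&lt;p&gt;A rough Python sketch of this kind of key pinning might look like the following. The &lt;code&gt;KeyPinStore&lt;/code&gt; class and the key values are hypothetical, and the sketch only tracks which key was first seen for each package; it assumes the signature itself is verified elsewhere and does not model any real package manager.&lt;/p&gt;

```python
import hashlib


class KeyPinStore:
    """Remembers which signing key was first seen for each package (TOFU)."""

    def __init__(self):
        self._pins = {}  # maps package name to key fingerprint

    @staticmethod
    def fingerprint(public_key):
        return hashlib.sha256(public_key).hexdigest()

    def check(self, package, public_key):
        """Pin the key on first use, accept a match, raise if the key changed."""
        seen = self.fingerprint(public_key)
        pinned = self._pins.get(package)
        if pinned is None:
            self._pins[package] = seen
            return "pinned"
        if pinned != seen:
            raise ValueError("signing key for " + package + " changed")
        return "ok"


store = KeyPinStore()
print(store.check("example", b"author-key"))  # pinned
print(store.check("example", b"author-key"))  # ok
try:
    store.check("example", b"attacker-key")
except ValueError as err:
    print("alert:", err)
```

&lt;p&gt;As with SSH, this gives no protection on the very first download, but a later key substitution by the repository triggers an alert instead of a silent install.&lt;/p&gt;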
&lt;p&gt;There&#39;s been talk for a long time about signing packages, but
it doesn&#39;t seem to have really gotten off the ground for any
of the major package managers. For example, PyPI used to have
GPG signatures, but it looks like they didn&#39;t work that well
for a variety of &lt;a href=&quot;https://blog.pypi.org/posts/2023-05-23-removing-pgp/&quot;&gt;operational reasons&lt;/a&gt;
and they were recently removed and &lt;a href=&quot;https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-digital-attestations/&quot;&gt;replaced with &amp;quot;digital attestations&amp;quot;&lt;/a&gt; based
on &lt;a href=&quot;https://www.sigstore.dev/&quot;&gt;sigstore&lt;/a&gt;, but many popular packages are not signed,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
and as far as I can tell there is as yet no &lt;a href=&quot;https://blog.trailofbits.com/2024/11/14/attestations-a-new-generation-of-signatures-on-pypi/&quot;&gt;automatic verification&lt;/a&gt;.
Note that
&lt;a href=&quot;https://docs.npmjs.com/about-registry-signatures&quot;&gt;npm&lt;/a&gt; supports what&#39;s called &amp;quot;registry signatures&amp;quot;
using ECDSA, but the signatures are made by the npm registry
(package manager) using &lt;a href=&quot;https://registry.npmjs.org/-/npm/v1/keys&quot;&gt;its keys&lt;/a&gt;,
so this doesn&#39;t protect you against compromise of the package management system.
The bottom line is that you mostly need to trust the server that
is publishing the packages not to send you malicious packages.&lt;/p&gt;
&lt;h2 id=&quot;how-not-to-trust-the-publisher-(or-at-least-trust-them-less)&quot;&gt;How not to trust the publisher (or at least trust them less) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#how-not-to-trust-the-publisher-(or-at-least-trust-them-less)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, this was all warmup for the real problem we want to
solve. Everything up to now was about ensuring that you get the binary
that the publisher wanted to send you. This still leaves you
trusting the publisher, which you shouldn&#39;t, both because it&#39;s
bad security practice to have to trust people and because
there is plenty of evidence of software &lt;a href=&quot;https://www.bitdefender.com/en-us/blog/hotforsecurity/facebook-app-for-ios-caught-accessing-camera-in-background&quot;&gt;publisher&lt;/a&gt;
&lt;a href=&quot;https://www.theverge.com/23935029/microsoft-edge-forced-windows-10-google-chrome-fight&quot;&gt;misbehavior&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are two main threats to consider from a malicious vendor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A broad attack where a malicious binary is distributed to everyone.&lt;/li&gt;
&lt;li&gt;A targeted attack where a malicious binary is only distributed
to specific people.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Over the past 10 years or so, the industry hive mind
has developed a sort of
aspirational three part roadmap for what it would take to actually
provide confidence in binaries without trusting the vendor.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reviewable source code to allow people to verify program
functionality.&lt;/li&gt;
&lt;li&gt;Reproducible builds to verify the compilation process.&lt;/li&gt;
&lt;li&gt;Binary transparency to ensure that people are getting the
right binary and that everyone is getting the same binary
(thus preventing targeted attack).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The relationship between these is shown in the diagram below.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/software-validity.png&quot; alt=&quot;Software provenance workflow&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Ensuring software provenance
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The process starts with the publisher releasing the source code.  As a
practical matter, some kind of review of the source code is a
necessary but not sufficient precondition to being able to have
confidence in a piece of software. Reviewing the
binary is not really practical on any kind of scalable level; it is of
course possible to reverse engineer binaries, but it&#39;s incredibly time
consuming even for experts. The expectation is that if the software
is important enough, then some set of people will scrutinize it,
looking for defects. If this process is working correctly, then it
should be safe for people to download the (reviewed) source code
and compile it themselves.&lt;/p&gt;
&lt;p&gt;That&#39;s enough in some cases (e.g., if you&#39;re building a Web app
and you didn&#39;t minify or obfuscate the code), but in most cases, people want to download compiled versions
even when the software itself is open source. But how do you know
that the binary that the vendor is distributing to the user
corresponds to the (presumably) safe source code? The general
idea is that some set of people (the reviewers again?) build
the binary themselves and compare it to the binary that the
vendor is distributing. This is actually harder than it sounds
for two reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It&#39;s often not the case that you can compile the same
source code and get the same binary.&lt;/li&gt;
&lt;li&gt;Even if the reviewers get the same binary as the vendor,
how do you know that you got the same binary as both of them?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first problem is addressed by having what&#39;s called &amp;quot;reproducible
builds&amp;quot;, which is what it sounds like: making it possible for
two people to get the same binary from the same source. Once
you have reproducible builds, then it should be possible for
third parties to check the compilation process.&lt;/p&gt;
&lt;p&gt;The second problem is addressed by a technique called
&lt;a href=&quot;https://binary.transparency.dev/&quot;&gt;binary transparency&lt;/a&gt; (BT).
BT is like &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/&quot;&gt;Certificate Transparency (CT)&lt;/a&gt;
and involves publishing hashes of each binary generated by the
vendor. For instance, when Mozilla releases Firefox 140, they would
publish hashes for the Mac, Windows, and Linux builds into the
BT log. Users and reviewers could then independently verify that
their copy of the binary (downloaded in the case of the user, built in
the case of the reviewer) matched what was in the log, providing assurance
that everyone got the same binary.&lt;/p&gt;
&lt;p&gt;When you put all of this together, you get what should be end-to-end
verifiability for the program&#39;s behavior:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Independent source code review verifies that the source code is non-malicious.&lt;/li&gt;
&lt;li&gt;Reproducible builds allow for comparison between the vendor
compiled binary and independently produced binaries from
the reviewed source code.&lt;/li&gt;
&lt;li&gt;Binary transparency allows users to verify that they got
the same binaries that were compiled from the reviewed
source code, and that they are the same as everyone else
got.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s look at each of these in more detail.&lt;/p&gt;
&lt;h3 id=&quot;first%2C-we-publish-the-source&quot;&gt;First, we publish the source &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#first%2C-we-publish-the-source&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even with access to the source code, it&#39;s very difficult to really be
sure what a program does and even hard to exclude the possibility
that it does something malicious.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
The basic problem is that software is incredibly complicated
and so just getting to the point where you understand &lt;em&gt;approximately&lt;/em&gt;
what it does is very time consuming.&lt;/p&gt;
&lt;p&gt;Moreover, the easiest way to read code—at least for me—is
to try to figure out what it&#39;s trying to do, which means just
reading it. Like reading text, this means that you skip over
little details and errors because they interfere with overall
comprehension. It&#39;s much harder to put yourself in the mode
of really studying each piece of the code and making sure you
know exactly what it is &lt;em&gt;actually&lt;/em&gt; doing rather than what
you think it should be trying to do and assuming it does that.
However, that&#39;s exactly what you need to do when you review
a piece of code, because defects so often arise when the
programmer wrote something that is superficially sensible
but is actually broken on closer inspection. It&#39;s a similar
task to copy editing, where you have to focus on the details
and deliberately suppress your mind&#39;s natural tendency to
correct any errors and process the big picture. And of course
the more code you have to read the harder the job is.&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;10 lines of code = 10 issues.&lt;br /&gt;&lt;br /&gt;500 lines of code = &amp;quot;looks fine.&amp;quot;&lt;br /&gt;&lt;br /&gt;Code reviews.&lt;/p&gt;&amp;mdash; I Am Devloper (@iamdevloper) &lt;a href=&quot;https://twitter.com/iamdevloper/status/397664295875805184?ref_src=twsrc%5Etfw&quot;&gt;November 5, 2013&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;p&gt;I&#39;m certainly not telling you not to review code, but it&#39;s
important to recognize its limits. To a first order, every line
of code that goes into Chrome and Firefox is reviewed and
the reviewers take their jobs seriously, and yet both browsers
still ship with plenty of undetected vulnerabilities and even
more undetected defects. Humans simply aren&#39;t up to
the task of finding every defect in a piece of software,
especially when it requires reasoning about hundreds of thousands
of lines of code all at once; it&#39;s not even easy to find defects
when you know they&#39;re there and approximately what the misbehavior
is, as anyone who has had to debug a complicated issue can tell you.&lt;/p&gt;
&lt;p&gt;Moreover, everything I&#39;ve just said is about the setting
where the original author and the reviewer are on the same
side, with the author trying to write clear, correct code
and the reviewer trying to genuinely understand it. The
problem is of course much harder if the author is trying
to actively deceive the reviewer, which is what we are worried
about here. There used to be something called the
&lt;a href=&quot;https://underhanded-c.org/&quot;&gt;underhanded C contest&lt;/a&gt; where
the idea was to write a program which looked normal and behaved
normally under most conditions but had a defect that could
be triggered with the right input. Some of the programs are
quite clever and it&#39;s easy to believe you would miss errors
during the review phase.&lt;/p&gt;
&lt;h4 id=&quot;vulnerabilities-vs.-malicious-code&quot;&gt;Vulnerabilities vs. Malicious Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#vulnerabilities-vs.-malicious-code&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s important to recognize that a malicious vendor doesn&#39;t
need to embed all the functionality that they want into the
source code; they just need to introduce a vulnerability that allows
them to exploit the software once it&#39;s compiled, just as attackers
regularly do with unintentional vulnerabilities.  This makes
it much harder to detect malicious code because you can&#39;t
just study the functionality to see if there is something
fishy, you need to find all the defects.&lt;/p&gt;
&lt;p&gt;The problem is especially acute in &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety&quot;&gt;non-memory safe languages&lt;/a&gt;
like C and C++ because (1) it is easy to create defects that
cause memory vulnerabilities and hard to detect them,
(2) exploiting those vulnerabilities is
a very well understood problem, and (3) memory vulnerabilities
generally lead to powerful exploits, up to and including
remote code execution, which would allow an attacker to
do anything they wanted on your machine. By contrast, in a language
like Rust or Python, many defects just cause program failure and
you have to work a lot harder to get to remote code execution.&lt;/p&gt;
&lt;p&gt;Actually, a malicious vendor doesn&#39;t really have to do anything to
deliberately introduce defects because, as I keep saying,
real software is full of vulnerabilities, which in almost all
cases were introduced by accident. All the vendor has to do is
not fix some of those defects (assuming they discovered
them themselves). Presto, instant malicious software, plus
plausible deniability.&lt;/p&gt;
&lt;h4 id=&quot;you&#39;re-not-really-going-to-do-this-yourself-are-you%3F&quot;&gt;You&#39;re not really going to do this yourself are you? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#you&#39;re-not-really-going-to-do-this-yourself-are-you%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Even if it were in principle possible to verify that a piece of
software was free of malicious code and vulnerabilities, it would be
at best an incredibly time consuming process. For example, Firefox
consists of tens of millions of lines of code. If you were to review one
line of code a second, you&#39;d still be looking at something like a
year of wall clock time just to review that one program. And this assumes
that you&#39;re expert enough to do that, which almost nobody is. Clearly,
this isn&#39;t something people are going to do for themselves.&lt;/p&gt;
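&lt;p&gt;To make that arithmetic concrete, here is a quick back-of-the-envelope
calculation. The specific line counts are illustrative assumptions standing in
for &amp;quot;tens of millions&amp;quot;, not measured figures for Firefox.&lt;/p&gt;

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def review_years(lines: int, lines_per_second: float = 1.0) -> float:
    """Years of nonstop reviewing needed at a fixed lines-per-second pace."""
    return lines / lines_per_second / SECONDS_PER_YEAR


# Assumed code sizes, chosen only to bracket "tens of millions of lines".
for lines in (10_000_000, 30_000_000, 50_000_000):
    print(f"{lines:,} lines: about {review_years(lines):.2f} years without a break")
```

&lt;p&gt;At one line per second, roughly 31.5 million lines is a full year of
reviewing with no sleep, meals, or weekends.&lt;/p&gt;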
&lt;p&gt;This is a piece of the puzzle that doesn&#39;t get talked about that much,
but I think that people have some vague idea that somehow the open source community
will self-organize to review the entirety of all open source software
in the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Linus%27s_law&amp;amp;oldid=1237652268&quot;&gt;given enough eyeballs, all bugs are shallow&lt;/a&gt;
sense, and if something bad was found, it would be reported and
fixed, and if really obviously bad, there would be some consequences
for the vendor, if only in the form of negative press and people
complaining on Hacker News.&lt;/p&gt;
&lt;p&gt;I think you should be suspicious of this in at least two ways.
First, open source software routinely has &lt;a href=&quot;https://www.usenix.org/system/files/sec22-alexopoulos.pdf&quot;&gt;quite old vulnerabilities&lt;/a&gt;,
so clearly whatever we have now is not effectively fulfilling this function.
Second, it&#39;s not clear to me what such a structure would look like:
would someone parcel out the pieces of code for others to look at?
Would we have a registry of what had been reviewed? How would you know that
reviewers weren&#39;t malicious? Who would pay for all this reviewer time?
I suppose it&#39;s possible we could build some mechanism for the highest
profile software, though in practice my experience is that that&#39;s precisely the
code that everyone just assumes is nonmalicious.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
In a number of cases vendors have contracted
for some published third party audit (e.g., &lt;a href=&quot;https://blog.trailofbits.com/2022/12/22/curl-security-audit-threat-model/&quot;&gt;cURL&lt;/a&gt;,
&lt;a href=&quot;https://blog.trailofbits.com/2024/07/30/our-audit-of-homebrew/&quot;&gt;Homebrew&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/security/2023/12/06/mozilla-vpn-security-audit-2023/&quot;&gt;Mozilla VPN&lt;/a&gt;,
etc.), and there has been some progress on
&lt;a href=&quot;https://mozilla.github.io/cargo-vet/&quot;&gt;crowdsourcing review of Rust crates&lt;/a&gt;,
but I don&#39;t think anyone really thinks audits capture every
vulnerability so much as providing an overall assessment of code
quality, and in my experience auditors don&#39;t usually go into the
engagement assuming that the vendor is malicious.&lt;/p&gt;
&lt;h3 id=&quot;verifying-the-build&quot;&gt;Verifying the Build &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#verifying-the-build&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;OK, so you&#39;ve convinced yourself that the source code is non-malicious,
but in most cases you don&#39;t run the source code but rather the compiled
binary, and you usually don&#39;t compile it yourself but rather download it from
the vendor, even for open source software. There are a number of reasons
for this, but for starters, compiling even a modestly large package
can take a long time, and that&#39;s not even to mention installing all
the prerequisites (do you even have a compiler installed?).
There certainly are systems where people have to install everything
from source (&lt;a href=&quot;https://www.gentoo.org/&quot;&gt;Gentoo Linux&lt;/a&gt;, I&#39;m looking at you),
but it&#39;s not exactly the most convenient thing; there&#39;s a reason why
even systems like &lt;a href=&quot;https://brew.sh/&quot;&gt;Homebrew&lt;/a&gt; which start with
other people&#39;s source code provide binaries.&lt;/p&gt;
&lt;p&gt;However, if you&#39;re installing the binary, how do you know that the vendor
has actually compiled it from the source code you looked at rather
than from some other malicious source? The obvious thing to do is to
just download the source code and compile it yourself. This is actually a lot harder than it
looks because two independent compilations of the same source code
often do not produce the same binary. This may be somewhat surprising,
as compilation feels like a mechanical process, but there are actually
a number of important sources of variation, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You may not have exactly the same toolchain (libraries, compiler,
etc.) as the publisher used.
&lt;ul&gt;
&lt;li&gt;If you have two different versions of
some dependency and that is included in the final binary, the
result will obviously be different.&lt;/li&gt;
&lt;li&gt;The compiler has a lot of discretion in how to compile a given piece
of source code, and even different versions of the same compiler
might behave differently (e.g., using different optimizations).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Binaries often include timestamps, which will obviously be different
each time you compile.&lt;/li&gt;
&lt;li&gt;Some build chains are inherently non-deterministic. For instance,
Firefox builds use a technique called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Profile-guided_optimization&amp;amp;oldid=1250741304&quot;&gt;profile-guided
optimization&lt;/a&gt; in which you run the program under instrumentation and
use the results to inform the optimization process. Because
profiling is sensitive to the underlying state of the computer,
you can have small instabilities in the results which produce
different outcomes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This isn&#39;t to say that it&#39;s impossible to have builds be exactly the
same each time (this is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Reproducible_builds&amp;amp;oldid=1254961378&quot;&gt;reproducible builds&lt;/a&gt;, and there is a known set
of &lt;a href=&quot;https://reproducible-builds.org/docs/&quot;&gt;techniques&lt;/a&gt; for making them
work) but it&#39;s a nontrivial task to make a given build reproducible,
and if the publisher hasn&#39;t done it for their system—including
providing reproduction information—then you&#39;re pretty much out
of luck. However, if the publisher &lt;em&gt;has&lt;/em&gt; enabled reproducible builds,
then it should be reasonably practical to independently verify a
given binary.&lt;/p&gt;
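&lt;p&gt;As a minimal sketch of why an embedded timestamp defeats a
byte-for-byte comparison, consider the toy example below. The
&lt;code&gt;toy_build&lt;/code&gt; function is a hypothetical stand-in for a real compiler
and does nothing except append a build time to its input; real toolchains
have many more sources of variation.&lt;/p&gt;

```python
import hashlib


def toy_build(source: bytes, build_time: int) -> bytes:
    """Hypothetical 'compiler' that stamps the build time into its output."""
    return source + b"\nbuilt-at:" + str(build_time).encode()


def digest(binary: bytes) -> str:
    """SHA-256 of the build output, used to compare two builds."""
    return hashlib.sha256(binary).hexdigest()


source = b"print('hello')"

# The vendor's build and a naive rebuild made some time later.
vendor_build = toy_build(source, build_time=1700000000)
naive_rebuild = toy_build(source, build_time=1700000999)

# A rebuild that pins the timestamp to the value the vendor published.
pinned_rebuild = toy_build(source, build_time=1700000000)

print("naive rebuild matches: ", digest(vendor_build) == digest(naive_rebuild))
print("pinned rebuild matches:", digest(vendor_build) == digest(pinned_rebuild))
```

&lt;p&gt;The same source only yields the same hash once the timestamp is fixed,
which is the kind of reproduction information the publisher has to provide.
The reproducible-builds project&#39;s &lt;code&gt;SOURCE_DATE_EPOCH&lt;/code&gt; convention
exists for this purpose.&lt;/p&gt;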
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;a-non-reproducible-build&quot;&gt;A non-reproducible build &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#a-non-reproducible-build&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Back when I was at Mozilla, one thing I worked on was
the &lt;a href=&quot;https://wiki.mozilla.org/NSS&quot;&gt;NSS security library&lt;/a&gt; in Firefox. NSS
dated back to the origins of Firefox at &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Netscape&amp;amp;oldid=1261674314&quot;&gt;Netscape&lt;/a&gt; and had some unusual
and very old feeling formatting choices. The team decided to adopt
the Google C style guide, in part because you could use an automated
formatter to mass reformat all the code. Naturally we were a little
worried about introducing defects, so we decided to compare the output
binaries pre- and post-format. NSS compilation was pretty simple
and so we expected this to just work, but surprisingly the results
didn&#39;t match.&lt;/p&gt;
&lt;p&gt;After a fair bit of head scratching, one of the engineers discovered
the issue: we had a few locations in the code that used the C &lt;code&gt;__LINE__&lt;/code&gt;
preprocessor macro, which is translated to the current line of source code,
and embedded the result in a string. When we had reformatted the
code, it had changed which line this use of the macro appeared on,
leading to a difference.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;binary-transparency&quot;&gt;Binary Transparency &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#binary-transparency&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If the publisher has made builds reproducible, then in principle
you should be able to compile the code yourself and compare the binary
you get to the one on the publisher&#39;s Web site, but then why did you
bother to download the binary at all? Just as with
reviewing the source code, maybe somebody &lt;em&gt;else&lt;/em&gt; that you trust could
do this and report back if there was a mismatch. This is where
binary transparency (BT) comes in.&lt;/p&gt;
&lt;p&gt;BT is like &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/&quot;&gt;Certificate Transparency
(CT)&lt;/a&gt; but instead of publishing every
certificate, the publisher instead publishes a hash of every binary
they release. The idea here is that there should be only a small
number of canonical binaries for every version (say one for each
platform/language combination). When you went to install a piece of
software you would verify that it appeared in the BT log and that
there weren&#39;t an unreasonable number of entries in the log (ideally
there would be exactly one for every configuration). This doesn&#39;t
verify that the binary is non-malicious but just that you&#39;re getting
the same binary as everyone else.&lt;/p&gt;
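&lt;p&gt;The client-side check described above might look roughly like the
following sketch. The log here is a hypothetical in-memory list of
&lt;code&gt;(product, version, platform, hash)&lt;/code&gt; tuples, and the function name and
the &lt;code&gt;max_entries&lt;/code&gt; threshold are assumptions made for illustration. A
real transparency log would also require verifying cryptographic inclusion
and consistency proofs, which this omits.&lt;/p&gt;

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_download(log, product, version, platform, binary, max_entries=1):
    """Return (is_logged, too_many_entries) for a downloaded binary.

    is_logged: the binary's hash appears in the log for this configuration.
    too_many_entries: the log holds more builds for this configuration
    than expected, which may hint at individualized builds.
    """
    same_config = [e for e in log if e[:3] == (product, version, platform)]
    is_logged = any(e[3] == sha256_hex(binary) for e in same_config)
    too_many_entries = len(same_config) > max_entries
    return is_logged, too_many_entries


# Hypothetical data: one logged build and one build that was never logged.
official = b"official build bytes"
unlogged = b"build sent to one user"
log = [("firefox", "140.0", "linux-x86_64", sha256_hex(official))]

print(check_download(log, "firefox", "140.0", "linux-x86_64", official))
print(check_download(log, "firefox", "140.0", "linux-x86_64", unlogged))
```

&lt;p&gt;Note that a passing check says nothing about whether the binary is
safe, only that it is the one that was logged.&lt;/p&gt;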
&lt;p&gt;Our hypothetical auditors would independently build
copies of the binary and verify that they match whatever was
in the log. If there was a mismatch, they would (somehow) report
the issue and hopefully it would get enough PR that the publisher
would be required to explain the issue; if they didn&#39;t have an
innocuous explanation (e.g., an alternate way of compiling to
that binary) then this is evidence that something is wrong.
Note that this system relies crucially on some assumptions about
the behavior of third parties, namely that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Someone is actually doing their own builds and checking the
logs.&lt;/li&gt;
&lt;li&gt;Reporting of log mismatches gets enough attention that there
will be consequences for the publisher.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This seems like something that will work a lot better for big
vendors; if Chrome&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
builds can&#39;t be matched to something in the
BT log, this is a much bigger issue than some package with 20
users.&lt;/p&gt;
&lt;p&gt;Even without open source and reproducible builds, BT still provides
&lt;em&gt;some&lt;/em&gt; value in that it makes it harder for the vendor to supply
individualized malicious builds to a small number of people; if you
get a unique build you should perhaps worry that you have been targeted.
Of course in this case we&#39;re depending even more heavily on the
BT logs being audited because the signature of this attack
is just an unusual number of versions in the log, and it&#39;s not
at all uncommon to have a lot of software versions floating
around for various reasons (alpha/beta releases, A/B testing,
development builds, localization, etc.).&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;smuggling-binary-transparency-into-certificate-transparency&quot;&gt;Smuggling Binary Transparency into Certificate Transparency &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#smuggling-binary-transparency-into-certificate-transparency&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;When I was at Mozilla, we spent some time trying to
&lt;a href=&quot;https://wiki.mozilla.org/Security/Binary_Transparency&quot;&gt;figure out how to deploy Binary Transparency&lt;/a&gt;. At the time there weren&#39;t any BT
logs at all, so one of us (I think it was either Richard Barnes or I)
came up with the idea of publishing the binary hashes in
the CT log by minting new domain names of the form
&lt;code&gt;&amp;lt;hash&amp;gt;.&amp;lt;firefox-version&amp;gt;.fx-trans.net&lt;/code&gt;, getting certificates for that
name, and then using the CT log to provide transparency.&lt;/p&gt;
&lt;/div&gt;
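&lt;p&gt;For illustration only, here is one way a name of that general form could
be constructed. The details are my assumptions and not necessarily what the
Mozilla design did. DNS labels are limited to 63 octets and a hex-encoded
SHA-256 hash is 64 characters, so this sketch splits the hash across two
labels, and it replaces dots in the version with hyphens so that the version
stays a single label.&lt;/p&gt;

```python
import hashlib

MAX_LABEL = 63  # DNS limits each label to 63 octets


def transparency_name(binary: bytes, version: str,
                      suffix: str = "fx-trans.net") -> str:
    """Build a domain name carrying a binary's hash (illustrative layout)."""
    digest = hashlib.sha256(binary).hexdigest()  # 64 hex characters
    # Assumption: split the hash into two 32-character labels so that
    # each one fits within the DNS label limit.
    labels = [digest[:32], digest[32:], version.replace(".", "-")]
    if any(len(label) > MAX_LABEL for label in labels):
        raise ValueError("label too long for DNS")
    return ".".join(labels + [suffix])


# Hypothetical binary contents and version string.
print(transparency_name(b"example firefox build", "140.0"))
```

&lt;p&gt;Anyone watching the CT logs for certificates under the suffix could then
recover the hash and version from the name itself.&lt;/p&gt;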
&lt;p&gt;At present we&#39;re seeing modest levels of binary transparency. In
particular, Google has deployed it for Android &lt;a href=&quot;https://developers.google.com/android/binary_transparency/overview&quot;&gt;firmware and
APKs&lt;/a&gt;,
using Google-provided logs.  The &lt;a href=&quot;https://www.sigstore.dev/&quot;&gt;sigstore&lt;/a&gt;
project provides generic tooling for binary signing, reproducible
builds, and binary transparency and seems to be getting some
uptake. However, we&#39;re not seeing the kind of large-scale deployment
that we have for certificate transparency, and there don&#39;t seem to be
any generic logs in wide use like there are with CT.
Facebook has also deployed a system called &lt;a href=&quot;https://www.facebook.com/help/messenger-app/799550494558955&quot;&gt;Code Verify&lt;/a&gt;
to provide a form of binary transparency for Facebook Messenger.
See &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#the-web&quot;&gt;below&lt;/a&gt; for more on this.&lt;/p&gt;
&lt;h3 id=&quot;what&#39;s-your-trusted-computing-base%3F&quot;&gt;What&#39;s your Trusted Computing Base? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#what&#39;s-your-trusted-computing-base%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you&#39;ve been paying attention you may have noticed that this
all requires a fair amount of computation on the user&#39;s computer.
After they&#39;ve downloaded the binary, they need to compute
its hash and verify that it&#39;s been published in the BT log;
obviously you&#39;re not going to do this by hand unless you have
a lot of time.
All of this requires some kind of software on the user&#39;s computer,
and it needs to be software you trust.&lt;/p&gt;
&lt;p&gt;Ideally, of course, we&#39;d have some sort of generic system that
handled all of this, but that&#39;s not generically the case on a desktop operating
system, so we&#39;re mostly back to the problem at the very
beginning of identifying which software package the user is trying
to download; it&#39;s not
just that the binary is &lt;em&gt;somewhere&lt;/em&gt; on the BT log; it needs to be
associated with the right name, which is to say &lt;code&gt;firefox&lt;/code&gt; and not
&lt;code&gt;firef0x&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id=&quot;updaters&quot;&gt;Updaters &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#updaters&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Once you &lt;em&gt;have&lt;/em&gt; downloaded the right software,
then the publisher can incorporate BT checking into the
software updater as it&#39;s already
custom software so you don&#39;t need to worry about having a generic BT
log (because there really isn&#39;t one).  Moreover, this solves the
problem of knowing what binary to look for in the BT log, because
the vendor knows the name of their own software.&lt;/p&gt;
&lt;p&gt;Of course, now we&#39;ve just shifted the problem from having to trust
the software provider to provide you a nonmalicious binary to having
to trust the software provider to send you a nonmalicious updater,
so things haven&#39;t necessarily improved that much. However, it is
better in one specific way: it protects you from the publisher
starting out nonmalicious and then becoming malicious. That&#39;s a real
problem, for instance if the attacker &lt;a href=&quot;https://www.helpnetsecurity.com/2024/04/16/open-source-project-takeover/&quot;&gt;takes over a legitimate package&lt;/a&gt; or if they decide to attack
you personally for some reason.&lt;/p&gt;
&lt;h4 id=&quot;app-stores&quot;&gt;App Stores &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#app-stores&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;By contrast, mobile operating systems &lt;em&gt;do&lt;/em&gt; have a generic initial software
installation mechanism, which is to say the app store.
App stores also come with automatic updating, and because the app
store operator rather than the publisher is responsible for the
update, it&#39;s much harder for the publisher to provide target-specific
malicious code, though of course they can provide a malicious build to
everyone. This provides some guarantee that everyone is getting the
same binary even without binary transparency, because the publisher
can&#39;t supply multiple binaries.&lt;/p&gt;
&lt;p&gt;Of course, as noted above, you have to trust the platform vendor
who operates the app store not to themselves send you a malicious
binary, but in most cases you&#39;re trusting them anyway because they
provided the operating system, the installer, and any mechanism you
have to view the binary. This is obvious on iOS, which is a completely
closed system, but even on Android, &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software&quot;&gt;all of your interactions with the
system are intermediated by hardware and firmware provided by the
device vendor&lt;/a&gt;, so you&#39;re reduced to trusting Google and the phone
manufacturer anyway.  I&#39;m not
saying this is good, just that it&#39;s the way it is.&lt;/p&gt;
&lt;p&gt;Note that it doesn&#39;t really help that much if the platform vendor
actually did binary transparency, because it&#39;s their software that
does the checking and you don&#39;t have a good way of checking that
software. So while I think it&#39;s good that Google is trying to
prime the pump some with Android binary transparency, I&#39;m skeptical
that it provides significant benefit to the user.&lt;/p&gt;
&lt;p&gt;On the other hand, if you &lt;em&gt;do&lt;/em&gt; trust the platform vendor, then
the app store model can provide a significant amount of additional
security even in the absence of the app store enforcing strict
policies on the binaries, just because the app store insulates
you from the publisher. Moreover, if the vendor requires reproducible
builds, then any source code review that they do—or if
the program is open source, that others do—can be connected
to the resulting binary. For example, &lt;a href=&quot;https://extensionworkshop.com/documentation/publish/source-code-submission/&quot;&gt;Firefox add-ons can be
submitted in two ways&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In source code form directly (add-ons are written in JavaScript,
so you don&#39;t need to compile prior to delivery).&lt;/li&gt;
&lt;li&gt;In a pre-packaged form, but with a complete copy of the source
code sufficient to build the packaged version (and even then,
&lt;a href=&quot;https://extensionworkshop.com/documentation/publish/source-code-submission/#use-of-obfuscated-code&quot;&gt;obfuscation is forbidden&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mozilla does some source code review, and this system ensures that
whatever ships is what was reviewed, though of course you&#39;re
reliant on the quality of Mozilla&#39;s review, which is somewhat
variable. If the add-on isn&#39;t open source
(which isn&#39;t required by Mozilla&#39;s policies) this is all you get, but
if it is open source (or just delivered as source), then anyone can in
principle do this kind of review for themselves.&lt;/p&gt;
&lt;h4 id=&quot;the-web&quot;&gt;The Web &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#the-web&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;We&#39;re well over 7000 words already, but I do just want to briefly touch
on the topic of the Web. The Web has a number of properties that
do make the problem somewhat easier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Web programs execute in the browser, which serves as the trusted
computing base.&lt;/li&gt;
&lt;li&gt;There&#39;s a clear way to identify the &amp;quot;program&amp;quot; the user is trying
to run, which is to say the &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;origin&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Web applications are (mostly) not compiled but rather HTML and
JavaScript, which are (again mostly) readable, which might
make the problem of reproducibility easier.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, it also has several important properties that make the problem
much harder:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Web application is individually downloaded directly from the
publisher by each user, often after they have been authenticated,
making it very easy to mount a targeted attack.&lt;/li&gt;
&lt;li&gt;It&#39;s very common to send each user a slightly different Web page,
for instance if there is personalized content, which makes the
question of whether it&#39;s the same program very difficult.&lt;/li&gt;
&lt;li&gt;Authors of Web applications often change the application very
frequently, either deploying as soon as changes are made
(&amp;quot;continuous deployment&amp;quot;) or for experimentation purposes
(A/B testing), which means there are a lot of different
versions floating around even without personalization.&lt;/li&gt;
&lt;li&gt;Web pages often consist of a lot of pieces of JavaScript from
various servers (e.g., all the ads that are displayed on the
page). This JavaScript is part of the application and so has
to be validated somehow, but in many cases it&#39;s not even
meaningfully under the control of the Web site.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Probably the most serious attempt to provide binary transparency for
Web applications is Facebook&#39;s &lt;a href=&quot;https://www.facebook.com/help/messenger-app/799550494558955&quot;&gt;Code
Verify&lt;/a&gt;
system, provided in &lt;a href=&quot;https://blog.cloudflare.com/cloudflare-verifies-code-whatsapp-web-serves-users/&quot;&gt;collaboration with
Cloudflare&lt;/a&gt;.
Code Verify works by the user installing a &lt;a href=&quot;https://chromewebstore.google.com/detail/code-verify/llohflklppcaghdpehpbklhlfebooeog?hl=en&quot;&gt;browser
extension&lt;/a&gt;
which checks that code running on WhatsApp, Facebook, Instagram, and Messenger
matches the source of truth known to Cloudflare. This is a good start
but it&#39;s also fairly far away from being globally usable.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be clear at this point, the situation is fairly dire:
if you&#39;re running software written by someone else—which
basically everyone is—you have to trust a number of different
actors. We do have some technologies which have the potential to
reduce the amount you have to trust them, but we don&#39;t really
have any plausible avenue to reduce things down to the level where
there aren&#39;t a number of single points of trust.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
This doesn&#39;t mean we should succumb to security nihilism: there&#39;s
still plenty of room for improvement and we know how to make
some of those improvements. However, this isn&#39;t a problem that&#39;s
going to get solved any time soon. Open source, audits, reproducible builds, and
binary transparency are all good, but they don&#39;t eliminate the
need to trust whoever is providing your software and you
should be suspicious of anyone telling you otherwise.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; Designer
of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8555&quot;&gt;ACME&lt;/a&gt; and
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9420&quot;&gt;MLS&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Apparently on Windows if
the binary is signed with an Extended Validation certificate,
then you &lt;a href=&quot;https://cheapsslsecurity.com/blog/a-primer-on-how-code-signing-works/&quot;&gt;don&#39;t get a dialog at all&lt;/a&gt;,
and the binary just runs. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I should also mention that requiring code signing allows the
platform vendor to centrally control who is allowed to write
programs for their platform and what those programs are allowed
to do. This kind of control can be used in ways that protect
users from malware or just software that has user-hostile
behaviors but can also be used to restrict user choice.
How big a deal this is depends on how hard the platform makes
it to run unsigned programs. iOS is a good comparison point here,
where Apple requires that (1) all software be installed from the
app store and that (2) that software comply with &lt;a href=&quot;https://developer.apple.com/app-store/review/guidelines/&quot;&gt;Apple&#39;s rules&lt;/a&gt;,
with the result being that you just can&#39;t run any software at
all that Apple doesn&#39;t approve of, whether that&#39;s pornography,
encouraging smoking, or just using a browser engine other than
WebKit (except in Europe where you sort of can as long as you
jump through a lot of hoops). &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For instance, a GlobalSign code signing certificate is $289/yr,
though you have to establish your company first. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Many systems allow you to distribute a package &amp;quot;lock&amp;quot; file that
contains both the versions and a hash of the package, thus
guaranteeing that any dependencies will be exactly what the
original programmer expected. This obviously has some side
effects, and practice around use of lock files varies.
 &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For instance, &lt;a href=&quot;https://pypi.org/project/pandas/&quot;&gt;pandas&lt;/a&gt;,
&lt;a href=&quot;https://pypi.org/project/numpy/&quot;&gt;numpy&lt;/a&gt;, &lt;a href=&quot;https://pypi.org/project/torch/&quot;&gt;pytorch&lt;/a&gt;,
&lt;a href=&quot;https://pypi.org/project/requests/#requests-2.32.3.tar.gz&quot;&gt;requests&lt;/a&gt;.
It&#39;s a little hard to get hard numbers here, but PyPI says that
they are &lt;a href=&quot;https://pypi.org/&quot;&gt;hosting almost 600,000 packages&lt;/a&gt;
and the Trail of Bits &lt;a href=&quot;https://blog.trailofbits.com/2024/11/14/attestations-a-new-generation-of-signatures-on-pypi/&quot;&gt;announcement of signatures&lt;/a&gt;
from November 2024 says that just under 20,000 packages use the &amp;quot;trusted publishing&amp;quot;
workflow, which seems to be the main workflow for signing. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Just to get this out of the way, if you&#39;ve taken automata
theory, you know that it&#39;s not possible to mechanically
determine the behavior of every program, even for trivial
properties like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Halting_problem&amp;amp;oldid=1239849679&quot;&gt;does it run forever&lt;/a&gt;,
but most of those situations arise with specially contrived
programs. I&#39;m talking here about perfectly ordinary programs
which you could figure out if you had enough time. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For example, when I was at Mozilla we just imported several hundred thousand lines
of &lt;a href=&quot;https://webrtc.org/&quot;&gt;WebRTC&lt;/a&gt; code from Google as part of its
WebRTC implementation, and nobody thought we were going to really
review all that code. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, it doesn&#39;t appear that Chromium builds are reproducible
and Chrome itself has some proprietary components, which makes
verifying the whole system problematic. Firefox builds aren&#39;t reproducible either, although the Firefox-derived
Tor Browser builds &lt;a href=&quot;https://blog.torproject.org/deterministic-builds-part-two-technical-details/&quot;&gt;are&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There&#39;s not even a remotely plausible story about not needing to
trust anyone, but that&#39;s true for almost everything in life. &lt;a href=&quot;https://educatedguesswork.org/posts/ensuring-software-provenance/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Overloaded fields, type safety, and you</title>
		<link href="https://educatedguesswork.org/posts/text-type-safety/"/>
		<updated>2024-08-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/text-type-safety/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/boblnu-inline.jpg&quot; alt=&quot;Bob LNU&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Image by Kate Hudson with help from Photoshop AI
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I recently learned that Southwest has a
&lt;a href=&quot;https://www.washingtonpost.com/travel/2024/08/02/southwest-fat-passengers-policy/&quot;&gt;policy&lt;/a&gt;
of giving passengers who don&#39;t fit in a single seat a free second seat.
This isn&#39;t an issue for me personally, but I was curious how it worked
and that led me to Southwest&#39;s &lt;a href=&quot;https://support.southwest.com/helpcenter/s/article/How-do-I-book-an-additional-ticket-for-a-Customer-of-size&quot;&gt;page&lt;/a&gt; on how to book a second seat:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/extra_seat_whos_flying.jpg&quot; alt=&quot;Southwest seat selection&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Southwest&#39;s passenger entry field
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here are the instructions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Complete the &amp;quot;Who&#39;s Flying?&amp;quot; name fields for a Customer of size as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Without a middle name: A Passenger named Tom Smith would designate Passenger One as &amp;quot;Tom Smith,&amp;quot; and Passenger Two as &amp;quot;Tom XS Smith&amp;quot; (first name Tom, middle name XS, and last name Smith).&lt;/li&gt;
&lt;li&gt;With a middle name: A Passenger named Tom James Smith would designate Passenger One as &amp;quot;Tom James Smith,&amp;quot; and Passenger Two as &amp;quot;Tom James XS Smith&amp;quot; (first name Tom, middle name James XS, and last name Smith).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;What&#39;s happening here will be instantly familiar to anyone with
programming experience: Southwest&#39;s systems aren&#39;t set up to carry the
information that someone wants a spare seat and so instead they have
shoehorned the information into the passenger&#39;s middle name.
This kind of thing happens all the time in software engineering,
and is often necessary when you find yourself in a tricky situation,
but can also lead to a number of different kinds of problems.&lt;/p&gt;
&lt;h2 id=&quot;some-examples&quot;&gt;Some Examples &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#some-examples&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In my experience the most common way to get yourself into this
kind of situation is when you have to deal with some system
you can&#39;t change, and especially when you have a sandwich
like the figure below where you have two components you can
change with a component you &lt;em&gt;can&#39;t&lt;/em&gt; change in between.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ComponentSandwich.png&quot; alt=&quot;A component sandwich&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Something you can&#39;t change in between two things you can.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For example, in the case of Southwest, most likely the problem isn&#39;t
the Web page itself; it&#39;s quite easy to modify the form to add an
extra checkbox or something. Similarly, there is eventually some
system that knows about the extra seat policy. But almost certainly
there is some back-end system somewhere which doesn&#39;t know about the policy
and doesn&#39;t have room for an
extra field (e.g., some database with a fixed set of columns) and so
the easiest thing to do is to smuggle the information in the middle name field
and then have some other system which does know about the policy
pick out the names with &amp;quot;XS&amp;quot;.&lt;/p&gt;
&lt;p&gt;You don&#39;t have to look very far to find plenty of other examples, both
in software and in the real world.&lt;/p&gt;
&lt;h3 id=&quot;hi%2C-i&#39;m-bob-nln&quot;&gt;Hi, I&#39;m Bob NLN &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#hi%2C-i&#39;m-bob-nln&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The form above assumes that everyone has a first and last name (hence the
red *s indicating it&#39;s mandatory) even if you don&#39;t have a middle name.
However, many people only have one name (Afghans, Indonesians, Grimes, ...),
so what do you do when the form insists you put something in both fields?
Depending on the design of the form validation logic, there are various
options, but the one commonly used on official forms is to use either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;NFN&lt;/em&gt; for No First Name or &lt;em&gt;FNU&lt;/em&gt; for First Name Unknown&lt;/li&gt;
&lt;li&gt;&lt;em&gt;NLN&lt;/em&gt; for No Last Name or &lt;em&gt;LNU&lt;/em&gt; for Last Name Unknown&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, if you just have one name, is it the first or the last name?
For the US passport, at least, you provide your single name as your
&lt;a href=&quot;https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/forms/ds-160-online-nonimmigrant-visa-application/ds-160-faqs.html&quot;&gt;surname&lt;/a&gt; and use FNU for your
Given Name. As an aside, people sometimes use &amp;quot;No Last Name&amp;quot;, but
then you occasionally fall afoul of form validation logic which
won&#39;t allow embedded spaces. This is also bad news for people
with multiple word non-hyphenated last names.&lt;/p&gt;
&lt;h3 id=&quot;credit-card-pans-and-format-preserving-encryption&quot;&gt;Credit Card PANs and Format-Preserving Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#credit-card-pans-and-format-preserving-encryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next one is a little more complicated. Suppose you have a database
which stores credit card numbers (technical term: &lt;em&gt;Primary Account
Number (PAN)&lt;/em&gt;) or social security numbers. It&#39;s good practice to have
the database validate the number and lots of software that uses
the database will also rely on the number having a &lt;a href=&quot;https://www.forbes.com/advisor/credit-cards/what-does-your-credit-card-number-mean/&quot;&gt;given structure&lt;/a&gt;. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first digit of the PAN indicates the type of card (4 for Visa, 5 for MasterCard, etc.)&lt;/li&gt;
&lt;li&gt;Some of the next 5 digits indicate the issuing bank&lt;/li&gt;
&lt;li&gt;There is a check digit (the last digit in most cards, digit 13 for Visa).&lt;/li&gt;
&lt;/ul&gt;
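&lt;p&gt;As a rough illustration of the kind of validity check a database or downstream
software might apply, here is a minimal Python sketch of check-digit validation.
It assumes the check digit is the final digit and is computed with the Luhn
algorithm, which is the scheme payment cards generally use; the function name
&lt;code&gt;luhn_valid&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def luhn_valid(pan: str) -> bool:
    """Return True if pan is all digits and passes the Luhn check."""
    if not pan.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(pan)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111111111111111"))  # a widely used test number
    print(luhn_valid("4111111111111112"))  # last digit altered
```

&lt;p&gt;Anything that gets stored in the PAN column has to survive checks like this one,
which is what makes the encryption problem below awkward.&lt;/p&gt;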
&lt;p&gt;Suppose you want to give access to the database to someone who you
don&#39;t entirely trust but needs to do some analysis. You could just
remove the card numbers, but maybe you want to be able to detect PANs
which are duplicated across users (this is even more relevant for
SSNs). One way to do this is to encrypt the PAN,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;  but here we run into a technical limitation in the
design of our cryptographic algorithms.&lt;/p&gt;
&lt;p&gt;The basic encryption primitive you have to work with here is what&#39;s
called a &amp;quot;block cipher&amp;quot;, which operates on binary blocks of size
2&lt;sup&gt;n&lt;/sup&gt;, typically 64 bits or 128 bits. The way that a block
cipher works is that it&#39;s a mapping from input blocks to output
blocks, with each key producing a different mapping. In other words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Encrypt(K, Plaintext) → Ciphertext&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If the PAN is 16 digits long, then it&#39;s easy to map it into a
128-bit block: just use one digit per byte.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
The problem is that the ciphertext is then evenly distributed
over the space of binary blocks, which means that you&#39;re quite likely
to end up with a value which isn&#39;t a valid PAN,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
for instance, it might contain letters or unprintable characters.
When you take that ciphertext and try to insert it into your database,
it will fail the database validity checks, which creates an obvious
problem.&lt;/p&gt;
&lt;p&gt;The solution is something called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Format-preserving_encryption&amp;amp;oldid=1180513669&quot;&gt;format preserving
encryption (FPE)&lt;/a&gt;,
which is effectively a block cipher which works on
arbitrary-sized blocks instead of blocks that are powers
of 2. This allows you to encrypt from inputs that look
like PANs into outputs that also look like PANs, and therefore
can be inserted into the database, just like regular PANs.&lt;/p&gt;
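&lt;p&gt;To make the idea concrete, here is a toy sketch of a digit-preserving cipher in
Python. It is &lt;em&gt;not&lt;/em&gt; one of the standardized FPE modes and should not be used
on real card numbers: it is just a balanced Feistel network over two 8-digit halves,
with HMAC-SHA256 as an assumed round function and an arbitrary choice of 10 rounds.
All of the names are invented for this example. Note also that the output is 16
digits but will not necessarily have a valid check digit.&lt;/p&gt;

```python
import hashlib
import hmac

HALF_DIGITS = 8               # a 16-digit PAN splits into two 8-digit halves
MODULUS = 10 ** HALF_DIGITS
ROUNDS = 10                   # arbitrary choice for this toy example


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudorandom value in [0, MODULUS) derived from one half."""
    msg = bytes([round_no]) + str(half).encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % MODULUS


def _split(text: str):
    if len(text) != 2 * HALF_DIGITS or not text.isdigit():
        raise ValueError("expected exactly 16 decimal digits")
    return int(text[:HALF_DIGITS]), int(text[HALF_DIGITS:])


def _join(left: int, right: int) -> str:
    return str(left).zfill(HALF_DIGITS) + str(right).zfill(HALF_DIGITS)


def toy_fpe_encrypt(key: bytes, pan: str) -> str:
    left, right = _split(pan)
    for r in range(ROUNDS):
        # Feistel step using addition modulo 10^8 instead of XOR.
        left, right = right, (left + _round_value(key, r, right)) % MODULUS
    return _join(left, right)


def toy_fpe_decrypt(key: bytes, token: str) -> str:
    left, right = _split(token)
    for r in reversed(range(ROUNDS)):
        # Undo one Feistel step.
        left, right = (right - _round_value(key, r, left)) % MODULUS, left
    return _join(left, right)


if __name__ == "__main__":
    key = b"example key, not a real secret"
    token = toy_fpe_encrypt(key, "4111111111111111")
    print(token, toy_fpe_decrypt(key, token))
```

&lt;p&gt;Because the mapping is deterministic for a given key, two rows with the same PAN
get the same token, which is what lets the analyst spot duplicates without seeing
the real numbers.&lt;/p&gt;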
&lt;h3 id=&quot;tls-extensions-and-scsvs&quot;&gt;TLS Extensions and SCSVs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#tls-extensions-and-scsvs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This kind of hackery doesn&#39;t just happen in text files and databases;
we do it all the time in network protocols.
The way that TLS negotiation works is that the client sends an initial
&lt;code&gt;ClientHello&lt;/code&gt; message to the server. When the predecessor protocol to
TLS, &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6101&quot;&gt;SSLv3&lt;/a&gt;, was
designed, &lt;code&gt;ClientHello&lt;/code&gt; looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    struct {
        ProtocolVersion client_version;
        Random random;
        SessionID session_id;
        CipherSuite cipher_suites&amp;lt;2..2^16-1&amp;gt;;
        CompressionMethod compression_methods&amp;lt;1..2^8-1&amp;gt;;
    } ClientHello;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The important values here for our purposes are:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;client_version&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A two byte version number reflecting the highest version the client
supports.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;cipher_suites&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;A list of two byte values indicating which algorithms the client
supports. Each suite reflects all the algorithms that will be
used for the connection (e.g., signature, key exchange, encryption, etc.).&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The server responds with a &lt;code&gt;ServerHello&lt;/code&gt; message which selects a specific
version and cipher for the connection:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    struct {
        ProtocolVersion server_version;
        Random random;
        SessionID session_id;
        CipherSuite cipher_suite;
        CompressionMethod compression_method;
    } ServerHello;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One thing to notice is that this is not a very flexible structure;
if you wanted to (for instance) say that you
wanted the server to send packets no bigger than a certain size,
there would be no place to do it. You&#39;ll notice the echoes
here of the discussion above about having a format that is
inflexible and then wanting to extend it.&lt;/p&gt;
&lt;p&gt;When TLS 1.0 was standardized, however, the designers noticed that
there actually was a place that had some flexibility. Each handshake
message is carried in an outer wrapper which looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    struct {
        HandshakeType msg_type;
        uint24 length;
        ... // The message itself
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This wrapper serves two purposes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It lets you identify which message you are receiving, because
there are parts of the handshake state machine where the peer
can send more than one message and you need to know which one
it is.&lt;/li&gt;
&lt;li&gt;It allows you to have a single function which reads the entire
handshake message (using &lt;code&gt;length&lt;/code&gt;) that works for every message
type and then you can hand the whole message off to a different
per-message function.&lt;/li&gt;
&lt;/ol&gt;
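&lt;p&gt;The second point can be sketched as a single generic reader. This is an
illustration only, in Python, with a made-up function name
(&lt;code&gt;read_handshake&lt;/code&gt;); it assumes the caller already has the raw handshake
bytes in a buffer and ignores record-layer fragmentation.&lt;/p&gt;

```python
from typing import Tuple


def read_handshake(buf: bytes) -> Tuple[int, bytes, bytes]:
    """Split one handshake message off the front of buf.

    Returns (msg_type, body, rest), where body is exactly `length` bytes
    and rest is whatever follows the message in the buffer.
    """
    if 4 > len(buf):
        raise ValueError("truncated handshake header")
    msg_type = buf[0]                          # HandshakeType, one byte
    length = int.from_bytes(buf[1:4], "big")   # uint24 length
    if length > len(buf) - 4:
        raise ValueError("truncated handshake body")
    body = buf[4:4 + length]
    return msg_type, body, buf[4 + length:]


if __name__ == "__main__":
    # msg_type 1 with a 3-byte body, followed by 2 bytes of another message.
    wire = bytes([1]) + (3).to_bytes(3, "big") + b"abc" + b"zz"
    print(read_handshake(wire))
```

&lt;p&gt;Note that this reader knows nothing about what is inside &lt;code&gt;body&lt;/code&gt;; deciding
whether those bytes are a well-formed message is left to the per-message parser.&lt;/p&gt;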
&lt;p&gt;However, this also creates a situation in which the body of the
handshake message (i.e., the next &lt;code&gt;length&lt;/code&gt; bytes on the wire)
can be inconsistent with what the message was supposed to be.
For example, imagine that the client sent a message which was
nominally a &lt;code&gt;ClientHello&lt;/code&gt; but was only two bytes long. Obviously
that&#39;s not valid, and the server needs to detect it and fail.
On the other hand, it&#39;s also possible for the message to be too
long, which is to say that there are trailing bytes after you&#39;ve
consumed everything in the handshake structure. This is also
an encoding error, but it&#39;s one that&#39;s survivable because you
already have the data you need. The TLS 1.0 designers
noticed this too and decided to make a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc2246#section-7.4.1.2&quot;&gt;special exception&lt;/a&gt;
for &lt;code&gt;ClientHello&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the interests of forward compatibility, it is permitted for a
client hello message to include extra data after the compression
methods. This data must be included in the handshake hashes, but
must otherwise be ignored. This is the only handshake message for
which this is legal; for all other messages, the amount of data
in the message must match the description of the message
precisely.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A subsequent &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc3546&quot;&gt;specification&lt;/a&gt;
provided some actual rules about what was allowed to go in this section,
namely a list of &amp;quot;extensions&amp;quot; formatted in tag-length-value format:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  struct {
      ProtocolVersion client_version;
      Random random;
      SessionID session_id;
      CipherSuite cipher_suites&amp;lt;2..2^16-1&amp;gt;;
      CompressionMethod compression_methods&amp;lt;1..2^8-1&amp;gt;;
      Extension client_hello_extension_list&amp;lt;0..2^16-1&amp;gt;;
  } ClientHello;

  struct {
      ExtensionType extension_type;
      opaque extension_data&amp;lt;0..2^16-1&amp;gt;;
  } Extension;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because extensions are typed, this is a general extensibility
mechanism and you can always add new stuff just by adding new
&lt;code&gt;extension_type&lt;/code&gt; code points.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
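&lt;p&gt;As a sketch of why this format is easy to extend, here is how a receiver might
walk the extension list in Python. The function name and error handling are
assumptions made for this example; the input is taken to be whatever bytes remain
in the &lt;code&gt;ClientHello&lt;/code&gt; after &lt;code&gt;compression_methods&lt;/code&gt;.&lt;/p&gt;

```python
from typing import List, Tuple


def parse_extensions(trailing: bytes) -> List[Tuple[int, bytes]]:
    """Parse the bytes after compression_methods into (type, data) pairs."""
    if not trailing:
        return []  # old-style ClientHello with no extensions at all
    if 2 > len(trailing):
        raise ValueError("truncated extension list length")
    total = int.from_bytes(trailing[:2], "big")
    if total != len(trailing) - 2:
        raise ValueError("extension list length does not match message")
    extensions = []
    pos = 2
    while pos != len(trailing):
        if 4 > len(trailing) - pos:
            raise ValueError("truncated extension header")
        ext_type = int.from_bytes(trailing[pos:pos + 2], "big")
        ext_len = int.from_bytes(trailing[pos + 2:pos + 4], "big")
        pos += 4
        if ext_len > len(trailing) - pos:
            raise ValueError("truncated extension data")
        # Unknown types are returned like any other; the caller can skip them.
        extensions.append((ext_type, trailing[pos:pos + ext_len]))
        pos += ext_len
    return extensions


if __name__ == "__main__":
    # Two illustrative extensions: type 1 with one byte, type 0xff01 empty.
    body = (1).to_bytes(2, "big") + (1).to_bytes(2, "big") + b"\x01"
    body += (0xFF01).to_bytes(2, "big") + (0).to_bytes(2, "big")
    print(parse_extensions(len(body).to_bytes(2, "big") + body))
```

&lt;p&gt;The parser never needs to understand an extension in order to step over it, which
is what lets new code points be added without breaking receivers that follow the
rules.&lt;/p&gt;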
&lt;h3 id=&quot;intolerance&quot;&gt;Intolerance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#intolerance&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So everything is great, right? Well, not quite. SSLv3 was first
deployed in 1996 and TLS 1.0 was published in 1999, with the definition of
extensions following in 2003. This meant that by the time TLS 1.0 was deployed,
there were a lot of SSLv3 servers in the field and not all of them
accepted more modern &lt;code&gt;ClientHello&lt;/code&gt; messages. There are at least two
sources of intolerance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Not liking any version number different from that for SSLv3 (which is &lt;code&gt;0x0300&lt;/code&gt;, as it happens)&lt;/li&gt;
&lt;li&gt;Not liking trailing bytes in ClientHello (this actually isn&#39;t too
surprising, as what to do in this case was a bit ambiguous in SSLv3).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In either case the server would generate an error (maybe a TLS &lt;code&gt;Alert&lt;/code&gt;
or maybe just closing the connection). This creates a compatibility problem
when a client sends a modern &lt;code&gt;ClientHello&lt;/code&gt; to one of these servers
and it rejects it.&lt;/p&gt;
&lt;h3 id=&quot;fallback-and-downgrade-attacks&quot;&gt;Fallback and Downgrade Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fallback-and-downgrade-attacks&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to deal with this, some TLS clients used a technique called
fallback in which they reconnect with an older-version &lt;code&gt;ClientHello&lt;/code&gt; after
a newer one fails, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-fallback2.png&quot; alt=&quot;TLS Fallback&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS Fallback to SSLv3 without extensions
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The problem here is that this process is insecure because the
attacker can forge an error and force you to reconnect, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-fallback2-attack.png&quot; alt=&quot;TLS Downgrade Attack&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS downgrade attack via fallback
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If the
newer version of TLS is more secure than the older version,
then the attacker just forced you to use a less secure protocol.
This is called a &amp;quot;downgrade attack&amp;quot;.
Of course clients could have just decided not to do any fallback,
but that would have made it so they couldn&#39;t connect to those
old servers, which the client vendors didn&#39;t want to do.&lt;/p&gt;
&lt;h3 id=&quot;scsvs&quot;&gt;SCSVs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#scsvs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;What you need is some way to distinguish attacks from extension- or
version-intolerant servers. The problem is that neither alerts nor TCP
closures are authenticated, and there&#39;s no straightforward way to
authenticate them. Instead, what we want is some way for
the client to safely signal that it supports modern TLS in a way
that doesn&#39;t trigger older servers. The options are fairly limited,
but there &lt;em&gt;is&lt;/em&gt; a field that is safe to use: the &lt;code&gt;cipher_suites&lt;/code&gt;
list.&lt;/p&gt;
&lt;p&gt;Recall that the semantics of this list are that the client
provides some ciphers and the server picks one, so servers are
used to seeing ciphers they don&#39;t recognize and just ignore
them. All we have to do is define a new cipher that means &amp;quot;I am actually
a modern client&amp;quot;. TLS calls this a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7507&quot;&gt;Signaling Cipher Suite Value (SCSV)&lt;/a&gt;. The way this works is that when the client falls back to
an older TLS version it also includes this value; if the server
sees the SCSV cipher suite and it supports a newer version of TLS,
it rejects the connection.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-fallback2-scsv.png&quot; alt=&quot;TLS with SCSV&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Defending against a TLS downgrade attack with an SCSV
&lt;/figcaption&gt;
&lt;/figure&gt;
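&lt;p&gt;The server-side check is small enough to sketch in a few lines of Python. The
constant below is the fallback SCSV code point as I understand it from RFC 7507
(worth double-checking against the RFC); the function name and the idea of passing
versions as two-byte integers are just conventions for this example.&lt;/p&gt;

```python
from typing import Iterable

TLS_FALLBACK_SCSV = 0x5600   # signaling value, not a real cipher suite

SSL_3_0 = 0x0300
TLS_1_0 = 0x0301
TLS_1_2 = 0x0303


def should_reject_fallback(client_version: int,
                           cipher_suites: Iterable[int],
                           server_max_version: int) -> bool:
    """True if the ClientHello looks like an unnecessary fallback.

    The client only adds the SCSV when it is retrying with a lower version,
    so seeing it while the server could do better suggests a downgrade.
    """
    return (TLS_FALLBACK_SCSV in set(cipher_suites)
            and server_max_version > client_version)


if __name__ == "__main__":
    offered = [0x002F, 0x0035, TLS_FALLBACK_SCSV]
    print(should_reject_fallback(SSL_3_0, offered, TLS_1_2))  # modern server
    print(should_reject_fallback(SSL_3_0, offered, SSL_3_0))  # genuinely old server
```

&lt;p&gt;A server that has never heard of the SCSV simply treats it as one more cipher
suite it doesn&#39;t support, which is exactly why this field is a safe place to put
the signal.&lt;/p&gt;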
&lt;p&gt;Note that this doesn&#39;t stop the attacker from blocking the connection
from happening—that&#39;s not really possible if the attacker controls
the network—it just prevents them from forcing the connection
down to a weaker version.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
This isn&#39;t a perfect defense because servers have to deploy the
SCSV and not all of them have, but it&#39;s better than nothing.
Even better, of course, would be not to fall back, which is what
browser clients eventually did.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;random-signaling-values&quot;&gt;Random Signaling Values &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#random-signaling-values&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;TLS 1.3 also does some other signaling shenanigans where
it overloads the &lt;code&gt;Random&lt;/code&gt; value in the &lt;code&gt;ServerHello&lt;/code&gt; in order to signal some
special conditions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;That a &lt;code&gt;ServerHello&lt;/code&gt; is actually a special message called &lt;code&gt;HelloRetryRequest&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;That the &lt;code&gt;Server&lt;/code&gt; supported TLS 1.3 even though it
received a TLS 1.2 &lt;code&gt;ClientHello&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is actually a case where these are technically valid &lt;code&gt;Random&lt;/code&gt; values,
but are just extremely unlikely to be generated by accident.&lt;/p&gt;
&lt;/div&gt;
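&lt;p&gt;For the curious, here is roughly what a client-side check of those special
&lt;code&gt;Random&lt;/code&gt; values might look like in Python. The specific values are as I
recall them from RFC 8446 (the SHA-256 of the string &amp;quot;HelloRetryRequest&amp;quot;, and a
&amp;quot;DOWNGRD&amp;quot; marker in the last eight bytes), so treat them as assumptions to verify
against the RFC; the function name and return strings are invented for this
example.&lt;/p&gt;

```python
import hashlib

# Assumed per RFC 8446: the whole Random equals this digest for HelloRetryRequest.
HELLO_RETRY_REQUEST_RANDOM = hashlib.sha256(b"HelloRetryRequest").digest()

# Assumed per RFC 8446: markers placed in the last 8 bytes of Random.
DOWNGRADE_TO_TLS12 = b"DOWNGRD\x01"
DOWNGRADE_TO_TLS11_OR_BELOW = b"DOWNGRD\x00"


def classify_server_random(random: bytes) -> str:
    """Classify a 32-byte ServerHello.random value."""
    if len(random) != 32:
        raise ValueError("ServerHello.random must be 32 bytes")
    if random == HELLO_RETRY_REQUEST_RANDOM:
        return "hello_retry_request"
    if random[-8:] == DOWNGRADE_TO_TLS12:
        return "server_supports_newer_than_tls12"
    if random[-8:] == DOWNGRADE_TO_TLS11_OR_BELOW:
        return "server_supports_newer_than_tls11"
    return "ordinary"


if __name__ == "__main__":
    print(classify_server_random(HELLO_RETRY_REQUEST_RANDOM))
    print(classify_server_random(bytes(24) + DOWNGRADE_TO_TLS12))
    print(classify_server_random(bytes(32)))
```

&lt;p&gt;Nothing stops an honest server from generating one of these values at random; the
design just relies on that being astronomically unlikely.&lt;/p&gt;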
&lt;p&gt;When TLS 1.3 was being designed, we discovered that there was a
nontrivial number of servers which didn&#39;t support version number
1.3 (despite accepting number 1.2). The TLS WG decided to address
this by designing an entirely new version negotiation scheme,
ironically based on extensions. The idea here was that as long
as the server handled extensions properly it would
be safe to offer the new version extension. Of course, if
you don&#39;t handle extensions properly, you&#39;re back in the soup,
but the normal process of upgrading eventually got the fraction
of servers which didn&#39;t support extensions low enough that
we were able to &lt;a href=&quot;https://datatracker.ietf.org/doc/rfc8996/&quot;&gt;deprecate TLS versions below 1.2&lt;/a&gt;
and for browsers to disable the fallback mechanism.&lt;/p&gt;
&lt;h2 id=&quot;overloading&quot;&gt;Overloading &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#overloading&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The common thread in all of these cases is that we have taken a
field that has one meaning and overloaded it with another meaning.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Case&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Original Meaning&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Alternate Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Name field&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Actual name&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;No name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Credit cards&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;PAN&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Encrypted PAN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;TLS Cipher suites&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Cipher suite&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;TLS Random&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Random nonce&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Fallback, Alternate message&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The problem here is that the values associated with the alternate
meaning are also syntactically valid for the original meaning (that&#39;s
why the trick works in the first place). What you&#39;re relying on
is that the values for the alternate meaning don&#39;t &lt;em&gt;actually&lt;/em&gt;
overlap with those in use for the original meaning. For instance,
the SCSV cipher suite value has been reserved, so there&#39;s
no chance that it will be offered accidentally,
and the random values have a statistically very low chance of
collision. However, there are also real world cases where this kind of overloading causes
serious problems.&lt;/p&gt;
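&lt;p&gt;As a rough illustration of the SCSV case, here is a minimal Rust sketch of a server pulling the two meanings back apart. The function name &lt;code&gt;split_offer&lt;/code&gt; is hypothetical and this is not how any particular TLS stack is written; the only fixed fact is the reserved code point &lt;code&gt;0x5600&lt;/code&gt; for &lt;code&gt;TLS_FALLBACK_SCSV&lt;/code&gt; from RFC 7507.&lt;/p&gt;

```rust
/// TLS_FALLBACK_SCSV code point, reserved by RFC 7507.
const TLS_FALLBACK_SCSV: u16 = 0x5600;

/// Separates the two meanings carried by one cipher suite list:
/// the real suites on offer, and the "this is a fallback retry" signal.
/// (Hypothetical helper for illustration only.)
fn split_offer(offered: &[u16]) -> (Vec<u16>, bool) {
    let is_fallback = offered.contains(&TLS_FALLBACK_SCSV);
    let suites: Vec<u16> = offered
        .iter()
        .copied()
        .filter(|suite| *suite != TLS_FALLBACK_SCSV)
        .collect();
    (suites, is_fallback)
}
```

&lt;p&gt;This only works because no real cipher suite will ever be assigned that code point; the reservation is what keeps the two meanings from colliding.&lt;/p&gt;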
&lt;h3 id=&quot;no-plate&quot;&gt;NO PLATE &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#no-plate&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Probably one of the best real-world examples here comes from license plates.
&lt;a href=&quot;https://www.snopes.com/fact-check/auto-no-plate/&quot;&gt;Snopes&lt;/a&gt; has the full
story of a guy named Robert Barbour who registered for a set of vanity
plates providing the options &amp;quot;SAILING&amp;quot;, &amp;quot;BOATING&amp;quot;, and &amp;quot;NO PLATE&amp;quot; indicating
that he didn&#39;t want a vanity plate if the first two options weren&#39;t
available. Instead, the DMV sent him &amp;quot;NO PLATE&amp;quot; (ambiguity one).
Even better, it turned out that San Francisco police officers used
plate number &amp;quot;NO PLATE&amp;quot; to ticket cars which didn&#39;t have plates (ambiguity two),
and he started getting tickets. The story doesn&#39;t end there, though:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A couple of years later, the DMV finally caught on and sent a notice
to law enforcement agencies requesting that they use the word NONE
rather than NO PLATE to indicate a cited vehicle was missing its
plates. This change slowed the flow of overdue notices Barbour
received to a trickle, about five or six a month, but it also had
an unintended side effect: Officers sometimes wrote MISSING instead
of NONE to indicate cars with missing license plates, and suddenly
a man named Andrew Burg in Marina del Rey started receiving parking
tickets from places he hadn&#39;t visited either. Burg, of course, was
the owner of a car with personalized plates reading &amp;quot;MISSING.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This turns out to happen fairly often, with different theoretically
invalid values in the place of &amp;quot;NO PLATE&amp;quot;, such as &amp;quot;VOID&amp;quot;, &amp;quot;UNKNOWN&amp;quot;,
or &amp;quot;XXXXXX&amp;quot;, because it turns out that they aren&#39;t &lt;em&gt;actually&lt;/em&gt; invalid,
or rather, they are regarded as invalid by one system (the police
officers who are reporting the error) but not invalid by another
system (the people requesting vanity plates and the order entry system
that accepts their proposed plates).&lt;/p&gt;
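&lt;p&gt;The collision can be sketched in a few lines of Rust. The names below (&lt;code&gt;CitedPlate&lt;/code&gt;, &lt;code&gt;encode_in_band&lt;/code&gt;) are hypothetical and only model the situation; they do not describe any real ticketing system.&lt;/p&gt;

```rust
/// What the officer actually observed (hypothetical model of a citation).
#[derive(Debug, PartialEq)]
enum CitedPlate {
    /// The car carried a plate with this text.
    Plate(String),
    /// The car carried no plate at all.
    Missing,
}

/// In-band encoding: everything is squeezed into the single text field
/// on the ticket, so "no plate" has to be spelled as some string.
fn encode_in_band(cited: &CitedPlate) -> String {
    match cited {
        CitedPlate::Plate(text) => text.clone(),
        CitedPlate::Missing => String::from("NO PLATE"),
    }
}
```

&lt;p&gt;As values of &lt;code&gt;CitedPlate&lt;/code&gt;, a vanity plate reading NO PLATE and a missing plate are different, but once both are pushed through &lt;code&gt;encode_in_band&lt;/code&gt; they become the same string and the downstream system has no way to tell them apart.&lt;/p&gt;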
&lt;p&gt;It&#39;s tempting to think that the solution here is to have a single
defined invalid value (as with the SCSV example above), and there
are actually a number of systems which do have pre-determined
invalid values. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No social security number can have a field that is all zeros
(e.g., 123-00-1234)&lt;/li&gt;
&lt;li&gt;The phone number exchange &amp;quot;555&amp;quot; as in 415-555-1234 is reserved
for demonstration values.&lt;/li&gt;
&lt;li&gt;The domains &lt;code&gt;.example&lt;/code&gt; and &lt;code&gt;.invalid&lt;/code&gt; cannot be allocated
and so are used for examples.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, it turns out that this isn&#39;t enough.&lt;/p&gt;
&lt;h3 id=&quot;malloc()-return-values&quot;&gt;&lt;code&gt;malloc()&lt;/code&gt; return values &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#malloc()-return-values&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=C_(programming_language)&amp;amp;oldid=1240704877&quot;&gt;C&lt;/a&gt;,
the way that you allocate memory on the heap is to use the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=C_dynamic_memory_allocation&amp;amp;oldid=1217723593&quot;&gt;&lt;code&gt;malloc()&lt;/code&gt;&lt;/a&gt; function, as in:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Foo *tmp = malloc(sizeof(Foo));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This allocates an object of the size of the object &lt;code&gt;Foo&lt;/code&gt; and then
assigns it to &lt;code&gt;tmp&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; But what happens when there isn&#39;t enough memory so the call
to &lt;code&gt;malloc()&lt;/code&gt; fails? The answer is that it returns a zero valued
pointer. The correct code here is:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;    Foo &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;tmp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;tmp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;       &lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This of course works fine, but nothing actually makes you check, so
what happens if you forget? In that case you end up with what&#39;s called
a &amp;quot;null pointer&amp;quot; and if you try to use it you get what&#39;s called a
&amp;quot;null pointer dereference&amp;quot; (what Tony Hoare called his
&lt;a href=&quot;https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/&quot;&gt;billion dollar mistake&lt;/a&gt;).
In the best case, this will crash your
program; in the worst case it&#39;s an &lt;a href=&quot;https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html&quot;&gt;exploitable vulnerability&lt;/a&gt;.
The problem here is that even though the invalid value (0) is easily
detectable, you have to actually check it. A better situation is
one where it&#39;s not actually possible to end up with an invalid value.&lt;/p&gt;
&lt;h3 id=&quot;infallible-allocation-and-new&quot;&gt;Infallible allocation and &lt;code&gt;new&lt;/code&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#infallible-allocation-and-new&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One approach is to have the function be what my Mozilla co-workers
used to call &amp;quot;infallible&amp;quot;, which is to say that it can&#39;t return
an invalid value. Instead, the program crashes. For instance, you
could have a function called &lt;code&gt;safe_malloc()&lt;/code&gt; which looks like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;safe_malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;ptr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ptr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;       &lt;span class=&quot;token function&quot;&gt;abort&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ptr&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In C++, &lt;code&gt;malloc()&lt;/code&gt; (which allocates arbitrary memory)
is fallible, but the &lt;code&gt;new&lt;/code&gt; operator (which creates
objects) is infallible: if it fails to allocate the
memory the program will crash.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
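&lt;p&gt;For comparison, Rust&#39;s standard library exposes both styles on &lt;code&gt;Vec&lt;/code&gt;. The sketch below assumes hypothetical wrapper names (&lt;code&gt;alloc_fallible&lt;/code&gt;, &lt;code&gt;alloc_infallible&lt;/code&gt;); the calls it relies on, &lt;code&gt;Vec::try_reserve_exact&lt;/code&gt; and the &lt;code&gt;vec!&lt;/code&gt; macro, are standard.&lt;/p&gt;

```rust
/// Fallible: an allocation failure comes back to the caller as an `Err`.
fn alloc_fallible(len: usize) -> Result<Vec<u8>, std::collections::TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}

/// Infallible: the signature has no error path. If the allocation
/// cannot be satisfied, the program panics or aborts instead.
fn alloc_infallible(len: usize) -> Vec<u8> {
    vec![0; len]
}
```

&lt;p&gt;A caller of &lt;code&gt;alloc_infallible&lt;/code&gt; never sees an invalid buffer, at the price of giving up any chance to recover from an allocation failure.&lt;/p&gt;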
&lt;h4 id=&quot;union-types&quot;&gt;Union Types &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#union-types&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;An alternate approach is to have the function return
what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Union_type&amp;amp;oldid=1237408186&quot;&gt;union type&lt;/a&gt;,
which is a type that can contain multiple values, but
not all at once. For instance, consider the following
C code:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; U&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The way this works is that the union type is as big as the largest member it contains
(in this case &lt;code&gt;a&lt;/code&gt;) but &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; share the same space in memory,
as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/union-type.png&quot; alt=&quot;Union type&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
An example union type
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This isn&#39;t actually the solution to our problem, but instead
&lt;em&gt;recreates&lt;/em&gt; the problem. Suppose that I give you a pointer to an
instance of &lt;code&gt;U&lt;/code&gt; (the C notation is &lt;code&gt;U *&lt;/code&gt;), with the memory region it
points to being the bytes &lt;code&gt;[00, 01, 02, 03]&lt;/code&gt;. This could be either of
two things: the integer &lt;code&gt;0x00010203&lt;/code&gt; (I&#39;m assuming a big-endian
architecture) or the character &lt;code&gt;0x00&lt;/code&gt;. There&#39;s no way to tell from
context.&lt;/p&gt;
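&lt;p&gt;The ambiguity can be made concrete with a small Rust sketch that reads the same four bytes both ways. The function names are hypothetical, and &lt;code&gt;u32&lt;/code&gt; and &lt;code&gt;u8&lt;/code&gt; stand in for the C &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;char&lt;/code&gt; members.&lt;/p&gt;

```rust
/// The four bytes from the text, as they sit in memory.
const RAW: [u8; 4] = [0x00, 0x01, 0x02, 0x03];

/// Interpretation 1: the bytes are the `int a` member
/// (big-endian, matching the assumption in the text).
fn read_as_int(raw: [u8; 4]) -> u32 {
    u32::from_be_bytes(raw)
}

/// Interpretation 2: the bytes hold the `char b` member, which only
/// occupies the first byte; the remaining three are leftovers.
fn read_as_char(raw: [u8; 4]) -> u8 {
    raw[0]
}
```

&lt;p&gt;Both functions accept the same input without complaint, which is exactly the problem: nothing in the bytes themselves says which reading was intended.&lt;/p&gt;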
&lt;p&gt;What you actually need here is what&#39;s sometimes called a &lt;code&gt;discriminated union&lt;/code&gt;,
which also has a type field telling you what is inside it, like so:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; INT&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; CHAR &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; union_type&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; U&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The way this works is that the &lt;code&gt;union_type&lt;/code&gt; field tells you what&#39;s
inside the value. If we think about this in the memory allocation
context, we would have something like:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; MEMORY&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ERROR &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; union_type&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token keyword&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; error&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; MallocResult&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;malloc&lt;/code&gt; succeeds, then it returns a &lt;code&gt;MallocResult&lt;/code&gt; of type
&lt;code&gt;MEMORY&lt;/code&gt; (i.e., &lt;code&gt;union_type&lt;/code&gt; is set to &lt;code&gt;MEMORY&lt;/code&gt;) and sets &lt;code&gt;result&lt;/code&gt; to
the allocated memory. If it fails it returns a &lt;code&gt;MallocResult&lt;/code&gt; of type
&lt;code&gt;ERROR&lt;/code&gt; and sets &lt;code&gt;error&lt;/code&gt; to the actual error value. To use this, you
would write code like this:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;MallocResult r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;union_type &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; ERROR&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;u&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Foo &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;tmp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;u&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This solves the problem of knowing what the type of the result
is, but still doesn&#39;t really solve your problem because you
can just assume things are working, and have a null pointer
dereference anyway:&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;MallocResult r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Foo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;Foo &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;tmp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;u&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is about the best you can do in C, but more modern languages
have a safer structure. For instance, here is what the same union looks
like in Rust
(where it&#39;s called an &amp;quot;enum&amp;quot;):&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Result&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token class-name&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Memory&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token class-name&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This looks pretty similar but the important difference between
C and Rust in this case is that you&#39;re &lt;strong&gt;not allowed to access the values
directly&lt;/strong&gt;. The &lt;code&gt;enum&lt;/code&gt; keeps track of what is inside it and
Rust won&#39;t let you access the wrong type. Instead, you do something
like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;// Rust doesn&#39;t really have malloc&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;match&lt;/span&gt; result &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token class-name&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;memory&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;token comment&quot;&gt;// We have successful result in |memory|&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token class-name&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;token comment&quot;&gt;// Things failed with error |error|&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point here is that the language protects you from screwing up:
you can only access the value that&#39;s actually in the union type,
not the other alternate values.&lt;/p&gt;
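&lt;p&gt;For readers who want something they can compile, here is a self-contained sketch of the same idea. The names &lt;code&gt;AllocResult&lt;/code&gt;, &lt;code&gt;try_alloc&lt;/code&gt;, and &lt;code&gt;describe&lt;/code&gt; are hypothetical, and the &amp;quot;allocator&amp;quot; simply refuses requests above a caller-supplied limit so that the failure path can be exercised.&lt;/p&gt;

```rust
/// Either the allocated memory or an error, never both.
#[derive(Debug)]
enum AllocResult {
    Memory(Vec<u8>),
    Err(String),
}

/// Hypothetical allocator that refuses any request over `limit` bytes.
fn try_alloc(size: usize, limit: usize) -> AllocResult {
    if size > limit {
        AllocResult::Err(format!("cannot allocate {size} bytes"))
    } else {
        AllocResult::Memory(vec![0u8; size])
    }
}

/// The only way to reach the buffer is to go through `match`,
/// which forces the error case to be handled as well.
fn describe(result: AllocResult) -> String {
    match result {
        AllocResult::Memory(buf) => format!("got {} bytes", buf.len()),
        AllocResult::Err(message) => format!("failed: {message}"),
    }
}

// By contrast, this line would be rejected by the compiler, because an
// `AllocResult` is not a buffer:
//     let buf: Vec<u8> = try_alloc(16, 1024);
```

&lt;p&gt;Unlike the C version, there is no field to reach into directly, so the &amp;quot;assume it worked&amp;quot; shortcut does not type-check.&lt;/p&gt;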
&lt;p&gt;The syntax above is a bit complicated and in practice, Rust has a
special type just for this called &lt;code&gt;Result&lt;/code&gt;.  &lt;code&gt;Result&lt;/code&gt; is set up so you
don&#39;t have to use the &lt;code&gt;match&lt;/code&gt; stuff above, but instead has
a function called &lt;code&gt;unwrap()&lt;/code&gt;, which works like this:&lt;/p&gt;
&lt;pre class=&quot;language-rust&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The way this works is that &lt;code&gt;function()&lt;/code&gt; returns a &lt;code&gt;Result&lt;/code&gt; that contains
either the return value of the call or an error. If you call
&lt;code&gt;Result.unwrap()&lt;/code&gt; then one of two things happens:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the function succeeded, then &lt;code&gt;unwrap()&lt;/code&gt; will return the
function return value.&lt;/li&gt;
&lt;li&gt;If the function failed, then &lt;code&gt;unwrap()&lt;/code&gt; will terminate the program
(there is also a way to ask if it succeeded).&lt;/li&gt;
&lt;/ul&gt;
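&lt;p&gt;A small sketch using the standard library&#39;s &lt;code&gt;Result&lt;/code&gt; shows both behaviors. The helpers &lt;code&gt;parse_port&lt;/code&gt; and &lt;code&gt;port_or_default&lt;/code&gt; are hypothetical; &lt;code&gt;str::parse&lt;/code&gt;, &lt;code&gt;unwrap()&lt;/code&gt;, and &lt;code&gt;is_ok()&lt;/code&gt; are standard.&lt;/p&gt;

```rust
/// Hypothetical fallible function: parsing can fail, so the return
/// type is a `Result` rather than a bare number.
fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

/// Handling both cases explicitly instead of calling `unwrap()`.
/// The fallback value 443 is an arbitrary choice for this example.
fn port_or_default(text: &str) -> u16 {
    match parse_port(text) {
        Ok(port) => port,
        Err(_) => 443,
    }
}

// `parse_port("8080").unwrap()` yields the parsed number, while
// `parse_port("https").unwrap()` would panic and end the program.
// `parse_port("https").is_ok()` asks the question without panicking.
```

&lt;p&gt;Either way the error cannot be silently read as a port number: the caller has to unwrap, match, or otherwise acknowledge that the call might have failed.&lt;/p&gt;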
&lt;p&gt;The technical term for the property Rust is providing here is
&amp;quot;type safety&amp;quot;, with the compiler guaranteeing that you can&#39;t
misinterpret one type (error) as another (a result).&lt;/p&gt;
&lt;h2 id=&quot;excel-date-translation&quot;&gt;Excel Date Translation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#excel-date-translation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Type safety isn&#39;t just about exceptional cases where we have
one &amp;quot;main&amp;quot; meaning (e.g., the license plate) and one exceptional
meaning (there is no valid plate). There are many situations
where we have a field that can be of multiple types of actual
data (union types can of course be used this way). This is
conceptually powerful, but can lead to major problems, as
in Excel.&lt;/p&gt;
&lt;p&gt;Excel, like all spreadsheets, is structured as a set of cells.
Cells can contain freeform text data but can also contain other
more specific kinds of data (e.g., numbers, dates, etc.).
While you can specifically tell Excel what
type a field is (as with an option type), you usually don&#39;t,
because Excel can usually figure out what the field is from
what you type in. For instance, if you type in &lt;code&gt;1234&lt;/code&gt;, then
it&#39;s probably a number, and Excel will treat it accordingly;
if you type in &lt;code&gt;ABCD&lt;/code&gt;, then it&#39;s probably just freeform text;
and if you type in &lt;code&gt;2024-01-01&lt;/code&gt; then it&#39;s probably a date.&lt;/p&gt;
&lt;p&gt;This last case is where things can go spectacularly wrong
because Excel is quite aggressive about
converting things to dates if they can plausibly be interpreted
that way. For instance, if you type &lt;code&gt;MARCH1&lt;/code&gt; into Excel it will
convert it to &lt;code&gt;1-Mar&lt;/code&gt;, which is to say &amp;quot;March 1st&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
Unfortunately,
it&#39;s quite common to name genes with strings that look
like dates, for instance one gene is named
&amp;quot;Membrane Associated Ring-CH-Type Finger 1&amp;quot; (MARCH1), which,
as noted above, Excel turns into &amp;quot;1-Mar&amp;quot;. This turns out to
be quite a pervasive problem in genomics, as documented
by &lt;a href=&quot;https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7&quot;&gt;Ziemann, Eren, and El-Osta&lt;/a&gt;
in 2016:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The problem of Excel software (Microsoft Corp., Redmond, WA, USA)
inadvertently converting gene symbols to dates and floating-point
numbers was originally described in 2004 [1]. For example, gene
symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated
Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted
by default to ‘2-Sep’ and ‘1-Mar’, respectively. Furthermore,
RIKEN identifiers were described to be automatically converted to
floating point numbers (i.e. from accession ‘2310009E13’ to
‘2.31E+13’). Since that report, we have uncovered further
instances where gene symbols were converted to dates in
supplementary data of recently published papers (e.g. ‘SEPT2’
converted to ‘2006/09/02’). This suggests that gene name errors
continue to be a problem in supplementary files accompanying
articles. Inadvertent gene symbol conversion is problematic
because these supplementary files are an important resource in
the genomics community that are frequently reused. Our aim here
is to raise awareness of the problem.&lt;/p&gt;
&lt;/blockquote&gt;
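&lt;p&gt;The failure mode is easy to reproduce with a toy type guesser. The sketch below is an illustrative heuristic only, not Excel&#39;s actual algorithm, and the names &lt;code&gt;Cell&lt;/code&gt; and &lt;code&gt;guess_cell&lt;/code&gt; are hypothetical. It tries numbers first, then anything shaped like a month name followed by a day, and only then falls back to text.&lt;/p&gt;

```rust
/// Toy model of a spreadsheet cell after type guessing.
#[derive(Debug, PartialEq)]
enum Cell {
    Number(f64),
    Date { month: u32, day: u32 },
    Text(String),
}

const MONTHS: [&str; 12] = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
    "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
];

/// Guesses a type for raw input, eagerly preferring numbers and dates.
fn guess_cell(input: &str) -> Cell {
    let text = input.trim();
    // Anything that parses as a float, including exponent notation,
    // is treated as a number.
    if let Ok(number) = text.parse::<f64>() {
        return Cell::Number(number);
    }
    let upper = text.to_ascii_uppercase();
    // Split into a leading name and the rest, e.g. "MARCH" + "1".
    let split = upper
        .find(|c: char| c.is_ascii_digit())
        .unwrap_or(upper.len());
    let (name, digits) = upper.split_at(split);
    // Three or more letters that begin a month name count as that month.
    let month = MONTHS
        .iter()
        .position(|m| name.len() >= 3 && m.starts_with(name));
    if let (Some(index), Ok(day)) = (month, digits.parse::<u32>()) {
        if (1..=31).contains(&day) {
            return Cell::Date { month: index as u32 + 1, day };
        }
    }
    Cell::Text(text.to_string())
}
```

&lt;p&gt;With this guesser, &lt;code&gt;MARCH1&lt;/code&gt; and &lt;code&gt;SEPT2&lt;/code&gt; come back as dates and &lt;code&gt;2310009E13&lt;/code&gt; comes back as a number, even though the person typing them meant all three as text. The original string is gone once the guess has been made.&lt;/p&gt;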
&lt;p&gt;The problem here, as so often happens in software, is that someone
tried to be smart. Also that friends don&#39;t let friends use spreadsheets.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;quoting&quot;&gt;Quoting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#quoting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I want to hit one more related topic: quoting, starting with a simple
example, the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Comma-separated_values&amp;amp;oldid=1237715653&quot;&gt;comma separated value (CSV)&lt;/a&gt; file.&lt;/p&gt;
&lt;h3 id=&quot;comma-separated-values&quot;&gt;Comma Separated Values &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#comma-separated-values&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;CSV is a format for tabular data, which is to say a table of rows
and columns like a spreadsheet.
A CSV file consists of a series of rows, with each row on its own
line, separated by a newline character. Each row consists of
a set of columns, with the columns separated by commas, like so:&lt;/p&gt;
&lt;pre class=&quot;language-csv&quot;&gt;&lt;code class=&quot;language-csv&quot;&gt;&lt;span class=&quot;token value&quot;&gt;first name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;last name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;age&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;John&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Smith&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;20&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;Jane&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Doe&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;23&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;Nicolas&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Bourbaki&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;100&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a conceptually simple format, and you might think that it&#39;s
simple to parse: just go line by line and then split on the commas.&lt;/p&gt;
&lt;p&gt;But what happens if you want to have a field that itself contains
a comma, like so:&lt;/p&gt;
&lt;pre class=&quot;language-csv&quot;&gt;&lt;code class=&quot;language-csv&quot;&gt;&lt;span class=&quot;token value&quot;&gt;first name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;last name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;age&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;John&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Smith&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;20&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;Jane&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Doe&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;23&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;Nicolas&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Bourbaki&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;100&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token value&quot;&gt;Robert&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;Kennedy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt; Jr.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token value&quot;&gt;70&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you split this on commas the last row will consist of four columns
rather than the three columns every other row has.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Robert&lt;/code&gt;, &lt;code&gt;Kennedy&lt;/code&gt;, &lt;code&gt;Jr.&lt;/code&gt;, &lt;code&gt;70&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Obviously, this is no good. The way you address this is by &amp;quot;quoting&amp;quot;
fields that contain the separator character by wrapping them
in quotes, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Robert,&amp;quot;Kennedy, Jr.&amp;quot;,70
&lt;/code&gt;&lt;/pre&gt;
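&lt;p&gt;As a minimal sketch of the difference, assuming Python and its standard &lt;code&gt;csv&lt;/code&gt; module (neither is part of the example above), here is the same row parsed by naive splitting and by a quote-aware parser:&lt;/p&gt;

```python
import csv

line = 'Robert,"Kennedy, Jr.",70'

# Naive approach: split on every comma, including the one inside the quotes.
naive = line.split(",")

# csv.reader understands the quoting convention and keeps the field whole.
parsed = next(csv.reader([line]))

print(naive)   # ['Robert', '"Kennedy', ' Jr."', '70']
print(parsed)  # ['Robert', 'Kennedy, Jr.', '70']
```

&lt;p&gt;The naive version not only produces four fields, it also leaves the quote characters in the data.&lt;/p&gt;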
&lt;p&gt;So far so good. But what happens if you have a field that itself
contains a quote? The typical answer is that you &lt;em&gt;escape&lt;/em&gt; it
by prefixing it with another character, such as backslash (&lt;code&gt;&#92;&lt;/code&gt;),
like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Elvis &#92;&amp;quot;The King&#92;&amp;quot;,Presley,42
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is called &amp;quot;escaping&amp;quot; and the backslash is called the &amp;quot;escape character&amp;quot;.
But this just pushes the problem around, because now we have the
problem of fields which contain backslash. The convention here
is that you represent those with a pair of backslashes (&lt;code&gt;&#92;&#92;&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;In other words:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A,A&#92;&#92;B,C
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Represents the following three values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;A&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;A&#92;B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
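&lt;p&gt;Here is a rough sketch of the backslash convention in action, again assuming Python&#39;s &lt;code&gt;csv&lt;/code&gt; module. The &lt;code&gt;escapechar&lt;/code&gt; setting is my choice for the illustration; by default that module uses a different convention (doubling the quote character).&lt;/p&gt;

```python
import csv

# Raw strings so that Python itself does not interpret the backslashes.
lines = [
    r'Elvis \"The King\",Presley,42',
    r'A,A\\B,C',
]

# escapechar tells the reader that a backslash makes the next character literal.
rows = list(csv.reader(lines, escapechar="\\"))

for row in rows:
    print(row)
```

&lt;p&gt;Note that the example needs raw strings (and a doubled backslash for &lt;code&gt;escapechar&lt;/code&gt;) because Python string literals have their own backslash escaping, which is the same problem one layer up.&lt;/p&gt;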
&lt;p&gt;When you put all of this together, you can unambiguously parse the
file into fields and still represent any valid character inside
each field, but at the cost of a lot of complexity in the parser.
It&#39;s not uncommon to see CSV parsing code just split on commas
and hope there aren&#39;t any fields with embedded commas. The source
of the problem is the same thing we&#39;ve been fighting all along,
namely that the comma has two meanings in this context:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As a separator between fields&lt;/li&gt;
&lt;li&gt;As a character inside fields&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We need some way to distinguish between those two contexts, which
is what quoting does. But then we have to distinguish between
quotes around fields and quotes within fields, hence the backslash
escape character. But now we have the same problem with the backslash,
hence double backslash.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
The easiest way out of this hole is to have a separator that isn&#39;t
valid inside a field, in which case you can just split on the
separator without any quoting, escaping, etc. Comma isn&#39;t a good
choice here because it&#39;s very common to have data that has embedded
commas, but what you&#39;ll often see used here is the tab character
(character code 9), in what&#39;s called a &lt;em&gt;tab separated value&lt;/em&gt; (TSV)
file. This is easier to work with because most data won&#39;t have tabs in
it at all and you can often replace tabs with spaces with no loss of
meaning. An additional benefit is that the tabs help align the
data so that columns will often line up properly.&lt;/p&gt;
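&lt;p&gt;A small sketch of why this is easier, assuming the data really contains no tabs (the row is the earlier example rewritten with tab separators):&lt;/p&gt;

```python
line = "Robert\tKennedy, Jr.\t70"

# No quoting or escaping needed: a tab never appears inside a field here.
fields = line.split("\t")

print(fields)  # ['Robert', 'Kennedy, Jr.', '70']
```

&lt;p&gt;The embedded comma is now just an ordinary character, so a plain split is enough.&lt;/p&gt;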
&lt;h3 id=&quot;sql-injection&quot;&gt;SQL Injection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#sql-injection&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you incorrectly split up a CSV into fields, it&#39;s probably not that
bad—you&#39;ll probably end up with the wrong number of columns, which
is easily detectable—but there are cases where getting the quotes
wrong can be much worse.&lt;/p&gt;
&lt;p&gt;The most common tool for interacting with databases is a language called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SQL&amp;amp;oldid=1238737606&quot;&gt;SQL&lt;/a&gt;.
For instance, you might ask for every row in a database where someone
had the first name &amp;quot;John&amp;quot; like so:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT * FROM Table WHERE FirstName=&#39;John&#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note the single quotes around &lt;code&gt;&#39;John&#39;&lt;/code&gt;. The reason for these is that
you&#39;re using &lt;em&gt;spaces&lt;/em&gt; as separators and you might want to search
for a field value with embedded spaces, which you would write as
&lt;code&gt;&#39;Jim Bob&#39;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now consider the case where you have a Web interface and you want
to look up a user by name, as in the passenger entry field we started
with at the top of this post: the user puts in their name and you
want to look up their passenger record, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT * FROM Passengers WHERE FirstName=&#39;John&#39; AND LastName=&#39;Smith&#39;
  AND DOB=&#39;1986-01-01&#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, if you&#39;re building a Web application, you&#39;re not really
programming in SQL. Instead, you&#39;re working in some other language,
such as Python, and then using it to execute SQL, like this:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;cursor&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;execute&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;SELECT * FROM Passengers WHERE LastName=&#39;Smith&#39;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice how I&#39;ve wrapped the SQL command in double quotes to
tell Python &amp;quot;this is all one string&amp;quot; and Smith in single quotes
to tell SQL &amp;quot;this is all one field&amp;quot;. This is an alternative to
escaping for dealing with situations where you have embedded
quotes in some field, at least in languages which allow both
single and double quotes.&lt;/p&gt;
&lt;p&gt;But of course I don&#39;t know the name that I want to search for in
advance, as it&#39;s entered by the user. So instead what the Web
app has to do is read the name the user entered in and then
assemble it into an SQL command, something like this:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;command &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;SELECT * FROM Passengers WHERE LastName=&#39;&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; name &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&#39;;&quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;cursor&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;execute&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;command&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, the name is in the variable &lt;code&gt;name&lt;/code&gt; and we insert it into
the command template to form the actual SQL command we want to send to
the database to execute.&lt;/p&gt;
&lt;p&gt;But now what happens if the name the user enters &lt;strong&gt;contains a quote&lt;/strong&gt;,
like, for instance, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=The_d%27Artagnan_Romances&amp;amp;oldid=1240664035&quot;&gt;d&#39; Artagnan&lt;/a&gt;?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;
In that case we get this SQL command:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; Passengers &lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; LastName&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;d&#39;&lt;/span&gt; Artagnan&#39;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case the embedded quote in &amp;quot;d&#39; Artagnan&amp;quot; gets interpreted
as the end of the string to search for, which just becomes the
letter &amp;quot;d&amp;quot; (as you can see from the syntax coloring) and the
string &amp;quot;Artagnan&amp;quot; looks like the next bit of SQL, which (incorrectly)
ends in a single quote. This particular example will likely just
create a syntax error in your SQL parser, because it&#39;s not a complete
SQL statement. But what if the attacker deliberately crafts
their name so that the result is valid SQL? For instance, they might enter
the following name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Smith&#39;; DROP TABLE Passengers;&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produces the following string:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; Passengers &lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; LastName&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Smith&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; Passengers&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which gets parsed as &lt;em&gt;three&lt;/em&gt; SQL commands, namely a select from
the database:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; Passengers &lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; LastName&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Smith&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;followed by a command which erases the entire &lt;code&gt;Passengers&lt;/code&gt; table:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; Passengers&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;followed by some syntactically invalid SQL:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token string&quot;&gt;&#39;&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This last command is a syntax error
(though with some more cleverness we could make it valid)
but by this point the other commands
have executed and the &lt;code&gt;Passengers&lt;/code&gt; table has been erased.
We&#39;ve just
invented the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SQL_injection&amp;amp;oldid=1240641360&quot;&gt;SQL injection&lt;/a&gt;
attack, which is a major problem in database-backed Web systems.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;a href=&quot;https://xkcd.com/327/&quot;&gt;&lt;img src=&quot;https://imgs.xkcd.com/comics/exploits_of_a_mom.png&quot; alt=&quot;Little Bobby Tables&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;From &lt;a href=&quot;https://xkcd.com/327/&quot;&gt;XKCD&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The root cause here isn&#39;t so much quoting as inconsistent quoting. Specifically,
the quote characters are special in SQL but get passed transparently through
the Web form and Python APIs we are dealing with (though they are
special in other contexts). As a result, the attacker is able to
get them all the way through to the database where they can cause damage.&lt;/p&gt;
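&lt;p&gt;One common way to keep user input from ever being parsed as SQL is to pass it as a separate parameter rather than pasting it into the command string. The following is only a sketch under my own assumptions: it uses Python&#39;s standard &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory table, and a different hostile input from the one above, because as far as I know &lt;code&gt;sqlite3&lt;/code&gt; will only run one statement per &lt;code&gt;execute()&lt;/code&gt; call.&lt;/p&gt;

```python
import sqlite3

# Hypothetical in-memory table standing in for the real passenger database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Passengers (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Passengers VALUES (?, ?)",
    [("John", "Smith"), ("Jane", "Doe")],
)

name = "x' OR '1'='1"  # hostile input chosen for this illustration

# String concatenation: the quote in the input ends the SQL string early,
# so the rest is parsed as SQL and the WHERE clause matches every row.
command = "SELECT * FROM Passengers WHERE LastName='" + name + "';"
leaked = conn.execute(command).fetchall()

# Placeholder: the input is handed over as a value, never as SQL text.
safe = conn.execute(
    "SELECT * FROM Passengers WHERE LastName=?", (name,)
).fetchall()

print(len(leaked))  # 2
print(len(safe))    # 0
```

&lt;p&gt;With the placeholder, the quote characters stay inside the data channel the whole way down, which is the consistency the concatenated version lacks.&lt;/p&gt;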
&lt;p&gt;As it turns out, there is another attack in which the objective isn&#39;t
to contaminate the database but rather the victim&#39;s Web browser. In
this attack, called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cross-site_scripting&amp;amp;oldid=1232455342&quot;&gt;cross-site scripting (XSS)&lt;/a&gt;,
the attacker submits some data (e.g., in a comment on a Facebook
post) that passes transparently through the site all the way to
some other person&#39;s browser when they read the comment, but instead
of displaying it to the user, the browser interprets it as a piece
of JavaScript and executes it in the victim&#39;s browser. As with SQL
injection, XSS relies on constructing a special string that makes
the browser think that the user-generated content (the comment)
is over and that the rest of the text is JS, but in this case
the attacker needs to construct the string in such a way that
the database &lt;em&gt;doesn&#39;t&lt;/em&gt; interpret it but the browser does. Describing
how to do that is outside the scope of this post.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/text-type-safety/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We&#39;ve come a long way from reserving an extra seat on the plane, so I wanted to try to
see if I could pull things together. The underlying problem we are
facing here with all these examples is the same: having the same
set of bits which can mean two different things and needing
some way to distinguish those two meanings. Failure to do so
leads to ambiguity at best and serious defects at worst. That&#39;s
why you see so much emphasis in modern systems on type safety and on
strict domain separation between different meanings. In
the best case, it would simply be impossible to treat data
of type A as data of type B, but as a practical matter, you sometimes
have to do so; it&#39;s those times when extreme caution is warranted.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You can also build a
giant lookup table of random values to PANs, a process often called
&amp;quot;tokenization&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The situation is more complicated with 64-bit blocks, but it&#39;s
obviously possible because there are many more 64-bit blocks than
16-digit credit card numbers. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
What I mean here is that it&#39;s not even properly formatted, as
many properly formatted number strings are not valid
PANs. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;People sometimes worry that the code
point space might run out, but the type field is two bytes and we&#39;re
nowhere near 65K extensions. Moreover, if we did get close we could
always define a new extension which contained other extensions! &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that if the oldest version is weak enough (imagine
it wasn&#39;t authenticated at all) then this defense wouldn&#39;t
work, but at least so far even SSLv3 is strong enough
to prevent that bad an attack. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: &lt;code&gt;malloc()&lt;/code&gt; returns an object of type &lt;code&gt;void *&lt;/code&gt;.
In C, &lt;code&gt;void *&lt;/code&gt; is implicitly converted to &lt;code&gt;T *&lt;/code&gt; for any
type &lt;code&gt;T&lt;/code&gt;, but in C++ it&#39;s not. In C++, the same code would
be &lt;code&gt;Foo *tmp = (Foo *)malloc(sizeof(Foo))&lt;/code&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically it throws an exception which you can catch,
but I advise against this! &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In Rust you wouldn&#39;t actually be allowed to talk to raw memory,
but let&#39;s ignore that for now. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If you export to CSV you get 1-Mar. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Google Sheets does some of the same stuff. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The way to think about this is that there are really two backslashes,
the backslash literal and the escape character. We&#39;re trying to map
them onto one character in the text, but that flattening process
inevitably means that one or the other variant has to be bigger.
 &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The &lt;code&gt;SELECT *&lt;/code&gt; means &amp;quot;Give me every column&amp;quot;. I could get
just one column by doing &lt;code&gt;SELECT birthdate&lt;/code&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know there is no space after the &amp;quot;d&#39;&amp;quot; but the example works better this way. &lt;a href=&quot;https://educatedguesswork.org/posts/text-type-safety/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>River of No Return 108K Race Report (2024)</title>
		<link href="https://educatedguesswork.org/posts/ronr-report/"/>
		<updated>2024-08-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ronr-report/</id>
		<content type="html">&lt;p&gt;My &amp;quot;A&amp;quot; races for 2024 were &lt;a href=&quot;https://educatedguesswork.org/posts/sob100k-2024&quot;&gt;Sean O&#39;Brien 100K&lt;/a&gt; at the
end of January and &lt;a href=&quot;https://www.aravaiparunning.com/tushars/&quot;&gt;Tushars 100K&lt;/a&gt;
at the end of July. 6 months is a long training block and so I decided
to break it up with something in between. I&#39;ve been leaning towards
mountainous races with a lot of vert lately (SOB notwithstanding) and after
doing a bunch of searching on UltraSignup I decided on the
&lt;a href=&quot;https://ronrenduranceruns.com/courses/100k/&quot;&gt;River of No Return 108K (RONR)&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ronr-report/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
in Challis, Idaho. This turned out to be a good call because I ended up bailing on
Tushars after the race was seriously impacted by a &lt;a href=&quot;https://inciweb.wildfire.gov/incident-information/utfif-silver-king-fire&quot;&gt;giant fire&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&#39;s a long-delayed race report for RONR.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;badwater-crewing&quot;&gt;Badwater Crewing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#badwater-crewing&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Badwater logistics are nuts. Unlike most ultras which have aid stations, Badwater
is just an undifferentiated stretch of road and your crew has to
follow along in a van and can (mostly) crew you wherever you want.
They just pull over to the side of the road, feed you, etc., and you
keep going. It&#39;s also unbelievably hot and the crew is likely to
be pulling at least one all nighter themselves, if not two.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;RONR (pronounced row-nurr) is nominally 68 miles and ~17000 ft of
gain, with pretty much the whole race above 5000ft, so it seemed like
a good warm-up for Tushars (100K and 17000ft mostly above 9000
ft). The original plan was to do RONR as a &amp;quot;B&amp;quot; race without really
going to the well and then try to really focus on Tushars, but my
friend &lt;a href=&quot;https://brbrunning.com/&quot;&gt;Lisa&lt;/a&gt; asked me to crew her at
&lt;a href=&quot;https://www.badwater.com/event/badwater-135/&quot;&gt;Badwater 135&lt;/a&gt;, which is
the week before Tushars and the more I looked at the schedule the more
I realized that it was going to be tough to really land the taper for
Tushars, so we promoted RONR to at least an &amp;quot;A-&amp;quot;, meaning that we
would set up the training block for Tushars but I&#39;d still taper for
RONR and wouldn&#39;t hold back on the day.&lt;/p&gt;
&lt;p&gt;The training block leading up to RONR went really well and then in the
last mile of my last longish run—a week out so already into my
taper—I caught a toe and landed really hard on my right hand.
The next day the wrist was really swollen and I was worried I&#39;d
actually broken something, which would have obviously interfered
with racing—especially because you want to use poles on a race
like this and so you need to be able to push with your hand—but
an x-ray didn&#39;t turn up anything, so it was just ice, advil, and crossed
fingers.&lt;/p&gt;
&lt;h2 id=&quot;course-info&quot;&gt;Course Info &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#course-info&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ronr-map.png&quot; alt=&quot;RONR map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/ronr-profile.png&quot; alt=&quot;RONR profile&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Screenshots from &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Above is a map of the course along with a profile. There was quite
a bit of uncertainty about the actual amount of vert, with the
Web site showing a bunch of different values (17000 on the site itself,
&lt;a href=&quot;https://caltopo.com/m/7488&quot;&gt;15314 in CalTopo&lt;/a&gt;, &lt;a href=&quot;https://ultrapacer.com/course/665a7fd7c1357005e3fa0264?view=plan&amp;amp;plan=665fc876f4c25127395bf5ed&quot;&gt;16425 in ultraPacer with the
same GPX&lt;/a&gt;,
etc). Computing the amount of vert from a GPX is kind of a mess. As the
Runalyze guys &lt;a href=&quot;https://runalyze.com/activity/100140560/elevation-info&quot;&gt;say&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The calculation of elevation data is very difficult - there is not one single solution. Bad gps data can be corrected via srtm-data but these are only available in a 90x90m grid and not always perfectly accurate. In addition, every platform uses another algorithm to determine the elevation value (for up-/downwards). We give you therefore the possibility to choose algorithm and threshold such that the values fit your experience.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I figured it would be around 15000 ft or so and at the end of the day
Runalyze shows 14767 and Garmin 15439, so that seems to have been about
right. As you can see, it consists of four big climbs, up to around 10kft
and all above 5kft, with a really long final descent into town at the
end. The last descent is kind of a mixed blessing, with the last 5
miles being on actual asphalt, so you really don&#39;t have much of an excuse
not to run, but you know it&#39;s not gonna be fun.&lt;/p&gt;
&lt;p&gt;Looking at past years&#39; times I was struck by how slow they were: the
course record was set in 2021 by Jimmy Elam at 11:03, which is really
slow for a 100K (the SOB record is 8:24). Sometimes a slow course
record like this means a soft field, but not in this case: Jimmy Elam
was 14th at UTMB in 2022, doing 22:36 the same year Kilian Jornet did
19:39 (and &lt;a href=&quot;https://educatedguesswork.org/posts/utmb&quot;&gt;I did 37:49&lt;/a&gt;).
So I knew it was going to be a long day, estimating between 16 and
18 hrs. I used ultraPacer to give me a pace sheet for 16:30,
which seemed on the optimistic side.&lt;/p&gt;
&lt;p&gt;The weather on the day was actually really good, but it was a close
thing: two weeks out there was a lot of snow on the course and then
the next 12 days or so were really hot, so the course was actually
almost snow free. When we drove in on Thursday it was unbelievably
hot but it cooled down on Friday and then Saturday was nice and
cool.&lt;/p&gt;
&lt;h2 id=&quot;travel&quot;&gt;Travel &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#travel&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Challis is not easy to get to. I had originally planned to fly to Salt
Lake and then drive (5+ hrs) but then decided instead to fly to Sun
Valley. Neither of these is ideal. There is no direct flight from SFO
to SUN on Friday so you have to fly in on Thursday. Normally this
isn&#39;t a big deal as you just chill out at the location, but if I&#39;m
racing at altitude I prefer to get in the night before to minimize the
crappy acute altitude adaptation phase that happens after 24 hrs or
so, and you obviously can&#39;t do that. On the other hand, if you fly
into SLC, then you can come in on Friday, but you get in super late,
which also isn&#39;t great.&lt;/p&gt;
&lt;p&gt;I stayed at one of the recommended hotels (the &lt;a href=&quot;https://www.challisvillageinn.com/lander&quot;&gt;Challis Village
Inn&lt;/a&gt;), which turns out to have been a great choice as it&#39;s
about a half mile away from the race start/finish. This meant we could
walk over in the morning without having to build in a lot of extra
time to deal with glitches around race day parking.&lt;/p&gt;
&lt;p&gt;Challis is a pretty typical small town, but just a heads-up if you&#39;re
thinking about doing RONR that the restaurant situation is pretty
limited: there are only a few places and most only have like one
vegetarian option (e.g., grilled cheese). Moreover, there&#39;s nothing
really open after 10 PM, so think about that when you plan for
your post-race meal. There is, however, a perfectly reasonable
grocery store, so it&#39;s not like you can&#39;t get food and cook for
yourself (my hotel room had a stove and a microwave).&lt;/p&gt;
&lt;h2 id=&quot;overall-logistics&quot;&gt;Overall Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#overall-logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My plan was to use the same food schedule as I had used for SOB,
namely Maurten and more Maurten. RONR serves Tailwind (which I
like OK) and Gu (which I don&#39;t love), so I decided to mostly
just carry stuff and use drop bags. There were only 3 drop
bag stations, so this meant carrying a bit more food than I
usually want, but it never got too heavy.&lt;/p&gt;
&lt;p&gt;My feed schedule is roughly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One 500 ml bottle of Maurten 160 drink every hour, with 250 ml
each 30 min&lt;/li&gt;
&lt;li&gt;Some mix of Maurten solid and Maurten gel aiming for ~100-200
cal/hr.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I use a 30 minute timer to manage all this, so I have to do &lt;em&gt;something&lt;/em&gt;
every 30 minutes. I started with Maurten solid and then moved onto a
mix of regular gel and the caffeinated gels.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ronr-report/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ronr-food.jpeg&quot; alt=&quot;My food all laid out&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
That&#39;s a lot of Maurten
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As before, I bagged up what I needed for each aid station in a ziploc,
as well as a sort of &amp;quot;spare food&amp;quot; bag just in case.&lt;/p&gt;
&lt;p&gt;RONR is a much more rugged race than SOB and due to all the snowmelt
I knew there would be a lot of water crossings, so I also had spare
socks in every drop bag as well as spare shoes in two of them in
case I wanted to change. In the event I only changed socks once
and kept the same shoes (&lt;a href=&quot;https://www.salomon.com/en-us/shop/product/s-lab-genesis-lg9299.html#color=87291&quot;&gt;Salomon S/LAB Genesis&lt;/a&gt;) the whole time.&lt;/p&gt;
&lt;h2 id=&quot;start-to-birch-creek-%5B7.66-mi%2C-%2B2329%2F-358-ft%2C-1%3A33%3A33%5D&quot;&gt;Start to Birch Creek [7.66 mi, +2329/-358 ft, 1:33:33] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#start-to-birch-creek-%5B7.66-mi%2C-%2B2329%2F-358-ft%2C-1%3A33%3A33%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This first stretch is about 3 miles of rolling terrain followed by a
long climb (the AS is about 2/3 of the way up). That first three miles
is quite runnable and I was just trying to keep my pace contained, as
it&#39;s easy to get carried away at the start, especially after watching
the top pros just take off from the gun. This was made a bit easier by
the definite feeling that I wasn&#39;t at my fastest at 5000 ft.&lt;/p&gt;
&lt;p&gt;I&#39;d decided to start without a headlamp because sunrise was shortly
after the start, so I had to be a bit careful, but you could
mostly follow other people&#39;s headlamps in the twilight until
the sun finally came up. I made it through this section OK
and then managed to trip and land on my right hand (again!),
but not badly enough to do more than make it more sore. This
was the first of two falls on course and the only one that was
more painful than embarrassing.&lt;/p&gt;
&lt;p&gt;The climbing started soon enough and I was able to switch to
hiking and poles. The trail itself was somewhere between double
track and fire road, so it&#39;s basically just a matter of putting
your head down and focusing on moving forward without having
to worry too much about your feet.&lt;/p&gt;
&lt;p&gt;This turned out to be my strongest section of the race, I think
due to a combination of several factors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Low temperatures&lt;/li&gt;
&lt;li&gt;Being fresh and so comfortable pushing&lt;/li&gt;
&lt;li&gt;Not having had a chance to fall behind on nutrition.&lt;/li&gt;
&lt;li&gt;Comparatively low altitude (Birch Creek is at 7000 feet).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By the time I hit the AS I was almost 24 minutes ahead of pace
for my already optimistic 16:30 target, and I was thinking I
was going to have a pretty good day. The next AS (Keystone)
wasn&#39;t far ahead, so I burned through the AS really quickly
(~30s).&lt;/p&gt;
&lt;h2 id=&quot;keystone-%5B3.56-mi%2C-%2B1444%2F-463-ft%2C-55%3A45%5D&quot;&gt;Keystone [3.56 mi, +1444/-463 ft, 55:45] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#keystone-%5B3.56-mi%2C-%2B1444%2F-463-ft%2C-55%3A45%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s mostly uphill from Birch Creek to Keystone and so just
more hiking. I don&#39;t remember much of this, except that it
went by fairly fast. I was still
moving well so I continued to be well ahead of schedule. This
stretch through Bayhorse was the furthest ahead of pace I ever was,
almost 30 minutes; if you multiply that by 6 (I was 11 miles into
the race) we&#39;d be looking at almost 3 hrs ahead, but obviously
that wasn&#39;t going to happen. Another quick AS stop and then
the long descent to Bayhorse Lake and the
first drop bag.&lt;/p&gt;
&lt;h2 id=&quot;bayhorse-%5B4.26-mi%2C-%2B157%2F-2136-ft%2C-56%3A59%5D&quot;&gt;Bayhorse [4.26 mi, +157/-2136 ft, 56:59] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#bayhorse-%5B4.26-mi%2C-%2B157%2F-2136-ft%2C-56%3A59%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you can see, this is a huge descent, which I mostly cruised.  I ran
this section with &lt;a href=&quot;https://ultrasignup.com/results_participant.aspx?fname=Kat&amp;amp;lname=Schuller&quot;&gt;Kat
Schuller&lt;/a&gt;,
a runner with &lt;a href=&quot;https://www.runinrabbit.com/blogs/rabbit-chatter/rabbitelitetrail-kat-schullers-story-of-running-while-trying-to-conceive-including-ivf?srsltid=AfmBOoqW3VMoQT6byfJLNygW9qR6fdPHbj6mpGrKCy0S_jSQrrjcgHSW&quot;&gt;Rabbit
Elite&lt;/a&gt;,
someone about my pace to chat with and just get through the
miles. Generally, in a race of this size I&#39;ll be somewhere near the
female podium (I was behind the first woman at &lt;a href=&quot;https://educatedguesswork.org/posts/sob100k-2024&quot;&gt;Sean
O&#39;Brien&lt;/a&gt; this year, though I would have been 8th
at RONR), so if I&#39;m with the elite women I generally figure I&#39;m pacing
about right. Kat&#39;s descending skills were a bit better than mine, so
she&#39;d drop me a bit on the trickier sections but I was able to just
push a little bit and catch up once it got smooth.&lt;/p&gt;
&lt;p&gt;We came into Bayhorse Lake together and arranged to meet up on the way
out for the next big climb once we&#39;d grabbed our drop bags, etc.
In the event, though, I needed to hit the bathroom and by the time
I came out and had my bottles filled, etc. I couldn&#39;t find
Kat and wasn&#39;t sure if she had left already or was still at the
AS (it turned out that she had decided she was in a race and had
just taken off, but I caught up to her later). I waited around for a minute or two and couldn&#39;t find her,
so headed out on my own. All in all, this was a really long AS stop;
I had budgeted for 6 minutes but it was almost 10. On the other
hand, I was still almost 30 minutes ahead.&lt;/p&gt;
&lt;h2 id=&quot;ramshorn-%5B9.67-mi%2C-%2B5000ft%2F-1250-ft%2C-2%3A58%3A32%5D&quot;&gt;Ramshorn [9.67 mi, +5000ft/-1250 ft, 2:58:32] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#ramshorn-%5B9.67-mi%2C-%2B5000ft%2F-1250-ft%2C-2%3A58%3A32%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next leg is a 5000+ ft climb, made more interesting by the fact that
it&#39;s also the first leg of the 32K, which started shortly after I left
Bayhorse. This meant initially I had the really fast people passing me,
but eventually things kind of stabilized as I caught up to people who
had gone out too hard.&lt;/p&gt;
&lt;p&gt;The Ramshorn aid station is almost 10 miles out and supposedly just
a water drop, so I had planned to carry 2l of water, but right
as I was about to leave Bayhorse I was told they had a water drop
part way up so I scaled back to 1.5. I don&#39;t remember this stretch that
clearly, so TBH I don&#39;t recall if Ramshorn was real aid or not.
I do, however, recall starting to drag as I got up above 8000
ft, and as Ramshorn is the high point of the course at ~10000ft,
that meant a long time working to breathe. Even so, I didn&#39;t
lose too much time on this section, hitting the top at about
22 minutes ahead of schedule.&lt;/p&gt;
&lt;p&gt;Towards the top of this climb I caught up to &lt;a href=&quot;https://ultrasignup.com/results_participant.aspx?fname=Lara&amp;amp;lname=Maccabee&amp;amp;age=22&quot;&gt;Lara Mccabee&lt;/a&gt;,
an Idaho local and former track athlete. Again, it was good
to have someone to run with so we stuck together for quite
a while.&lt;/p&gt;
&lt;h2 id=&quot;juliette-%5B4.58-mi%2C-%2B64%2F-3024-ft%2C-51%3A31%5D&quot;&gt;Juliette [4.58 mi, +64/-3024 ft, 51:31] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#juliette-%5B4.58-mi%2C-%2B64%2F-3024-ft%2C-51%3A31%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As they say, it&#39;s all downhill from here, and the section from
Ramshorn to Juliette is quite runnable double track and fire
road, so it was mostly a matter of just cruising through it
while remembering that we still had a lot of climbing to go. Not
too much to say about this section; I stayed about 20 minutes
ahead of pace.&lt;/p&gt;
&lt;h2 id=&quot;bayhorse-lake-%5B8.36-mi%2C-%2B3081%2F-1395-ft%2C-2%3A13%3A39%5D&quot;&gt;Bayhorse Lake [8.36 mi, +3081/-1395 ft, 2:13:39] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#bayhorse-lake-%5B8.36-mi%2C-%2B3081%2F-1395-ft%2C-2%3A13%3A39%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the pre-race meeting, we were told that the climb out of Juliette
had a lot of creek crossings, and it didn&#39;t disappoint. In any
case, this is where things started to go sideways, I think due
to a combination of factors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fatigue&lt;/li&gt;
&lt;li&gt;Difficult footing and creek crossings making it hard to find my rhythm&lt;/li&gt;
&lt;li&gt;The altitude starting to get to me (Juliette is already at ~7000ft)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I left the AS with Lara and felt like I was moving faster, but in
reality was just kind of yoyoing, and eventually we mostly just
settled in together.&lt;/p&gt;
&lt;p&gt;Psychologically this was a really hard section because I wasn&#39;t
feeling great and there was still a really big climb to go out
of Buster Lake. Worse yet, this stretch actually has two summits,
with the first one followed by a mile plus stretch of rolling terrain
and then a mile long descent and then another climb. This is all
kind of hard to see on the Garmin watch, so I incorrectly thought that
part of the rolling section was the second summit (wishful thinking)
and it was pretty demoralizing to realize there was another big
climb to go.&lt;/p&gt;
&lt;p&gt;By the time I got to the Bayhorse Lake AS (not to be confused with Bayhorse)
I had given up basically
all of the time I gained in the first half of the race and was
right at the ultraPacer target for 16:30. Unsurprisingly, things
didn&#39;t get much better from here.&lt;/p&gt;
&lt;p&gt;I had a drop bag at Bayhorse Lake and after all those creek crossings
I decided it was time to change my socks, so I spent quite a while
here wiping down my feet, swapping out all my nutrition, and watching
the AS people try to get the Maurten in my bottles to dissolve
(more on this later). While I was messing around, Lara picked
up her pacer and left, but I figured it was more important to have
my stuff in good order than to have company, and I knew I had my
own pacer at the next AS.&lt;/p&gt;
&lt;h2 id=&quot;squaw-creek-%5B7.57-mi%2C-%2B847%2F-3023-ft%2C-1%3A43%3A14%5D&quot;&gt;Squaw Creek [7.57 mi, +847/-3023 ft, 1:43:14] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#squaw-creek-%5B7.57-mi%2C-%2B847%2F-3023-ft%2C-1%3A43%3A14%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s some climbing out of Bayhorse, followed by a really long
downhill. The downhill starts fairly technical with a bunch of rocks
and talus and then turns into easy fire road. By this time I had
caught up with Lara and her pacer, who had done RONR before and
advised me that it wasn&#39;t worth trying to run the technical bit,
because you wouldn&#39;t go that much faster and were just courting
a fall. This was welcome news as I was feeling pretty tired.&lt;/p&gt;
&lt;p&gt;Soon enough we hit the easier fire road section and from here it was
just a long cruise down to the AS. You can actually see the transition
quite clearly on the pace chart around mile 43 as I go from losing
time to slightly making it up. The reason for this isn&#39;t that I was
somehow a lot worse on the technical bits but that ultraPacer doesn&#39;t
really know what kind of footing there is (you can tell it but I
didn&#39;t) but instead is modeled on grade, so it overestimated pace on
the technical sections and underestimated pace on the easy sections.
Somewhere in here I caught up to and passed Kat, who had had a good
middle section but was now dragging badly and eventually DNFed.&lt;/p&gt;
&lt;p&gt;This section felt pretty long but was manageable, in part because
I knew I would have company for the rest of the race once I hit
Squaw Creek. At this point I was 20 minutes behind target.&lt;/p&gt;
&lt;h2 id=&quot;buster-lake-%5B7.57-mi%2C-%2B2854%2F-738-ft%2C-2%3A04%3A05%5D&quot;&gt;Buster Lake [7.57 mi, +2854/-738 ft, 2:04:05] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#buster-lake-%5B7.57-mi%2C-%2B2854%2F-738-ft%2C-2%3A04%3A05%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Squaw Creek isn&#39;t an official drop bag, but as my pacer Kate was there
(she had been working the AS), she had brought food resupply, and I took
a bit longer at the AS than I really wanted to. In the meantime,
Lara and her pacer took off and I never saw them again (she eventually
finished almost 30 minutes ahead of me).&lt;/p&gt;
&lt;p&gt;The leg from Squaw Creek to Buster Lake is the last big climb and
this was the hardest part of the race for me. Almost immediately I
started to feel really tired and out of breath, and it just got
worse as I gained altitude. Moreover, I was starting to feel
really nauseated and dizzy. In the first few miles I actually had
to stop a few times and just rest for 30 seconds or so. We were
sort of going back and forth with a few other guys and after I&#39;d
passed them, I said I wanted to rest and Kate really saved me by
asking &amp;quot;do you really need to or can you just slow down a bit?&amp;quot;
That was the right question and the answer was of course &amp;quot;keep
going, just slowly&amp;quot;.&lt;/p&gt;
&lt;p&gt;This section also had a lot of water crossings, though not as many
as the previous sections, and some of it was really muddy. Partway
through I just slipped and landed more or less face down in the
mud. Nothing was injured but I got super dirty and just had to
finish the race that way.&lt;/p&gt;
&lt;p&gt;It was really a relief to hit Buster Lake, as it meant the end
of the climbing and now I just had to survive the giant downhill.
My last drop bag was here, so I swapped out my food again,
with the intention to eat gels from here on in, grabbed
my headlamp, and ditched my poles (not going to need them on
the downhill) and headed out.
My stomach was still feeling pretty bad, so I grabbed some quesadillas in the
hope they would settle things down—they
go really well with dirt—and decided to hike a bit while
I got them down.&lt;/p&gt;
&lt;p&gt;At this point, I was 48 minutes behind target, but with 13 miles of
downhill ahead of me.&lt;/p&gt;
&lt;h2 id=&quot;custer-motorway-%5B8.35-mi%2C-%2B290%2F-2841ft%2C-1%3A34%3A59%5D&quot;&gt;Custer Motorway [8.35 mi, +290/-2841ft, 1:34:59] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#custer-motorway-%5B8.35-mi%2C-%2B290%2F-2841ft%2C-1%3A34%3A59%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like the descent out of Bayhorse Lake, this stretch starts
out as somewhat technical rocky trail and then turns into
fire road. As before, I opted to sort of hike the technical part
and then run the fire road. I&#39;d heard that this section was
pretty easy, but the technical section seemed to go on forever—I
was of course really tired, but even Kate said so—and
even when I hit the fire road part, running didn&#39;t
feel great and I found myself hiking some of the really
not-steep uphill sections.&lt;/p&gt;
&lt;p&gt;Finally, we got to the last AS at Custer Motorway. My stomach
still didn&#39;t feel great at this point but they didn&#39;t have
any quesadillas ready and I sure wasn&#39;t waiting, so I spent
almost no time here.&lt;/p&gt;
&lt;h2 id=&quot;finish-%5B4.68-mi%2C-%2B36%2F-878-ft%2C-50%3A09%5D&quot;&gt;Finish [4.68 mi, +36/-878 ft, 50:09] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#finish-%5B4.68-mi%2C-%2B36%2F-878-ft%2C-50%3A09%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From Custer to the finish is all paved road (it starts
a little before the AS). I had been sort of going back and
forth with a few other guys on the fire road section, but as soon as I
hit the paved road I felt like I could really run again and Kate and I
completely dropped them (eventually finishing almost 9 minutes ahead).&lt;/p&gt;
&lt;p&gt;This section is entirely runnable and it&#39;s merely a matter of putting
your head down and gutting it out. Kate and I had been talking all the
way through here, but for the rest of the race I didn&#39;t want to talk
but just needed to focus on keeping the pace up. This was the hardest
I&#39;ve ever pushed at the end of an ultra and it was super helpful just
to have someone next to you keeping a steady pace when everything
hurts. This section felt like it took forever and towards the end we
were just counting down the tenths of miles to the finish, but we did
the last 5 and change miles at 9:43, 9:42, 9:31, 9:05, 9:03, and 8:56
pace, going from over an hour behind target to just over 50 minutes
behind.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ronr-pace-compare.png&quot; alt=&quot;RONR Pace Comparison&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Comparison of ultraPacer target to actual race. Source: ultraPacer
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Overall, this feels like a solid result, though probably not as strong
as Sean O&#39;Brien. I went in not really knowing what to expect and
so my pace targets were pretty handwavy. I would have been unhappy
with 18 and quite happy with &amp;lt;17, so 17:18 seems reasonable.&lt;/p&gt;
&lt;p&gt;My nutrition worked reasonably well, with two real things I&#39;d like to deal
with:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;As before, I felt like the bars at the start didn&#39;t go down that well.&lt;/li&gt;
&lt;li&gt;I was really having problems getting the Maurten drink to mix, even when I
had volunteers shaking it for me. This is a known issue with Maurten,
but it slows you down and it&#39;s also pretty gross when you get a bunch
of wet powder in your mouth instead of gel.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first item is easy: just switch to gels the whole way. I&#39;m less sure
what to do for the drink mix, as Maurten really just goes down a lot easier.
I used Tailwind on a recent outing in the Sierras and after 6 hours or
so I&#39;d just had enough of how sweet it was. Maybe it&#39;s time to try
Never Second.&lt;/p&gt;
&lt;p&gt;I think the biggest limiting factor here was the altitude. It&#39;s always
a challenge to go from sea level to 7000+ feet and while I was mostly OK
at the start of the race, I could really feel myself dragging later
whenever I got above 8000 or so feet. I suspect that this also contributed
to my stomach issues, as nausea is a common altitude sickness symptom.&lt;/p&gt;
&lt;p&gt;This probably isn&#39;t the best I could possibly have done with this training base but I don&#39;t
think it was that far off. I lost a lot of time on the last climb—as
you can see by Lara putting 30 minutes on me—and think it&#39;s possible I could have pushed
it harder, but I doubt I could have gone that much faster.
I think to really turn in a better performance I would have had to spend a few weeks at
altitude so that the higher elevations didn&#39;t hit me so hard.
I&#39;m quite pleased
with how much I managed to push the last hour or so. That&#39;s something
I&#39;ll want to remember how to do in future races.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ronr-finish.jpeg&quot; alt=&quot;At the finish&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Chilling at the finish. Still muddy.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id=&quot;overall&quot;&gt;Overall &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ronr-report/#overall&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;17:18:39, 24th/(66 finishers, 92 starters), 17th/51 male, 2nd 50-59&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
That name sounds pretty ominous but turns out to &lt;a href=&quot;https://ronrenduranceruns.com/courses/100k/&quot;&gt;refer to&lt;/a&gt; the
1800s when miners would carry supplies on boats down the Salmon
river but not be able to get back up the river. In any case, I returned. &lt;a href=&quot;https://educatedguesswork.org/posts/ronr-report/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As you can see, I also have some Spring energy gels. The flavor
is a nice break from Maurten, but in light of the recent
&lt;a href=&quot;https://www.irunfar.com/spring-energy-awesome-sauce-gel-controversy-lab-results&quot;&gt;measurements of Spring&#39;s calorie counts coming in way lower than
claimed&lt;/a&gt;,
I don&#39;t want to rely on it. I had some floating around though,
so figured I might bring it just in case I really lost
the ability to tolerate Maurten. &lt;a href=&quot;https://educatedguesswork.org/posts/ronr-report/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>New EV Habits for ICE Vehicle Owners</title>
		<link href="https://educatedguesswork.org/posts/ev-for-ice/"/>
		<updated>2024-06-03T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ev-for-ice/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/charging-man.jpeg&quot; alt=&quot;Man waiting to charge&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Generated by Midjourney. Prompt &amp;quot;Man waiting for EV to charge, bored expression, EV charging station, photorealistic --ar 4:3&amp;quot;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I spent some time reading this &lt;a href=&quot;https://news.ycombinator.com/item?id=40489905&quot;&gt;HN thread&lt;/a&gt;
in response to Wired&#39;s &lt;a href=&quot;https://www.wired.com/story/how-many-charging-stations-would-we-need-to-totally-replace-gas-stations/&quot;&gt;article&lt;/a&gt;
on how many EV charging stations we need and I&#39;m dumber than when I started
(isn&#39;t that usually the way it is on the orange site?). On one side, we have the &lt;em&gt;Internal Combustion Engine (ICE)&lt;/em&gt; forever crowd
worried about the tragedy of wasting 30 minutes charging on their 500 mile
road trip and on the other side we have EV lovers acting as if there&#39;s
really no tradeoff.&lt;/p&gt;
&lt;p&gt;I have two EVs so there&#39;s no doubt about what side of the argument I&#39;m
on, but I&#39;m also not going to tell you that it&#39;s not inconvenient at
times. The truth is that EVs really are a lot more convenient for most
people for day to day driving but less convenient for long road trips,
especially if you treat them the way you would an ICE vehicle rather
than adapting yourself to their idiosyncrasies.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;background-facts&quot;&gt;Background Facts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#background-facts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The two basic vehicle parameters that dominate any discussion of EVs versus
ICE vehicles are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;range&lt;/strong&gt;: how long you can drive without refueling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;refueling speed&lt;/strong&gt;: how long it takes to refuel&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;range&quot;&gt;Range &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#range&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When EVs were first introduced, range was fairly bad,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
but things have
gotten a lot better.  Edmunds
&lt;a href=&quot;https://www.edmunds.com/most-popular-cars/&quot;&gt;lists&lt;/a&gt; the Toyota RAV4 as
the most popular non-truck ICE vehicle,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; and the Tesla Model Y as the top EV. Both of these are
compactish SUVs, so pretty comparable.  The RAV4 Hybrid gets about 38
mpg highway with a 14.5 gallon tank, so has a range of about 550 miles
(this is a little hard to estimate because it&#39;s a hybrid). The most
popular EV, the Tesla Model Y, has a listed range of 320 miles.&lt;/p&gt;
&lt;p&gt;This is a real physics problem for EVs because the energy
density of batteries is much worse than that of gasoline: the RAV4&#39;s
gas weighs about 120 lbs; the Tesla&#39;s battery weighs 1700 lbs. What
this means in practice is that adding range to an EV involves tradeoffs
in terms of cost and weight but
it&#39;s trivial to add range to an ICE vehicle just by making
the tank a bit bigger. If Toyota
has chosen 14.5 gallons, that&#39;s because they don&#39;t think you need
more.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;power-units-versus-energy-units&quot;&gt;Power Units versus Energy Units &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#power-units-versus-energy-units&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The terminology around EV units can be a bit confusing.  A battery
stores a certain amount of energy, which is conventionally measured in
kilowatt hours (kWh), which is to say the amount of energy you would
put into the battery if you added it at the rate of one kilowatt (kW)
for an hour. What&#39;s a kilowatt, then? It&#39;s 1000 watts, where a watt is
the power needed to transfer one joule (the SI unit of energy) per
second.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
In other words, a kilowatt hour is 3.6 million joules (3.6 megajoules (MJ)).
Electricity tends to get sold in units of kWh, which is probably why
batteries are rated this way rather than in MJ.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;EV &lt;a href=&quot;https://www.evspecs.org/comparison-chart/consumption&quot;&gt;efficiency&lt;/a&gt;
varies dramatically, but a reasonable estimate is around 3-4 mi/kWh
(5-7 km/kWh). Battery &lt;a href=&quot;https://www.evspecs.org/comparison-chart/battery-capacity-usable-kwh&quot;&gt;size&lt;/a&gt;
also varies quite dramatically, but the median is around 75kWh.
Multiplying these two values you get a range of 225-300 mi, which
is about what you should expect from the above.&lt;/p&gt;
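&lt;p&gt;As a rough sketch of that multiplication (the function name is purely illustrative, and the inputs are the ballpark figures above rather than measurements of any particular car):&lt;/p&gt;

```python
def estimated_range_miles(efficiency_mi_per_kwh, battery_kwh):
    """Range is efficiency (mi/kWh) times usable battery capacity (kWh)."""
    return efficiency_mi_per_kwh * battery_kwh


MEDIAN_BATTERY_KWH = 75  # median usable capacity quoted above

for efficiency in (3, 4):  # low and high ends of the 3-4 mi/kWh estimate
    miles = estimated_range_miles(efficiency, MEDIAN_BATTERY_KWH)
    print(f"{efficiency} mi/kWh x {MEDIAN_BATTERY_KWH} kWh = {miles} mi")
```

&lt;p&gt;Real-world range will vary with speed, temperature, and how much of the battery you are willing to use, so treat these as order-of-magnitude numbers.&lt;/p&gt;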
&lt;h3 id=&quot;refueling-speed&quot;&gt;Refueling Speed &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#refueling-speed&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;ICE vehicles refuel faster than EVs. Period. The HN thread had some
crazy fast estimates, but gasoline pumps do about &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Gasoline_pump&amp;amp;oldid=1197588681&quot;&gt;50l/13 gallons per minute&lt;/a&gt;, so we&#39;re looking at on the order of 5 minutes to
fill up your tank. No deployed EV battery charges even remotely this fast.&lt;/p&gt;
&lt;p&gt;At a high level, there are &lt;a href=&quot;https://afdc.energy.gov/fuels/electricity-stations&quot;&gt;three main types of charger&lt;/a&gt; in the US:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;AC level 1:&lt;/dt&gt;
&lt;dd&gt;Plugs into an ordinary 110V socket. About 1-2kW.&lt;/dd&gt;
&lt;dt&gt;AC level 2:&lt;/dt&gt;
&lt;dd&gt;Requires a dedicated circuit but installable in your home. Typically around 7kW.&lt;/dd&gt;
&lt;dt&gt;DC Fast Charging (&amp;quot;Level 3&amp;quot;):&lt;/dt&gt;
&lt;dd&gt;Commercial charging stations. Typically between 30 and 350 kW. My experience is that
there is a lot of variation in actual charging speed for fast chargers, both
in terms of rated power and in terms of actual power delivery. In addition,
not all cars will charge at the maximum speed of the charger, with newer
cars doing better.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Of course, what really matters isn&#39;t the rate of power delivery but rather
the rate of range added. If we assume 3.5 mi/kWh, we get something
like:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Charger type&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Charging power (kW)&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Miles added/hr&lt;/th&gt;
&lt;th&gt;Time to add 250 miles of range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;L1&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1.5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5.25&lt;/td&gt;
&lt;td&gt;47 hrs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;L2&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;24.5&lt;/td&gt;
&lt;td&gt;10 hrs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;L3 (normal)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;50&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;175&lt;/td&gt;
&lt;td&gt;85 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;L3 (fast)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;150&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;525&lt;/td&gt;
&lt;td&gt;29 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Tesla Supercharger (rated)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;250&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;875&lt;/td&gt;
&lt;td&gt;17 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;L3 (ultrafast)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;350&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1225&lt;/td&gt;
&lt;td&gt;12 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
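&lt;p&gt;The arithmetic behind the table is simple enough to sketch in a few lines. This assumes a constant charging power and the 3.5 mi/kWh figure from above, and ignores charging losses and the slowdown near a full battery; the function names are made up for illustration.&lt;/p&gt;

```python
MI_PER_KWH = 3.5  # efficiency assumed in the text above


def miles_added_per_hour(charger_kw):
    """Range gained per hour of charging at a constant power."""
    return charger_kw * MI_PER_KWH


def hours_to_add(miles, charger_kw):
    """Time to add the given range, ignoring taper and charging losses."""
    return miles / miles_added_per_hour(charger_kw)


# Charger powers (kW) taken from the table above.
chargers = {
    "L1": 1.5,
    "L2": 7,
    "L3 (normal)": 50,
    "L3 (fast)": 150,
    "Tesla Supercharger (rated)": 250,
    "L3 (ultrafast)": 350,
}

for name, kw in chargers.items():
    rate = miles_added_per_hour(kw)
    minutes = hours_to_add(250, kw) * 60
    print(f"{name}: {rate:g} mi/hr, {minutes:.0f} min to add 250 mi")
```

&lt;p&gt;Changing &lt;code&gt;MI_PER_KWH&lt;/code&gt; is an easy way to see how much a less efficient vehicle stretches these times.&lt;/p&gt;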
&lt;p&gt;As I mentioned above, real world experience varies. As a reference
point, I have a BMW i3 and a Kia EV6. The BMW will nominally accept
up to 49kW, but I don&#39;t think I&#39;ve ever seen above 40. The Kia
will nominally charge at up to 233 kW, but I think the highest
I have ever seen is around 180 kW. It&#39;s also important to know that charging
slows down quite a bit once the battery hits 80%, so as a practical matter
it takes a lot longer to get to the full nominal range of the car than
it does to get to 80% range. Again, this isn&#39;t an issue with gas cars
where filling rate is comparatively constant.&lt;/p&gt;
&lt;h2 id=&quot;day-to-day-driving&quot;&gt;Day to Day Driving &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#day-to-day-driving&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The day-to-day driving experience for an EV is totally different
from an ICE vehicle. With an ICE vehicle, you just drive around until
you are low on gas and then visit the filling station.
If you have an EV and a home charger—which you really want
to have&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;—you
basically never have to use a public charger on a daily basis,
even if all you have at home is an L1 charger. All you
do is plug your car in when you get home, which quickly
becomes a habit. If you have a home L2 charger, you don&#39;t even
need to do it every day.&lt;/p&gt;
&lt;p&gt;The average US &lt;a href=&quot;https://www.axios.com/2024/03/24/average-commute-distance-us-map&quot;&gt;commute
distance&lt;/a&gt;
is 42 miles, which represents about 8 hrs on an L1 charger.  This
means that if you just drive to work and back and you&#39;re at home for
12 hrs a day, you&#39;ll always have a full battery when you leave in the
morning, with about 4 hrs to spare.
As long as you don&#39;t
drive more than 60ish miles, you&#39;ll still have a full battery every
morning. This means that you almost never have situations where you get up,
are late for something, and realize you need to stop and get gas,
as happens with ICE vehicles.&lt;/p&gt;
&lt;p&gt;Obviously people don&#39;t just commute and if you take a longer drive
then you&#39;ll use up more of your battery. However, on a day to day
basis, most people don&#39;t drive more than the range of their car.
If you drive more than an overnight charge&#39;s worth in one day, then you&#39;ll just
have a slightly less than full battery, but the &lt;em&gt;net&lt;/em&gt; amount of
drain is just however many miles you drove minus the amount you
can charge overnight, so unless you have a lot of days with long
trips, your battery never gets too low, and when you return to a normal
pattern, it will refill again, unless you routinely drive as many
miles as your charger can support.&lt;/p&gt;
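&lt;p&gt;This net-drain bookkeeping can be sketched as a tiny simulation. The 60 miles of overnight charge is an assumption (roughly 12 hours on an L1 charger at 3.5 mi/kWh), the driving pattern is hypothetical, and the sketch does not handle running the battery to empty.&lt;/p&gt;

```python
def simulate_mornings(full_range, overnight_charge, daily_miles):
    """Return a (morning, evening) range pair for each day.

    Each night the car regains up to overnight_charge miles, capped at
    full_range.
    """
    days = []
    morning = full_range
    for driven in daily_miles:
        evening = morning - driven
        days.append((morning, evening))
        morning = min(full_range, evening + overnight_charge)
    return days


# Hypothetical week: three long days followed by three normal ones.
for day, (morning, evening) in enumerate(
    simulate_mornings(200, 60, [80, 80, 80, 40, 40, 40]), start=1
):
    print(f"day {day}: morning {morning} mi, evening {evening} mi")
```

&lt;p&gt;The cap at &lt;code&gt;full_range&lt;/code&gt; is what makes the battery recover: any night where you drove less than the overnight charge, the surplus goes toward paying back earlier long days.&lt;/p&gt;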
&lt;p&gt;For example, consider someone who has an EV with a range of 200 miles
and drives 40 miles a day regularly, then has a few days where they
need to drive 80. Assuming they can add about 60 miles of range overnight,
here&#39;s what their battery state looks like each morning and evening:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Day&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Morning Range&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Miles Driven&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Evening Range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;200&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;180&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;160&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;140&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;160&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;180&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;200&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;160&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The bigger your battery, the longer you can sustain periods
when you&#39;re consuming more than you&#39;re charging (this is of
course also the situation when you&#39;re driving the car).
Consider a vehicle with a 100 mile battery driven the same way:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Day&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Morning Range&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Miles Driven&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Evening Range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;100&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;60&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;80&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;strong&gt;-20&lt;/strong&gt; (oops)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;On day three, instead of being down to less than half the battery,
you&#39;re actually at negative battery! The only difference here
is that you don&#39;t have as big a buffer, so that when you consume more
than you charge you run out. The bigger the battery, the more buffer you
have and therefore the less of a big deal it is if you do a long drive
one day. This buffer is built up in the days before your long drive
when you&#39;re charging more than the drain; with a small battery,
the car is just sitting fully charged whereas with a big battery
it would still be charging.&lt;/p&gt;
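&lt;p&gt;The bookkeeping behind these tables is simple enough to sketch in a few lines of Python. The function name &lt;code&gt;simulate&lt;/code&gt; and its parameters are mine, and it assumes you start with a full battery and recover 60 miles each night (12 hrs on an L1 charger, per the numbers above), capped at the battery&#39;s capacity.&lt;/p&gt;

```python
def simulate(capacity, daily_miles, overnight_charge=60):
    """Return (day, morning_range, miles_driven, evening_range) rows.

    overnight_charge=60 is an assumption taken from the text:
    roughly 12 hrs at home on an L1 charger.
    """
    rows = []
    morning = capacity  # assume a full battery on the first morning
    for day, miles in enumerate(daily_miles, start=1):
        evening = morning - miles
        rows.append((day, morning, miles, evening))
        # Charge overnight, but never beyond a full battery.
        morning = min(capacity, evening + overnight_charge)
    return rows


pattern = [80, 80, 80, 40, 40, 40, 40]

# 200 mile battery: dips for three days, then refills.
for row in simulate(200, pattern):
    print(row)

# 100 mile battery: the third long day ends below zero.
for row in simulate(100, pattern[:3]):
    print(row)
```

&lt;p&gt;With a 200 mile battery the last row comes out as day 7 starting at 200, and with a 100 mile battery day three ends at -20, which is the &amp;quot;oops&amp;quot; case.&lt;/p&gt;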
&lt;p&gt;Of course, all of this is just with an L1 charger. If you have an L2
charger at home, then 12 hrs of charge is around 300 miles of range
and so you&#39;ll nearly always have a full battery in the morning
and will essentially never have to visit a public charger.
You really just have to worry about situations where you do enough
driving in one day to completely deplete your battery. This brings
us to the topic of road trips.&lt;/p&gt;
&lt;h2 id=&quot;road-trips&quot;&gt;Road Trips &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#road-trips&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;While it&#39;s clearly more convenient to not have to worry about refueling on a
day-to-day basis, once you want to drive more than the range of your
vehicle in one day, the situation gets quite a bit worse.
In an ICE vehicle you can just generally drive from point A to point
B and when you get low on gas, pull out your phone and look for
a gas station. This is not a good plan for an EV for several reasons.&lt;/p&gt;
&lt;p&gt;First, there are a lot fewer EV chargers than there are gas stations.
As of Jan 2024, California had &lt;a href=&quot;https://www.bloomberg.com/news/articles/2024-01-31/the-us-installed-more-than-1-000-ev-charging-stations-since-summer&quot;&gt;less than
2000&lt;/a&gt;
DC fast charging stations (there are around 7000 total in the US). By comparison
there are over &lt;a href=&quot;https://www.xmap.ai/blog/a-comprehensive-guide-to-californias-gas-station-data-in-2024&quot;&gt;13000&lt;/a&gt;
gas stations in California.
Moreover, because of charging network incompatibility, you
can&#39;t use every charger (though Tesla is supposed to be opening up its
network to non-Tesla cars, which will improve the situation for
non-Tesla owners, as Tesla operates the biggest network).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
The result
of this is that when you get down to (say) 30 miles of range, you
may not be able to find a conveniently located fast charger.
And because the range of EVs is somewhat lower you will need
to find a charger more often.&lt;/p&gt;
&lt;p&gt;Second, as should be clear from above, EV charging is significantly
slower than filling your gas tank even in the best case scenario
where you have a fast DC charger (say 10-20 minutes). If you can only
find a normal L3, you&#39;re looking at closer to an hour. I wouldn&#39;t
generally even bother with stopping at a L2 charger, though they can
be useful for charging overnight at a hotel or something. You can,
of course, sit in your car at the charger for 30-60 minutes but
it&#39;s not an ideal experience. Worse yet, it&#39;s not uncommon for
the chargers to be full and/or one of the ports to be broken,
in which case you also need to wait for someone else to finish.&lt;/p&gt;
&lt;h3 id=&quot;basic-strategy&quot;&gt;Basic Strategy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#basic-strategy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;My recommendation instead is to lean into the way an EV behaves rather
than trying to treat it like an ICE vehicle. What this mostly
means is to plan your trip around actually stopping to charge.&lt;/p&gt;
&lt;p&gt;As a real example, consider a trip from Palo Alto to Los Angeles
in my Kia EV 6 GT (range: 210 miles). The total trip is 360 miles,
so I should be able to do it with one charging stop, as long as
it&#39;s located more or less halfway through. There&#39;s really only one
choice here, which is Kettleman City, located 184 miles from Palo Alto and
178 from Los Angeles, where there is a 10 port &lt;a href=&quot;https://www.plugshare.com/location/362204&quot;&gt;Electrify America charging station&lt;/a&gt;.
The station itself is located at Chalios Mexican
Restaurant, but it&#39;s in a complex with a pile of other
fast food restaurants (In-n-Out, Baja Fresh, McDonalds, etc.).
To be honest, this is actually on the good side in terms of
location options; lots of Electrify America stations are in
Walmart parking lots. Anyway, what you want to do here is plan to get there around lunchtime,
plug your car in, and then go grab some food while it charges.
If there&#39;s a spare port when you arrive, it&#39;s actually reasonably
likely that charging will be done before you finish eating
(be nice, move your car), but even if not, you can just chill
in In-n-Out for a bit.&lt;/p&gt;
&lt;p&gt;This is basically the only good option if you have an EV with a 200-odd
mile range and you want to make one stop: the next closest choices
are Coalinga (203 miles from LA) and (214 miles from Palo Alto).
You might make it with one of these, but you&#39;re cutting it a lot closer
than I like. By contrast, if you have an EV with a 300 mile range, you
could pick either of these, or even make it down to Bakersfield
before finding a charging station. Of course, if you have a 400 mile range
(e.g., Tesla Model 3 or S long range, Rivian R1, etc.) then you can
actually do the whole trip in one shot, though you&#39;d need to charge
when you got there.&lt;/p&gt;
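&lt;p&gt;The one-stop reasoning above boils down to checking that neither leg is longer than your range. Here is a rough Python sketch of that check. The function &lt;code&gt;usable_stops&lt;/code&gt; and the &lt;code&gt;reserve&lt;/code&gt; margin are hypothetical, and the Coalinga distance from Palo Alto is my assumption (the 360 mile trip minus the 203 miles to LA), not a measured figure.&lt;/p&gt;

```python
def usable_stops(ev_range, stops, reserve=0):
    """Return the stops that split the trip into two drivable legs.

    stops maps a name to (miles_from_origin, miles_to_destination).
    reserve is a hypothetical safety margin, in miles, left in the battery.
    """
    budget = ev_range - reserve
    return [
        name
        for name, (leg_in, leg_out) in stops.items()
        if budget >= max(leg_in, leg_out)
    ]


STOPS = {
    # Distances from the text.
    "Kettleman City": (184, 178),
    # 203 miles from LA is from the text; 157 is assumed as 360 - 203.
    "Coalinga": (157, 203),
}

print(usable_stops(210, STOPS))              # no margin at all
print(usable_stops(210, STOPS, reserve=20))  # keep 20 miles in hand
print(usable_stops(300, STOPS, reserve=20))  # longer-range EV
```

&lt;p&gt;With a 210 mile range and no margin both stops technically work, but as soon as you insist on keeping 20 miles in hand only Kettleman City survives. A 300 mile range brings Coalinga back.&lt;/p&gt;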
&lt;h3 id=&quot;trip-planning&quot;&gt;Trip Planning &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#trip-planning&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For the best result, an EV trip requires a lot more planning than with
an ICE vehicle. I&#39;ve certainly done trips where I just drove for a while
and then searched for a charger, but this definitely has a higher risk of
charging in a Walmart parking lot. You&#39;re going to be happier if you
do some advance research. There are a number of trip planning tools
available to you (&lt;a href=&quot;https://www.tesla.com/trips&quot;&gt;Tesla&lt;/a&gt;, &lt;a href=&quot;https://www.plugshare.com/&quot;&gt;PlugShare&lt;/a&gt;,
&lt;a href=&quot;https://abetterrouteplanner.com/&quot;&gt;A Better Route Planner&lt;/a&gt;).
There&#39;s no magic here, just put in your source and destination and play around
a bit. Some of the tools will actually recommend specific stops
and some you have to do it manually, but in either case you end up
with an itinerary telling you where to stop.&lt;/p&gt;
&lt;h3 id=&quot;less-good-cases&quot;&gt;Less good cases &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#less-good-cases&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The Palo Alto to Los Angeles trip is basically the best case scenario:
California has a lot of EV chargers and you can take Interstate 5
pretty much the whole way, so you&#39;re never that far from something.
Even so, I tried a few experimental but realistic trips
(Palo Alto to Yosemite, Denver to &lt;a href=&quot;https://hardrock100.com/&quot;&gt;Silverton&lt;/a&gt;, Los Angeles to &lt;a href=&quot;https://www.aravaiparunning.com/tushars/&quot;&gt;Beaver UT&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;), and was usually able to
find some kind of route. There&#39;s even an Electrify America charger
at the Days Inn in Beaver, so you&#39;re not stuck at 10% when you
arrive.
With that said, you could easily spend a lot of time in
gas station and Walmart parking lots.&lt;/p&gt;
&lt;p&gt;Probably the worst case is when you are headed to somewhere remote and
there may not be a charger, so you need to plan for a round trip.
For instance, Lone Pine California doesn&#39;t really have anything in
the way of non-Tesla chargers, and there are only two stations on
the way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Chargepoint L2 that might have one L3 port in Beatty&lt;/li&gt;
&lt;li&gt;A pair of non-networked L2 plugs in Stovepipe Wells&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Honestly, this would all leave me feeling pretty antsy and I&#39;m
not sure I&#39;d want to do that trip in an EV that didn&#39;t have a
really long range. You don&#39;t want to be stuck out in the middle
of Death Valley with a dead battery.&lt;/p&gt;
&lt;h2 id=&quot;summing-up-and-the-future&quot;&gt;Summing Up and the Future &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#summing-up-and-the-future&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The bottom line is that neither an EV nor an ICE vehicle is overall
better in terms of convenience. For day-to-day driving, just having a
car which basically never needs to be fueled is clearly a win, so as
long as you have a charger at home, it&#39;s hard to go wrong with an
EV. You just have to remember to charge it every night.&lt;/p&gt;
&lt;p&gt;When it comes to road trips, an ICE vehicle is more convenient,
but you can close a lot of the gap with some good planning in terms
of when you stop and charge. If you try to drive an EV the way you
would an ICE vehicle by just driving until you are low on charge
and then looking for a charger, you&#39;re going to have a much worse
experience.&lt;/p&gt;
&lt;p&gt;The good news is that the EV charging situation is getting rapidly
better on all three fronts: (1) Batteries are getting bigger so you
need to charge less frequently; (2) charging is getting faster so
it&#39;s less of a hassle; and (3) more stations are being built so
you have more options in terms of where to charge. As of today
I&#39;d feel comfortable doing most road trips on the West Coast in
an EV, but there are still a few for which I&#39;d want to rent something
else, which seems like a reasonable tradeoff for the other ways
in which an EV is better. If you buy an EV in five years or so, I expect there will
be very few trips you won&#39;t be able to do in it.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m talking here just about charging, but obviously there
are a lot of ways in which EVs are just plain better, starting
with dramatically better driving performance. I&#39;m not here
to sell you that, though. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For example, the original BMW i3 had a range of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=BMW_i3&amp;amp;action=info&quot;&gt;less than 100 miles&lt;/a&gt;
in 2014. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; The top 4 vehicles are all
trucks. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As a reference point, a reasonably fit person can put out
around 300 W on a bike for an extended period of time. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this isn&#39;t some scenario where we&#39;re using goofy
non-metric units. kWh are still defined in a sensible way
from the base units, they&#39;re just not the SI official
way of doing things. Calories (the amount of energy to
heat a gram of water by 1°C) are in a similar
position of being a metric but not SI unit that is widely
used in specific contexts. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Exception: people who can charge at work &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For non-Tesla owners, this mostly means you want Electrify America,
which operates a lot of 150 kW and 350 kW DC chargers. The
bad news is that it&#39;s not at all uncommon for EV chargers to
be broken. &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ultrarunners may be sensing a theme here &lt;a href=&quot;https://educatedguesswork.org/posts/ev-for-ice/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Notes on Post-Quantum Cryptography for TLS 1.2</title>
		<link href="https://educatedguesswork.org/posts/pq-tls12/"/>
		<updated>2024-05-24T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pq-tls12/</id>
		<content type="html">&lt;p&gt;As mentioned in &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout&quot;&gt;previous&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency&quot;&gt;posts&lt;/a&gt;,
the IETF has decided not to add support for post-quantum (PQ) encryption algorithms
to TLS 1.2. In fact, the TLS WG is taking a rather stronger position, namely
that it&#39;s going to &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-rsalz-tls-tls12-frozen/&quot;&gt;stop enhancing TLS 1.2 more or less entirely&lt;/a&gt;, including support for PQ algorithms:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While the industry is waiting for NIST to finish standardization, the IETF has several efforts underway. A working group was formed in early 2023 to work on use of PQC in IETF protocols, [PQUIPWG]. Several other working groups, including TLS [TLSWG], are working on drafts to support hybrid algorithms and identifiers, for use during a transition from classic to a post-quantum world.&lt;/p&gt;
&lt;p&gt;For TLS it is important to note that the focus of these efforts is TLS 1.3 or later. TLS 1.2 is WILL NOT be supported (see Section 5).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As I wrote previously, to some extent this is a political position:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One challenge with the story I told above is that PQ support is only
available in TLS 1.3, not TLS 1.2. This means that anyone who wants
to add PQ support will &lt;em&gt;also&lt;/em&gt; have to upgrade to TLS 1.3. On the
one hand, people will obviously have to upgrade anyway to add the PQ algorithms,
so what&#39;s the big deal. On the other hand, upgrading more stuff
is always harder than upgrading less. After all, the TLS
working group &lt;em&gt;could&lt;/em&gt; define new PQ cipher suites for TLS 1.2,
and it&#39;s an emergency so why not just let people use TLS 1.2 with PQ
rather than trying to force people to move to TLS 1.3.
On the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=The_Mote_in_God%27s_Eye&amp;amp;action=info&quot;&gt;gripping hand&lt;/a&gt;,
TLS 1.3 is very nearly a drop-in replacement
for TLS 1.2. There is one TLS 1.2 use case that TLS 1.3
didn&#39;t cover (by design), namely the ability to passively decrypt
connections if you have the server&#39;s private key (sometimes called
&amp;quot;&lt;a href=&quot;https://www.nccoe.nist.gov/addressing-visibility-challenges-tls-13&quot;&gt;visibility&lt;/a&gt;&amp;quot;), which is used for
server side monitoring in some networks. However, this technique won&#39;t work
with PQ key establishment either, so it&#39;s not a regression if you convert
to TLS 1.3.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this post, I want to look at what it would actually take to add PQ
support to TLS 1.2 and why we probably shouldn&#39;t do it (as well as &amp;quot;revise and
extend&amp;quot; that last point). This requires going into
some more detail about the cryptographic primitives we are working with
here as well as the history of TLS key establishment.&lt;/p&gt;
&lt;h2 id=&quot;static-rsa&quot;&gt;Static RSA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#static-rsa&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;SSLv3 (and later TLS) originally supported two main key establishment modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Static RSA&lt;/li&gt;
&lt;li&gt;Diffie-Hellman&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a long time by far the most common mode was &lt;em&gt;static RSA&lt;/em&gt;, shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-static-rsa.png&quot; alt=&quot;TLS static RSA&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS 1.2 static RSA mode
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The way that this mode worked was that the server&#39;s certificate contained
a public key for the &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/TODO&quot;&gt;RSA algorithm&lt;/a&gt;. The client then generated
a random value (the &lt;em&gt;premaster secret (PMS)&lt;/em&gt;) which it encrypted under
the RSA public key. The server used its private key to decrypt the
PMS, at which point both client and server knew it. They would
each derive traffic keys from the PMS (as well as some
other components of the handshake) which could be used to
protect the traffic. Because the attacker doesn&#39;t have the private
key it is unable to recover the PMS and therefore will not be able
to communicate with the client.&lt;/p&gt;
&lt;p&gt;This design has the property that if you know the RSA private key
you can decrypt any connection protected with it. This means
that an attacker who is able to obtain the private key, for
instance by compromising the server, will be able to decrypt
any connection that they have recorded, including connections
months or years in the past (note that this is the same
kind of attack we are worried about with a CRQC, except that
a CRQC could recover the key from the handshake without compromising
the server).&lt;/p&gt;
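&lt;p&gt;To make this concrete, here is a toy Python sketch of static RSA key transport. It uses the classic textbook parameters (p=61, q=53), no padding, and fixed PMS values so the output is reproducible. Real deployments use keys of 2048 bits or more, random PMS values, and padding, so treat the function names and numbers as purely illustrative.&lt;/p&gt;

```python
# Toy textbook RSA key pair: N = 61 * 53, and E * D = 1 mod 3120.
N = 3233   # public modulus (in the server's certificate)
E = 17     # public exponent (in the server's certificate)
D = 2753   # private exponent (known only to the server)


def client_encrypt_pms(pms):
    """Client side: encrypt the premaster secret under the public key."""
    return pow(pms, E, N)


def server_decrypt_pms(ciphertext):
    """Server side: recover the premaster secret with the private key."""
    return pow(ciphertext, D, N)


# One handshake: only encrypted_pms appears on the wire.
pms = 1234
encrypted_pms = client_encrypt_pms(pms)
assert server_decrypt_pms(encrypted_pms) == pms

# An attacker records several handshakes, then obtains D much later.
recorded = [client_encrypt_pms(m) for m in (1234, 99, 2500)]
recovered = [pow(c, D, N) for c in recorded]
print(recovered)
```

&lt;p&gt;The last two lines are the whole problem: because every handshake is encrypted under the same long-lived key, anyone who eventually gets &lt;code&gt;D&lt;/code&gt; can run the server&#39;s decryption over everything they recorded.&lt;/p&gt;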
&lt;h2 id=&quot;ephemeral-diffie-hellman&quot;&gt;Ephemeral Diffie-Hellman &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#ephemeral-diffie-hellman&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;SSLv3 also included a mode based on Diffie-Hellman key exchange:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-dhe.png&quot; alt=&quot;TLS DHE mode&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS 1.2 ephemeral DH mode
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this mode, the server generates a Diffie-Hellman key share
(public/private key pair) and sends it to the client.  In order to
authenticate the share, it &lt;em&gt;signs&lt;/em&gt; the share using the RSA key, thus
proving that the server controls the private key. An attacker who
doesn&#39;t have the private key will not be able to sign the key
share and therefore cannot impersonate the server.&lt;/p&gt;
&lt;p&gt;As long as the client and server generate a fresh key share for
each connection—which isn&#39;t strictly required by the
specification, but is common practice—and then delete
the private part of the key share after use, then even if
an attacker subsequently compromises the server&#39;s private signing key,
it still won&#39;t be able to decrypt connections that happened
in the past. This property is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Forward_secrecy&amp;amp;oldid=1219004112&quot;&gt;forward secrecy&lt;/a&gt; (sometimes &amp;quot;perfect forward secrecy&amp;quot;).&lt;/p&gt;
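&lt;p&gt;Here is a minimal Python sketch of the ephemeral part, using toy finite-field Diffie-Hellman. The prime and generator are my choices for illustration and are far too small for real use, and the server&#39;s signature over its key share is omitted entirely.&lt;/p&gt;

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, an assumption for illustration
G = 3           # toy generator choice, also an assumption


def new_key_share():
    """Generate a fresh (private, public) key share for one connection."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private, peer_public):
    """Combine my private value with the peer's public value."""
    return pow(peer_public, my_private, P)


def handshake():
    """Run one key exchange and return each side's view of the secret."""
    client_private, client_public = new_key_share()
    server_private, server_public = new_key_share()
    client_view = shared_secret(client_private, server_public)
    server_view = shared_secret(server_private, client_public)
    # The private values are dropped when this function returns, so
    # nothing long-lived remains that could decrypt this connection.
    return client_view, server_view


client_view, server_view = handshake()
print(client_view == server_view)
```

&lt;p&gt;The important design point is that &lt;code&gt;new_key_share&lt;/code&gt; is called per connection. The long-term signing key never touches the shared secret, so stealing it later doesn&#39;t help with old traffic.&lt;/p&gt;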
&lt;p&gt;In the early days of SSL/TLS deployment, there was a lot of
concern about the performance cost of the cryptography and
ephemeral DH mode is much more expensive than RSA key
exchange (DH itself is expensive and you also have to
do the RSA signature), so most servers did static RSA in
order to save CPU.
Over time, however, a number of factors combined to make
forward secret key establishment more attractive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;New Diffie-Hellman variants based on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Elliptic-curve_cryptography&amp;amp;oldid=1211841540&quot;&gt;elliptic curves&lt;/a&gt; were developed.
&lt;em&gt;Elliptic Curve Diffie Hellman Ephemeral (ECDHE)&lt;/em&gt; algorithms were much
faster than the older finite-field based algorithms and
so the marginal cost of doing ECDH was much less important.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Servers got faster so that the cryptography wasn&#39;t as big
a deal overall.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There was increasing concern about the practical security of
non-forward secret algorithms, in part due to the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=2010s_global_surveillance_disclosures&amp;amp;oldid=1220497443&quot;&gt;Snowden revelations&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because of the design of TLS, it was possible to incrementally deploy
ECDHE. As described in &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout&quot;&gt;a previous post&lt;/a&gt;, TLS negotiates
the key establishment algorithm and many clients already supported
ECDHE key establishment, so as soon as the server turned on ECDHE,
it would automatically be able to use it with compatible
clients. Moreover, RSA has the interesting property that
you can use the same key pair for both encryption/decryption and
digital signature, so the server could use its existing RSA
certificate to authenticate to the client; all it had to do is
enable ECDHE.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Starting in 2013, TLS deployments increasingly used ECDHE
for key establishment, as shown in the graph below.&lt;/p&gt;
&lt;figure id=&quot;key-exchange-modes-over-time&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-key-exchange-longitudinal.png&quot; alt=&quot;TLS key exchange modes&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;TLS 1.2 key exchange modes over time. From &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3278532.3278568&quot;&gt;Kotzias et al, 2018&lt;/a&gt;.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Because ECDHE is so much faster, it didn&#39;t make much
of a difference in terms of cost to the server to do so;
in fact if you &lt;em&gt;also&lt;/em&gt; enabled EC-based signatures using
ECDSA, the total cost to the server was actually less
than using RSA, though as a practical matter many servers
still use RSA certificates (which should also suggest to
you that the performance issues are less of a factor now
than when SSLv3 was first designed).&lt;/p&gt;
&lt;h2 id=&quot;tls-1.3&quot;&gt;TLS 1.3 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#tls-1.3&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;When TLS 1.3 was designed starting in 2013, we had a number
of objectives:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Clean up:&lt;/strong&gt; Remove unused or unsafe features&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improve privacy:&lt;/strong&gt; Encrypt more of the handshake&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improve latency:&lt;/strong&gt; Target: 1-RTT handshake for naïve clients; 0-RTT handshake for repeat connections&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuity:&lt;/strong&gt; Maintain existing important use cases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security Assurance:&lt;/strong&gt; Have analysis to support our work (added slightly later)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In order to address objectives (2) and (3) TLS 1.3 adopted
a new handshake skeleton which reverses the order of the DH
key shares, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls13-hs.png&quot; alt=&quot;TLS 1.3&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS 1.3 handshake overview
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In TLS 1.3, the client supplies its key share in its first
message (the &lt;code&gt;ClientHello&lt;/code&gt;) and the server responds with
its key share in its first message (&lt;code&gt;ServerHello&lt;/code&gt;). As
a result, the server is able to start encrypting messages
to the client immediately upon receiving the &lt;code&gt;ClientHello&lt;/code&gt;,
starting with its own certificate (thus concealing
the certificate from passive attackers on the wire).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
The client can start encrypting as soon as it gets
the server&#39;s first flight of messages, so after one round
trip, which is an improvement over TLS 1.2 in some situations.&lt;/p&gt;
&lt;p&gt;This handshake flow is inconsistent with static RSA.
Because the client sends its key share in its first
message, it needs to be able to generate it without
knowing the server&#39;s public key (or key share).
This works fine with Diffie-Hellman (and elliptic curve Diffie-Hellman)
because the key shares are generated independently of
each other, but not
with RSA because in RSA the sender has to use the
recipient&#39;s public key to encrypt.
Moreover, because the public key is in the
certificate, a static RSA-based handshake makes encrypting
the certificate much more difficult, as you need the certificate
in order to learn the public key and hence to establish the encryption key.&lt;/p&gt;
&lt;p&gt;Finally, static RSA is also quite difficult to implement correctly.
There have been a series of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Adaptive_chosen-ciphertext_attack&amp;amp;oldid=1208647187&quot;&gt;adaptive attacks&lt;/a&gt; on the RSA implementations in TLS stacks. The general idea is that
the attacker probes the server over and over by initiating
handshakes and then observing the server&#39;s behavior.
It can use this technique to
gradually learn secret information from the server.
For instance the attacker might take the encrypted PMS from some other
handshake and send variants of the message until it has recovered
the PMS itself.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
These attacks take advantage both of implementation issues
with RSA and of the fact that the server uses the same RSA key over and over,
which gives the attacker multiple opportunities to learn small
bits of information that add up over time
(another reason why it&#39;s attractive to use a fresh key for each
handshake).&lt;/p&gt;
&lt;h3 id=&quot;aside%3A-forward-secrecy-and-session-resumption&quot;&gt;Aside: Forward Secrecy and Session Resumption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#aside%3A-forward-secrecy-and-session-resumption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You actually need more than just forward secret key
establishment to make a forward secret protocol. TLS
incorporates a feature called &amp;quot;session resumption&amp;quot;
in which a key established in connection 1 can be
reused in connection 2, thus saving some of the cost
of the key establishment (and authentication). In TLS 1.2, that key is sufficient
to decrypt connection 1, so if you implement
resumption you don&#39;t have forward secrecy as long as
the resumption key sticks around, but in TLS 1.3 the
keys are generated in such a fashion that the key to
connection 2 does not let you decrypt connection 1.&lt;/p&gt;
&lt;p&gt;But wait, there&#39;s more: some stacks implement session resumption
by encrypting the resumption key with a fixed secret
and sending that value to the client as a &amp;quot;ticket&amp;quot;, thus
removing the need for a database. Obviously, as long
as that key is around, you also have a forward secrecy
issue: if the attacker compromises that key then it
can decrypt any tickets it has observed and learn the keys.
The impact of this depends on which TLS 1.3 modes you are using.
TLS 1.3 has a resumption + DHE handshake mode that provides forward
secrecy for resumption while still allowing you to omit the
authentication. In addition, TLS 1.3 also includes a &amp;quot;zero-RTT&amp;quot; mode
in which the resumption key is used to encrypt
the first packet from the client; this doesn&#39;t benefit
from the resumption + DHE handshake mode because it
happens before the DH key establishment.&lt;/p&gt;
&lt;h2 id=&quot;pq-tls&quot;&gt;PQ TLS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#pq-tls&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As mentioned &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout&quot;&gt;previously&lt;/a&gt;, PQ is being added
to TLS 1.3 by acting as if each PQ algorithm corresponds to
a new elliptic curve (group). However, in reality our PQ key establishment
algorithms are much more like RSA than Diffie-Hellman.
Specifically, they&#39;re &lt;a href=&quot;https://durumcrustulum.com/2024/02/24/how-to-hold-kems/&quot;&gt;Key Encapsulation Mechanisms&lt;/a&gt;, as shown in the figure below.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/kem.png&quot; alt=&quot;KEM Overview&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
KEM overview
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As with RSA, in a KEM Bob starts by generating a public/private
key pair, &lt;em&gt;(K_pub, K_priv)&lt;/em&gt;. He sends &lt;em&gt;K_pub&lt;/em&gt; to Alice, who then
uses a function called &lt;em&gt;Encap&lt;/em&gt; and some randomness to produce
two values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A shared random &lt;em&gt;secret&lt;/em&gt; value&lt;/li&gt;
&lt;li&gt;An associated &lt;em&gt;ciphertext&lt;/em&gt; value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;She keeps &lt;em&gt;secret&lt;/em&gt; and sends &lt;em&gt;ciphertext&lt;/em&gt; to Bob, who can then use the
&lt;em&gt;Decap&lt;/em&gt; function and &lt;em&gt;K_priv&lt;/em&gt; to compute &lt;em&gt;secret&lt;/em&gt;; at this point Alice
and Bob both know it.&lt;/p&gt;
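&lt;p&gt;To make the interface concrete, here is a minimal sketch of the three KEM operations in Python. This is a toy built from classical Diffie-Hellman over a small prime; it is not a post-quantum algorithm and it is not secure. The group parameters and the function names &lt;code&gt;keygen&lt;/code&gt;, &lt;code&gt;encap&lt;/code&gt;, and &lt;code&gt;decap&lt;/code&gt; are assumptions made purely for illustration. The point is just the shape of the API: Alice needs &lt;em&gt;K_pub&lt;/em&gt; before she can produce anything.&lt;/p&gt;

```python
import hashlib
import secrets

# Toy parameters, for illustration only: a small Mersenne prime and a fixed base.
# This is a classical Diffie-Hellman-style construction, NOT a post-quantum KEM.
P = 2**127 - 1
G = 3


def keygen():
    """Bob: return (k_pub, k_priv)."""
    k_priv = secrets.randbelow(P - 2) + 1
    k_pub = pow(G, k_priv, P)
    return k_pub, k_priv


def encap(k_pub):
    """Alice: combine Bob's public key with fresh randomness; return (secret, ciphertext)."""
    r = secrets.randbelow(P - 2) + 1
    ciphertext = pow(G, r, P)
    secret = hashlib.sha256(pow(k_pub, r, P).to_bytes(16, "big")).digest()
    return secret, ciphertext


def decap(k_priv, ciphertext):
    """Bob: recover the same secret from the ciphertext and his private key."""
    return hashlib.sha256(pow(ciphertext, k_priv, P).to_bytes(16, "big")).digest()


k_pub, k_priv = keygen()                 # Bob
alice_secret, ciphertext = encap(k_pub)  # Alice, only after receiving k_pub
bob_secret = decap(k_priv, ciphertext)   # Bob, after receiving ciphertext
assert alice_secret == bob_secret
```

&lt;p&gt;Notice that &lt;code&gt;encap&lt;/code&gt; takes the public key as an argument, which is why whoever runs it has to speak second.&lt;/p&gt;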
&lt;p&gt;Just like with DH, a KEM ends up with both Alice and Bob knowing the secret,
but in DH Alice can generate her key share &lt;em&gt;independently&lt;/em&gt; of Bob
as long as she knows which curve (group) he supports. I.e., it doesn&#39;t
matter who speaks first and—in protocols which support it—Alice&#39;s
key share and Bob&#39;s could actually cross paths. By contrast
with a KEM Alice needs to know Bob&#39;s public key first, which means
that Alice can&#39;t send the &lt;em&gt;ciphertext&lt;/em&gt; until she has received
the first message from Bob. This is fine with TLS 1.2, but in
TLS 1.3 it&#39;s a problem because the client speaks first, so we can&#39;t
use the server&#39;s public key.&lt;/p&gt;
&lt;p&gt;In order to use a KEM with TLS 1.3, we need to reverse the direction
of the KEM, as shown below. The new elements are shown in red and
I&#39;ve omitted the DH elements of the hybrid mode for simplicity.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls13-kem.png&quot; alt=&quot;TLS 1.3 with a KEM&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS 1.3 with a KEM
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;client&lt;/em&gt; generates a public/private key pair and sends the
public key to the server in the &lt;code&gt;ClientHello&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;server&lt;/em&gt; sends the &lt;em&gt;ciphertext&lt;/em&gt; to the client in the &lt;code&gt;ServerHello&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;reversing-rsa&quot;&gt;Reversing RSA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#reversing-rsa&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s actually possible to deploy RSA this way as well by having
the client generate a public key and provide it to the server.
Because RSA encryption is very fast and decryption is much slower, this allows you to offload
work from the server to the client, which is an advantage in
Web scenarios because the clients have to establish far fewer
connections. EC crypto has gotten fast enough that we didn&#39;t
specify this mode for TLS 1.3, but &lt;a href=&quot;https://www.scs.stanford.edu/~dm/home/papers/bittau:tcpcrypt.pdf&quot;&gt;Bittau et al&lt;/a&gt;
used this trick in tcpcrypt.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This allows you to establish a shared secret in a single round trip.
As with DH, the server authenticates to the client by signing the
connection transcript, which includes the &lt;em&gt;ciphertext&lt;/em&gt; value, thus
binding the &lt;em&gt;ciphertext&lt;/em&gt; to the server&#39;s key.
Like DH key establishment, KEM-based TLS 1.3 key establishment also offers
forward secrecy if the client generates a fresh key pair
for each connection (because the server&#39;s contribution depends on the
client&#39;s key pair, the server automatically generates a fresh value).&lt;/p&gt;
&lt;h3 id=&quot;tls-1.2&quot;&gt;TLS 1.2 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#tls-1.2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This brings us to the topic of PQ for TLS 1.2.&lt;/p&gt;
&lt;p&gt;If we wanted to add PQ support for TLS 1.2, we would presumably
do more or less the same thing as with TLS 1.3, namely
pretend that the PQ KEM is an elliptic curve group. Just as with
TLS 1.2 DHE mode, this is in the reverse direction from
TLS 1.3, with the &lt;em&gt;server&lt;/em&gt; providing the first chunk of keying
material (its public key) and the client generating the ciphertext
and sending it to the server.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls12-kem.png&quot; alt=&quot;TLS 1.2 with a KEM&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS 1.2 with a KEM
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It&#39;s actually not clear that this would be safe as-is. The reason
is that TLS 1.3 binds the entire handshake transcript to the
resulting key by feeding the transcript into the key schedule
along with the initial cryptographic shared secret. By contrast,
TLS 1.2 only feeds in the random nonces in the &lt;code&gt;ClientHello&lt;/code&gt; and
&lt;code&gt;ServerHello&lt;/code&gt;. The result is that in some circumstances an attacker
can arrange that two connections (e.g., one from the client to
the attacker and one from the attacker to another server) have
the same cryptographic key. This property led to the &lt;a href=&quot;https://www.mitls.org/pages/attacks/3SHAKE&quot;&gt;Triple Handshake Attack&lt;/a&gt;
by Bhargavan, Delignat-Lavaud, Fournet, Pironti, and Strub, which was
one of the motivations for the more conservative design of TLS 1.3.&lt;/p&gt;
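&lt;p&gt;The difference in what gets bound to the key can be sketched in a few lines of Python. The two derivation functions below are deliberately simplified stand-ins (a single HMAC call each), not the real TLS 1.2 PRF or the TLS 1.3 key schedule, and the labels and sample handshake messages are hypothetical. They only illustrate which inputs each version mixes in.&lt;/p&gt;

```python
import hashlib
import hmac


def tls12_style_key(shared_secret, client_random, server_random):
    # Simplified: only the shared secret and the two nonces are mixed in.
    label = b"master secret" + client_random + server_random
    return hmac.new(shared_secret, label, hashlib.sha256).digest()


def tls13_style_key(shared_secret, transcript):
    # Simplified: a hash over every handshake message is mixed in.
    transcript_hash = hashlib.sha256(b"".join(transcript)).digest()
    return hmac.new(shared_secret, b"derived" + transcript_hash, hashlib.sha256).digest()


shared_secret = b"\x01" * 32
client_random = b"\x02" * 32
server_random = b"\x03" * 32

# Two handshakes with the same secret and nonces but different certificates.
transcript_a = [client_random, server_random, b"certificate: attacker.example"]
transcript_b = [client_random, server_random, b"certificate: server.example"]

key12_a = tls12_style_key(shared_secret, client_random, server_random)
key12_b = tls12_style_key(shared_secret, client_random, server_random)
key13_a = tls13_style_key(shared_secret, transcript_a)
key13_b = tls13_style_key(shared_secret, transcript_b)

print("TLS 1.2-style keys equal:", key12_a == key12_b)
print("TLS 1.3-style keys equal:", key13_a == key13_b)
```

&lt;p&gt;In the first case the certificate never enters the derivation, so an attacker who can line up the secret and the nonces gets matching keys on both connections; in the second, any difference in the transcript changes the key.&lt;/p&gt;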
&lt;p&gt;As Deirdre Connolly &lt;a href=&quot;https://durumcrustulum.com/2024/02/24/how-to-hold-kems/&quot;&gt;describes in detail&lt;/a&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
KEMs have different properties than ECDHE (in some ways closer to RSA)
and so we&#39;d need to analyze precisely how to integrate them with
TLS 1.2. I&#39;m not saying it can&#39;t be done, but it&#39;s not necessarily
just a simple matter of crossing out &amp;quot;X25519&amp;quot; in the specs and writing in &amp;quot;ML-KEM&amp;quot;.
Adapting TLS 1.3 to ML-KEM also requires some thinking but that
thinking is already happening and is somewhat easier because of
TLS 1.3&#39;s more conservative design.
Obviously the TLS WG could do that work, but the question is whether it&#39;s
worth doing, given that we are trying to transition everyone to TLS 1.3.&lt;/p&gt;
&lt;h2 id=&quot;why-you-might-want-to-do-pq-for-tls-1.2-anyway&quot;&gt;Why you might want to do PQ for TLS 1.2 anyway &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#why-you-might-want-to-do-pq-for-tls-1.2-anyway&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic argument for why you would want to do PQ for TLS 1.2 is that
some people might find it difficult to upgrade their deployments
to TLS 1.3 and much easier to upgrade their TLS 1.2 deployments to do
PQ. I&#39;m generally fairly skeptical of these arguments, but I want
to walk through them anyway.&lt;/p&gt;
&lt;h3 id=&quot;sporadically-maintained-deployments&quot;&gt;Sporadically Maintained Deployments &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#sporadically-maintained-deployments&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The broad argument is that there are a lot of environments that aren&#39;t
that actively maintained, so upgrading is difficult in general and
they are kind of stuck on TLS 1.2. For
instance, they might be using a TLS library which is updated only for
security issues either because the library vendor updates it
infrequently or because the library consumer is stuck on an old
version.&lt;/p&gt;
&lt;p&gt;Consider the (hypothetical) case of a TLS library which has current
version 2.0 but also has a version 1.1 which is on long-term
support. Version 2.0 supports TLS 1.3 but version 1.1LTS only
supports TLS 1.2. A deployment which is on 1.1LTS might hope
that the vendor would add PQ support to 1.1LTS even though
they weren&#39;t going to upgrade it to support TLS 1.3, and that
upgrading to 1.1.1LTS would be less disruptive than upgrading to
version 2.0.&lt;/p&gt;
&lt;p&gt;This doesn&#39;t apply to the Web which is generally quite
up to date—and which is in the process of transitioning to
TLS 1.3—but there are of course lots of environments
which are much slower to upgrade and arguably might have more
trouble upgrading (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Peter_Gutmann_(computer_scientist)&amp;amp;oldid=1217519587&quot;&gt;Peter Gutmann&lt;/a&gt; is one of the main advocates
of this view.) I do have some sympathy for this perspective, but
at the end of the day one of the costs of using software is
you have to upgrade it—if only to fix the inevitable vulnerabilities—and
I don&#39;t think it&#39;s unreasonable to expect people to upgrade in
order to get a major change like PQ support rather than
expecting the rest of the world to do a lot of work to make
it slightly easier for them.&lt;/p&gt;
&lt;p&gt;I want to emphasize that this is (almost) exclusively a
software issue; as I said above TLS 1.3 is intended as a
drop-in replacement for TLS 1.2, meaning that in most
cases you should just be able to update your TLS stack,
and get TLS 1.3 as soon as the other side updates.&lt;/p&gt;
&lt;h3 id=&quot;passive-decryption&quot;&gt;Passive Decryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#passive-decryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said, TLS 1.3 is intended to be a drop-in replacement for
TLS 1.2. There is, however, one notable and high profile exception,
what&#39;s called
&lt;a href=&quot;https://www.nccoe.nist.gov/addressing-visibility-challenges-tls-13&quot;&gt;TLS
visibility&lt;/a&gt;.
The problem statement goes something like this. Imagine you operate an
encrypted Web server of some kind and you want to monitor traffic
between users and your server. There are a number of reasons you might
want to do this, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Debugging problems with your server.&lt;/li&gt;
&lt;li&gt;Looking for malicious activity (attacks by clients connecting
to the server).&lt;/li&gt;
&lt;li&gt;Measuring the performance of the server on live traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s possible to do all of these things by instrumenting the server,
but not all servers have great instrumentation and what if the
server is the source of the problem? Another approach is to capture
the traffic as it goes over the network (e.g., via &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Port_mirroring&amp;amp;oldid=1153786352&quot;&gt;port mirroring&lt;/a&gt;) and then decrypt it using the RSA private key. This
can be done entirely passively (i.e., without interfering
with the connection) and you can decrypt either in real time
or by recording the traffic and then decrypting only the
connections of interest. This has the advantage that you don&#39;t
need to touch the server beyond getting a copy of the private
key and you get to dig as deep as you want into what&#39;s going on
without trusting the server.&lt;/p&gt;
&lt;p&gt;However, these techniques don&#39;t work if you are using ephemeral
Diffie-Hellman (whether of the ordinary or EC variety): knowing
the server&#39;s private key allows you to &lt;em&gt;impersonate&lt;/em&gt; the server
but not to decrypt the traffic. Decrypting the traffic requires
the DH private key share, which is usually generated internally by
the server rather than stored on the disk the way that the long
term private key is. Moreover, if the server uses a fresh DH
share for every handshake—which is required for forward
secrecy—then allowing decryption would require somehow
sending the decryption device a copy of every key, which is
obviously a lot more difficult than just a copy of a single
key.&lt;/p&gt;
&lt;p&gt;Although DH establishment became more common, even with TLS 1.2 (see
&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#key-exchange-modes-over-time&quot;&gt;above&lt;/a&gt;), that didn&#39;t interfere with the
use of passive decryption because servers weren&#39;t required to enable
it. The server picks the TLS key establishment mode, and as long as there was a significant
population of servers which only did static RSA, clients had to support
static RSA, which meant that servers could insist on it, so
this kind of passive decryption worked fine with TLS 1.2.
Of course, those servers wouldn&#39;t be following best security practice
in terms of protecting user traffic, but it was still technically
possible.&lt;/p&gt;
&lt;p&gt;By contrast, because TLS 1.3 doesn&#39;t support static RSA at all, it&#39;s
incompatible with naive passive inspection. Of course servers could
just refuse to negotiate TLS 1.3, but staying on TLS 1.2 forever
isn&#39;t really an answer, especially now that the IETF has decided
not to add new features to TLS 1.2.
When TLS 1.3 was being finalized, a number of
organizations—especially high sensitivity sites like banks or
health insurance companies—raised concerns about losing
this tool, but at the end of the day the TLS working group felt
that forward secrecy was an important security feature and
that re-adding static RSA would have been way too disruptive to
the resulting protocol.&lt;/p&gt;
&lt;p&gt;It &lt;em&gt;is&lt;/em&gt; possible to adapt TLS 1.3 to enable passive decryption
even with Diffie-Hellman. There are at least three obvious approaches here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Have the server &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/tls/5LDNgrPI8JTzK6a7r3W5Q-VXlk4/&quot;&gt;re-use the same Diffie-Hellman
key&lt;/a&gt;
share for multiple connections. The server can then save a copy of
the key somewhere (e.g., on disk) and the administrator can send a
copy to the monitoring device.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have the server send copies of the per-connection keys
(hopefully in some secure fashion) to the monitoring
device, which can use them to decrypt the connections.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have the server deterministically generate the per-connection DH key shares &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/tls/lNodPQGh04Hwg7srLmBY6np-puk/&quot;&gt;based on
a static secret and information in the
connection&lt;/a&gt;.
You then provision the monitoring device with the static
secret and it can compute the DH key shares for itself.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
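&lt;p&gt;As a rough illustration of approach (3), the sketch below derives the server&#39;s per-connection DH private share from a static secret and the client random, so that a monitoring device holding the same static secret can recompute it from what it sees on the wire. Everything specific here is an assumption for illustration: the toy group, the HMAC-based derivation, and the names &lt;code&gt;derive_private_share&lt;/code&gt; and &lt;code&gt;shared_key&lt;/code&gt; are hypothetical and are not taken from the linked proposal.&lt;/p&gt;

```python
import hashlib
import hmac

# Toy group for illustration only; a real deployment would use X25519 or similar.
P = 2**127 - 1
G = 3


def derive_private_share(static_secret, client_random):
    # Hypothetical derivation: HMAC the per-connection client random under the static secret.
    digest = hmac.new(static_secret, client_random, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 1


def shared_key(private_share, peer_public):
    return hashlib.sha256(pow(peer_public, private_share, P).to_bytes(16, "big")).digest()


static_secret = b"provisioned on server and monitor"
client_random = b"\x11" * 32               # visible on the wire
client_private = 123456789                 # known only to the client
client_public = pow(G, client_private, P)  # visible on the wire

# Server: derive its share for this connection and publish the public value.
server_private = derive_private_share(static_secret, client_random)
server_public = pow(G, server_private, P)
server_key = shared_key(server_private, client_public)

# Monitoring device: sees only wire data but recomputes the server's private share.
monitor_private = derive_private_share(static_secret, client_random)
monitor_key = shared_key(monitor_private, client_public)

assert monitor_key == server_key
assert shared_key(client_private, server_public) == server_key
```

&lt;p&gt;The flip side is visible in the code as well: anyone who obtains &lt;code&gt;static_secret&lt;/code&gt; can do exactly what the monitoring device does, for every connection.&lt;/p&gt;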
&lt;p&gt;None of these are particularly difficult to implement but they also
require modifying the TLS stack in a way that isn&#39;t required to
provide service to the client but only to provide the ability to
passively decrypt. Moreover, options (2) and (3) also require
specifying exactly how the keys will be transmitted (2) or computed
(3), both of which have the potential to create severe vulnerabilities
(up to perhaps complete compromise of every connection) if they
are done incorrectly.&lt;/p&gt;
&lt;p&gt;It&#39;s really important to understand at this point that passive inspection
works in the first place basically just because of an idiosyncrasy of
the way that static RSA mode works. Specifically, you need to configure
the server with the private key and the private key is also what you
need to decrypt the traffic passively. This means that the administrator
usually already has the credential in hand and can easily transfer it
to the monitoring device without any special affordance by
the server or TLS stack implementor. What we&#39;ve seen over the past
8 or so years is that the implementors are much less enthusiastic
about building special features to enable passive decryption.
So, for instance, BoringSSL and OpenSSL don&#39;t seem to implement any
of them. However, NIST has been running an
&lt;a href=&quot;https://www.nccoe.nist.gov/addressing-visibility-challenges-tls-13&quot;&gt;initiative&lt;/a&gt;
around this, specifying techniques (1) and (2), and it seems
like some big vendors (e.g., F5) are participating. I don&#39;t
know if they are actually planning to ship anything.&lt;/p&gt;
&lt;h4 id=&quot;pq-and-passive-decryption&quot;&gt;PQ and Passive Decryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#pq-and-passive-decryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This brings us to the topic of passive decryption for PQ.
The obvious way to use PQ—just swapping it for DH—is not
really compatible with this kind of passive decryption.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This is most obvious with TLS 1.3: because the server generates its
ciphertext based on the client&#39;s public key, there&#39;s simply no server
private key to provide to the decryption device, because a fresh key
is generated for each connection. Unlike with DH, it&#39;s not even
possible to generate a single static key pair and reuse it (technique
(1) above) because the &lt;code&gt;Encap()&lt;/code&gt; operation depends on the client&#39;s
public key.&lt;/p&gt;
&lt;p&gt;The situation is slightly more complicated with TLS 1.2 because
the server rather than the client generates the private key.
In principle, the server could just generate a single ML-KEM
key and use it indefinitely (similar to approach (1) above),
but, as with approach (1), there is no real reason for the server
to do this other than to enable visibility. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Re-using the same ML-KEM key breaks forward secrecy, so
it&#39;s less secure.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;It&#39;s actually more programming work to remember the ML-KEM
key between transactions rather than just generate a new one,
especially in a multi-threaded system.&lt;/li&gt;
&lt;li&gt;You need to build some mechanism to allow either export
or import of the ML-KEM key, which you wouldn&#39;t otherwise
need.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Moreover, because the most common way to deploy PQ algorithms is
as a hybrid, you&#39;d need to &lt;em&gt;also&lt;/em&gt; do something for DH, which
TLS 1.2 deployments that are set to allow passive decryption
don&#39;t usually do now, because they just do static
RSA instead. The bottom line, then, is that it&#39;s not really
significantly easier to support passive decryption (&amp;quot;visibility&amp;quot;)
for TLS 1.2 with PQ than it is for TLS 1.3, so that&#39;s not
really a very good argument for porting PQ into TLS 1.2.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-tls12/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Designing and maintaining cryptographic protocols is a lot of
work, and TLS 1.3 and TLS 1.2 are different enough that it&#39;s
obviously desirable only to maintain one of them even if
TLS 1.3 were no better than TLS 1.2.
As should be clear at this point, it&#39;s technically possible to
add support for PQ to TLS 1.2, but it&#39;s not trivial.
That in and of itself doesn&#39;t mean it&#39;s not worth doing, but
it has to pass the cost/benefit test. For instance, there
might be some important application where it was hard to
swap TLS 1.3 in for TLS 1.2. However, as far as I can tell
that&#39;s not true. While
there are deployments stuck on TLS 1.2, they
should be able to move to TLS 1.3 without significant
impact on their existing functionality, although it might
involve some inconvenience in terms of software.
It would be better if they
were to do so rather than the IETF community needing to
maintain TLS 1.2 indefinitely.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is not considered good practice in modern systems,
but it&#39;s very convenient in this particular situation. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Of course, active attackers can replay the &lt;code&gt;ClientHello&lt;/code&gt;
with their own key and get the server to encrypt the
certificate to them, but this is more work than
passively snooping. TLS &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni&quot;&gt;Encrypted ClientHello&lt;/a&gt; addresses this issue. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is also a version in which the attacker
can extract a signature on a specific value. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://eprint.iacr.org/2023/1933.pdf&quot;&gt;Cremers, Dax, and Medinger&lt;/a&gt;
for more on how to think about the security properties of KEMs. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, there is actually a proposal called
&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-celi-wiggers-tls-authkem/&quot;&gt;AuthKEM&lt;/a&gt;
that adapts TLS 1.3 to use a handshake more like static
RSA but with KEMs in the place of RSA. However, that
doesn&#39;t change the situation for TLS 1.2, and everyone
assumes that AuthKEM would be run in a forward secret
mode where the client also provided a KEM public key. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In addition, reusing the same key makes remote
side channel attacks on the key (like those we
see with RSA) easier. If you use a different key
for each transaction, then the attacker only has
one chance to learn about it so the side channel
has to leak a lot more information.
 &lt;a href=&quot;https://educatedguesswork.org/posts/pq-tls12/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How to manage a quantum computing emergency</title>
		<link href="https://educatedguesswork.org/posts/pq-emergency/"/>
		<updated>2024-04-15T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pq-emergency/</id>
		<content type="html">&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/pq-wall-bandaid.jpg&quot; alt=&quot;Crack in the wall illustration&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Illustration by Kate Hudson with MidJourney and Photoshop AI.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Recently, I &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout&quot;&gt;wrote&lt;/a&gt; about how the Internet
community is working towards post-quantum algorithms in case someone
develops a &lt;em&gt;cryptographically relevant quantum computer
(CRQC)&lt;/em&gt;. That&#39;s still what everyone is hoping for, but nobody really
knows when or even if a CRQC will be developed, and even in the best case
the transition is going to take a really long time, so what happens if
someone builds a CRQC well in advance of when that transition is
complete?  Clearly, this takes the situation from one that is somewhere between
&lt;a href=&quot;https://epmonthly.com/article/on-your-mark-get-set-triage/&quot;&gt;non-urgent and urgent to one that is outright emergent&lt;/a&gt;
but that doesn&#39;t mean that all is lost. In this post, I want to
look at what we would do if a CRQC were to appear sooner rather than
later. As with the previous post,
this post primarily focuses on TLS and the Web, though I do
touch on some other protocols.&lt;/p&gt;
&lt;p&gt;Obviously there are a lot of scenarios to consider and &amp;quot;cryptographically
relevant&amp;quot; is doing a lot of work here. For instance, we typically
assume that breaking X25519 takes approximately 2&lt;sup&gt;128&lt;/sup&gt;
operations. A technique which brought that down to 2&lt;sup&gt;80&lt;/sup&gt;
would be a pretty big improvement as an attack and would definitely be
&amp;quot;cryptographically relevant&amp;quot; but would also still leave attacks quite
expensive; it probably wouldn&#39;t be worth using this kind of CRQC
to attack connections carrying people&#39;s credit cards, especially if
each connection had to be attacked individually, at a cost of
2&lt;sup&gt;80&lt;/sup&gt; operations each time. This would obviously be a strong
incentive to accelerate the PQ transition, but probably
wouldn&#39;t be an outright emergency unless you had particularly
high value communications.&lt;/p&gt;
&lt;p&gt;For the purpose of this post, let&#39;s assume that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;This is a particularly severe attack, bringing the existing
algorithms within range of commercial attackers in a plausible
time frame, whether that&#39;s days or real time.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It happens at some point in the next few years, while
there is significant deployment but by no means
universal deployment of PQ key establishment and minimal
if any deployment of PQ signatures and certificates.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is close to a worst-case scenario in that our existing
cryptography is severely weakened but it&#39;s not practical to just
disable it and switch to PQ algorithms. In other words &lt;strike&gt;isn&#39;t&lt;/strike&gt; it&#39;s
an emergency and &lt;strike&gt;we&lt;/strike&gt; leaves us with a fairly limited set of options.
&lt;em&gt;[Corrected, 2024-04-15]&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;key-establishment&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#key-establishment&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first order of business is to do something about key
establishment. Obviously if you haven&#39;t already implemented
a PQ-hybrid or pure PQ algorithm, you&#39;ll want to do that ASAP,
selecting whichever one is more widely deployed (or potentially
doing both if some peers do one and some the other).&lt;/p&gt;
&lt;p&gt;Once you&#39;ve added support for some PQ algorithm, the question is
whether you should disable the classical algorithm. The naive answer
is &amp;quot;no&amp;quot;: even if the classical algorithm is severely weakened, any
encryption is better than no encryption. In reality, the situation
is a bit more complicated.&lt;/p&gt;
&lt;p&gt;Recall that in TLS, the client proposes a set of algorithms
and the server selects one, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-hs-sketch.png&quot; alt=&quot;TLS handshake sketch&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS handshake sketch
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The idea here is that the server gets to see what algorithms
the client supports and pick the best algorithm. As long as
the client and server agree on the algorithm ranking, then
this will generally work fine. However, it&#39;s possible that
the servers and clients will disagree, in which case the
server&#39;s preferences will win.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This actually happened during the transition away from the
RC4 symmetric cipher. After a series of papers showed significant
weaknesses in RC4, the browsers decided they preferred
AES-GCM. Unfortunately, many servers preferred RC4,
and so the result was that even when both clients and servers
supported RC4 and AES-GCM, many servers selected RC4. In
response, browsers (starting with &lt;a href=&quot;https://web.archive.org/web/20170509061141/https://blogs.msdn.microsoft.com/ie/2013/11/12/ie11-automatically-makes-over-40-of-the-web-more-secure-while-making-sure-sites-continue-to-work/&quot;&gt;IE&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;)
adopted a system in which they first
tried to connect &lt;em&gt;without&lt;/em&gt; offering RC4, and if that failed
they then retried with it, as shown below:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-fallback.png&quot; alt=&quot;TLS fallback to RC4&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
TLS fallback to RC4
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The result was that any server
which supported AES-GCM would negotiate it, but if the server
&lt;em&gt;only&lt;/em&gt; supported RC4, the client could still connect. This
also made it possible to measure the fraction of servers
which supported AES-GCM, thus providing information
about how practical it was to disable RC4.&lt;/p&gt;
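&lt;p&gt;The negotiation and fallback logic can be sketched as follows. This is a simplified model rather than any browser&#39;s actual code: the function names &lt;code&gt;server_select&lt;/code&gt; and &lt;code&gt;connect_with_fallback&lt;/code&gt; and the three example servers are hypothetical, and a real handshake failure is modeled simply as &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;

```python
def server_select(server_preferences, client_offer):
    """Return the first algorithm in the server's preference order that the client offered."""
    for algorithm in server_preferences:
        if algorithm in client_offer:
            return algorithm
    return None  # models a handshake failure


def connect_with_fallback(server_preferences):
    # First attempt: do not offer RC4 at all.
    selected = server_select(server_preferences, ["AES-GCM"])
    if selected is not None:
        return selected, False
    # Fallback attempt: offer RC4 as well.
    selected = server_select(server_preferences, ["AES-GCM", "RC4"])
    return selected, True


servers = {
    "prefers RC4, supports both": ["RC4", "AES-GCM"],
    "prefers AES-GCM": ["AES-GCM", "RC4"],
    "RC4 only": ["RC4"],
}
for name, prefs in servers.items():
    algorithm, used_fallback = connect_with_fallback(prefs)
    print(f"{name}: {algorithm} (fallback={used_fallback})")
```

&lt;p&gt;Only the RC4-only server ends up on the fallback path, so counting how often the second attempt is needed gives the measurement described above.&lt;/p&gt;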
&lt;h3 id=&quot;downgrade-attacks&quot;&gt;Downgrade Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#downgrade-attacks&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So far we&#39;ve only considered a &lt;em&gt;passive&lt;/em&gt; attacker, but what
about an active attacker? TLS 1.3 is designed so that the
signature from the server protects the handshake, so as
long as the weakest signature algorithm supported by the
client is strong, an active
attacker can&#39;t tamper with the results of the negotiation.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
The fallback system described above weakens this guarantee
a little bit in that the attacker can forge an error and
force the client into the fallback handshake. However, the
client will still offer both algorithms in the fallback
handshake, so the attacker can&#39;t stop the server from picking
&lt;em&gt;its&lt;/em&gt; preferred algorithm; it can just stop the client from
getting the &lt;em&gt;client&#39;s&lt;/em&gt; preferred algorithm by manipulating
the first handshake.&lt;/p&gt;
&lt;p&gt;Of course, if the server&#39;s signature isn&#39;t strong—or
more properly the weakest signature algorithm the client will
accept isn&#39;t strong—then
the attacker can tamper with the negotiated
key establishment algorithm. However, an attacker who
can do that can just impersonate the server directly,
so it doesn&#39;t matter what key establishment algorithms
the client supports.&lt;/p&gt;
&lt;h3 id=&quot;maybe-it&#39;s-better-to-fail-open&quot;&gt;Maybe it&#39;s better to fail open &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#maybe-it&#39;s-better-to-fail-open&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The bottom line here is that as long as you&#39;re not under
active attack, TLS will deliver the strongest&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; algorithm
that&#39;s jointly supported by the peers, and, if you&#39;re under active attack
by an attacker who can break signature algorithms,
then all bets are off. That&#39;s probably the best you can do if you&#39;re determined
to connect to the server anyway. But the alternative is,
&lt;em&gt;don&#39;t connect&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The basic question here is how sensitive the communication
with the site is. If you&#39;re just looking up some recipes
or reading the news, then it&#39;s probably not that big
a deal if your connection isn&#39;t secure (in fact, people
used to regularly argue that it wasn&#39;t necessary at all, though
that&#39;s obviously not a position I agree with).
On the other hand, if you&#39;re doing your banking or reading
your e-mail, you probably really don&#39;t want to do that
unencrypted. This isn&#39;t to say that we don&#39;t want ubiquitous
encryption—we do—or that it&#39;s not possible for
even innocuous seeming communications to be sensitive—it is—but
to recognize that this scenario would force us to make some hard
choices about whether we&#39;re willing to communicate insecurely
if that&#39;s the only option. These are hard choices for a human
and even harder for a piece of software like a browser
(it&#39;s much easier for a standalone mail client, obviously).&lt;/p&gt;
&lt;p&gt;This is actually a situation where ubiquitous encryption
makes things rather more difficult. Back when encryption
was rare, it was a reasonable bet that if a site was encrypted
then the operators thought it was particularly sensitive. But now that
everything is encrypted, it&#39;s much harder to distinguish
whether it&#39;s really important for this particular connection
to be protected versus just that it&#39;s good general
practice (which, again, it is!).&lt;/p&gt;
&lt;p&gt;One thing that may not be immediately obvious is that
an insecure connection can threaten not just the data that
you are sending over it, but other data as well. For example,
if you are reading your email, you&#39;re probably authenticating
with either a password (with a normal mail client) or a cookie
(with Webmail). Both of these are just replayable credentials,
so an attacker who can decrypt your connection can impersonate
you to the server and download all your email, not just the
messages you are reading now. As discussed above, an attacker who recorded your traffic
in the past might still be able to recover your password, but
this is a lot more work than just getting it off the wire in
real time.&lt;/p&gt;
&lt;h2 id=&quot;signature-algorithms&quot;&gt;Signature Algorithms &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#signature-algorithms&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, none of this does anything to authenticate the server,
which is critical for protecting against active attack. For that we
need the server to have a certificate with a PQ algorithm and the
client to refuse to trust certificates that either (1) are signed with
a classical algorithm or (2) contain keys for a classical
algorithm. Importantly, it&#39;s not enough for the server to stop using a
&lt;strike&gt;PQ&lt;/strike&gt; classical &lt;em&gt;[Fixed 2024-04-15]&lt;/em&gt; certificate, because the server doesn&#39;t have to be part of the
connection at all.  In fact, even if the server doesn&#39;t &lt;em&gt;have&lt;/em&gt; a PQ
certificate, attack is still possible because the attacker can just
forge the entire certificate chain.&lt;/p&gt;
&lt;p&gt;As described in my previous &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout&quot;&gt;post&lt;/a&gt;, the first thing
that has to happen is that servers have to deploy PQ certificates.
Without that, there&#39;s not much the clients can do to defend themselves.
In this case, I would expect there to be a huge amount of pressure to do
that ASAP, despite the serious size overhead issues with PQ certificates
noted by &lt;a href=&quot;https://blog.cloudflare.com/pq-2024&quot;&gt;Bas Westerbaan&lt;/a&gt; and
&lt;a href=&quot;https://dadrian.io/blog/posts/pqc-signatures-2024/&quot;&gt;David Adrian&lt;/a&gt;.
After all, it&#39;s better to have a slow web site than one that&#39;s
not secure or that people can&#39;t connect to.&lt;/p&gt;
&lt;p&gt;For the same reason, I would expect there to be a lot less concern
about the
&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout#signatures&quot;&gt;availability of &lt;em&gt;hardware security modules (HSMs)&lt;/em&gt; for the new PQ algorithms or whether
the algorithms in question have gone through the entire IETF standards
process&lt;/a&gt; &lt;em&gt;[Added link 2024-04-15]&lt;/em&gt;. Those are both good things, but having PQ safe certificates
is more important, so I would expect the industry to converge
pretty fast on a way forward.&lt;/p&gt;
&lt;p&gt;Once there is some level of PQ deployment, clients can start
distrusting the classical algorithms (before that, there&#39;s not much
point). However, as with key establishment: if the client distrusts
classical algorithms then it won&#39;t be able to connect to any server
that doesn&#39;t have a PQ certificate, which will initially be most
of them, even in the best case. This is frustrating because it
means that you have to choose between failure to connect or having
protection against active attack. What you&#39;d really like is to
have the best protection you can get, i.e.,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only trust PQ algorithms for sites that have PQ certificates
(so you aren&#39;t subject to active attack).&lt;/li&gt;
&lt;li&gt;Allow classical algorithms for sites without PQ certificates
(so you at least get protection against passive attack).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Actually, there are three categories here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sites which are so sensitive that you shouldn&#39;t connect to them
without a PQ certificate (e.g., your bank).&lt;/li&gt;
&lt;li&gt;Sites which are known to have a PQ certificate and so you shouldn&#39;t
accept a classical certificate (probably big sites like Google).&lt;/li&gt;
&lt;li&gt;Sites that aren&#39;t that sensitive and so you&#39;d be willing to
connect to them with a classical certificate (e.g., the newspaper).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The problem is being able to distinguish which category a site
falls into. Usually, we don&#39;t try to draw this kind of distinction,
and just let the site tell us if it wants TLS, but this isn&#39;t
a usual situation, so it&#39;s worth exploring some inconvenient things.&lt;/p&gt;
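&lt;p&gt;To make the three categories concrete, here is a minimal sketch of the resulting client policy. The category labels and the &lt;code&gt;decide&lt;/code&gt; function are hypothetical; the hard part, which this sketch assumes away, is assigning a site to a category in the first place.&lt;/p&gt;

```python
# Hypothetical client policy for the three categories above. The category
# labels and decide() are illustrative, not an existing browser API.

SENSITIVE = "sensitive"          # e.g., your bank
KNOWN_PQ = "known-pq"            # e.g., a big site known to have a PQ cert
NOT_SENSITIVE = "not-sensitive"  # e.g., the newspaper


def decide(category, cert_is_pq):
    """Return True if the client should accept the connection."""
    if cert_is_pq:
        return True  # a PQ certificate is acceptable for every category
    # Classical certificate: only tolerable for sites that are neither
    # sensitive nor already known to have a PQ certificate.
    return category == NOT_SENSITIVE


for category in (SENSITIVE, KNOWN_PQ, NOT_SENSITIVE):
    print(category, decide(category, cert_is_pq=False))
```

&lt;p&gt;Note that the first two categories produce the same behavior (refuse a classical certificate) for different reasons: one because the site is too sensitive to risk, the other because a classical certificate from that site is itself evidence of an attack.&lt;/p&gt;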
&lt;h3 id=&quot;pq-lock&quot;&gt;PQ Lock &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#pq-lock&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The most obvious thing is to have the client remember when the server
has a PQ certificate and thereafter refuse to accept a classical
certificate. Unfortunately, this idea doesn&#39;t work well as-is,
because server configurations aren&#39;t that stable. For instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A site might roll out PQ and then have problems and disable it.&lt;/li&gt;
&lt;li&gt;A site might have multiple servers and gradually roll out
PQ certificates on one of them.&lt;/li&gt;
&lt;li&gt;A site might be served by more than one CDN with different
configurations.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that in cases (2) and (3) the client will not generally
be aware that there are different servers, as they have the
same domain name, and IP addresses aren&#39;t reliable for this
purpose (and, in any case, are likely under control of the attacker
because DNS &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;isn&#39;t very secure&lt;/a&gt;).
And in case (1) it&#39;s actually the same server.&lt;/p&gt;
&lt;p&gt;In any of these situations the client could
contact the server, get a PQ certificate, and then come back
and get a classical certificate, so if the client just forbids
any use of classical after PQ, this would create a lot of failures.
Fortunately, we&#39;ve been in this situation before with the transition
to HTTPS from HTTP, so we know the solution: the server tells
the client &amp;quot;from now on, insist on the new thing,&amp;quot; and the
client remembers that.&lt;/p&gt;
&lt;p&gt;With HTTP/HTTPS, this is a header called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=HTTP_Strict_Transport_Security&amp;amp;oldid=1216355818&quot;&gt;HTTP Strict Transport
Security (HSTS)&lt;/a&gt;
and has the semantics &amp;quot;just do HTTPS from now on with this domain&amp;quot;. It
would be straightforward to introduce a new feature that had the
semantics &amp;quot;just insist on PQ from now on with this domain&amp;quot;. In fact,
the HSTS
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6797&quot;&gt;specification&lt;/a&gt; is
extensible, so if you wanted to also insist on HTTPS (a good idea!),
you could probably just add a new directive saying &amp;quot;also require
PQ&amp;quot;. It would also be easy to add a new HTTP header that said &amp;quot;if you
do HTTPS, require PQ&amp;quot;, as HTTP is nicely extensible and unknown
headers are just ignored.&lt;/p&gt;
&lt;p&gt;One of the obvious problems with an HSTS-like header—and in
fact with HSTS itself—is that it relies on the client at
some point connecting to the server while not under attack. If
the attacker is impersonating the server then they just don&#39;t
send the new header. They can even connect to the real server
and send valid data otherwise, but just strip the header.
This is still a real improvement, though, as the attacker
needs to be much more powerful: if the client is &lt;em&gt;ever&lt;/em&gt; able
to form a secure connection to the true server, then it will
remember that PQ is needed and be protected against attack
from then on, even if it&#39;s not protected from the beginning.&lt;/p&gt;
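&lt;p&gt;A minimal sketch of the client side of such a mechanism might look like the following. This assumes an HSTS-style lifetime (like the &lt;code&gt;max-age&lt;/code&gt; directive in HSTS); the &lt;code&gt;PQLockStore&lt;/code&gt; class and its method names are hypothetical, and no such header or directive is currently standardized.&lt;/p&gt;

```python
import time

# Sketch of a client-side "PQ lock" store, modeled loosely on HSTS.
# PQLockStore and its max-age semantics are assumptions for illustration;
# no such header or directive is standardized.


class PQLockStore:
    def __init__(self, clock=time.time):
        self._expiry = {}  # domain -> time at which the lock lapses
        self._clock = clock

    def note_directive(self, domain, max_age_seconds, cert_is_pq):
        """Record a lock, but only if it arrived over a PQ-authenticated
        connection; otherwise an attacker could plant or refresh it."""
        if cert_is_pq:
            self._expiry[domain] = self._clock() + max_age_seconds

    def requires_pq(self, domain):
        expiry = self._expiry.get(domain)
        return expiry is not None and expiry > self._clock()

    def accept(self, domain, cert_is_pq):
        """Reject classical certificates for domains with a live lock."""
        return cert_is_pq or not self.requires_pq(domain)


store = PQLockStore()
print(store.accept("example.com", cert_is_pq=False))  # no lock yet
store.note_directive("example.com", 86400, cert_is_pq=True)
print(store.accept("example.com", cert_is_pq=False))  # now locked
```

&lt;p&gt;The first call shows the weakness discussed above: until the client has seen the directive once over a good connection, it has no basis for refusing a classical certificate.&lt;/p&gt;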
&lt;h3 id=&quot;preloading&quot;&gt;Preloading &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#preloading&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s possible to protect the user from active attack from the very
beginning by having the client software know in advance which servers
support PQ. This is already something that browsers do with HSTS,
where it&#39;s called &amp;quot;HSTS preloading&amp;quot;.  Chrome operates a
&lt;a href=&quot;https://hstspreload.org/&quot;&gt;site&lt;/a&gt; where server operators can request
that their sites be added to the &amp;quot;HSTS preload list&amp;quot;. The site does
some checking to make sure that the server is properly configured and
then Chrome adds it to their list. In principle, other browsers could
do this themselves, but in practice, I think they all start from
Chrome&#39;s list.&lt;/p&gt;
&lt;p&gt;In principle, we could use a system like this for PQ preloading as
well, but there are scaling issues.  The HSTS preload list is fairly
sizable (~160K entries as of this writing), but this only represents a
small fraction of the domains on the Internet. For example, Let&#39;s
Encrypt is currently issuing certificates for more than 100 million
&lt;a href=&quot;https://letsencrypt.org/stats/&quot;&gt;registered domains&lt;/a&gt; and over 400
million fully qualified domains.  If we assume that sites which have
moved to PQ are aggressive about preloading—which they should be
for security reasons—we could be talking about 10s of millions
of entries. The current Firefox download is about 134 MB, so we&#39;re
probably looking at a nontrivial expansion in the size of a browser
download to carry the entire preload list, even with compact data
structures. On the other hand, it&#39;s probably not totally prohibitive,
especially in the early years when there is likely to not be that
much preloading.&lt;/p&gt;
&lt;p&gt;There may also be ways to avoid downloading the entire database.
For instance, you could use a system like &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Safe Browsing&lt;/a&gt;
which combines an imperfect summary data structure with a query
mechanism, so that you can get offline answers for most sites,
but then will need to check with the server to be sure. The
Safe Browsing database has about 4 million entries—or
at least did back in 2022—so you probably could repurpose
SB-style techniques for something like this, at least until
PQ certificates got a lot more popular.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy&quot;&gt;privacy properties of SB-style systems aren&#39;t&lt;/a&gt;
as good as just preloading the entire list, so there&#39;s
a tradeoff here, and it would be a matter of figuring out the
best of a set of not-great options.&lt;/p&gt;
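&lt;p&gt;For a sense of scale, here is a back-of-envelope estimate of how big an imperfect summary of the preload list might be. It uses a Bloom filter purely as a stand-in for the summary structure (Safe Browsing itself is built around hash prefixes), and the entry counts and the 0.1% false-positive rate are assumptions picked for illustration.&lt;/p&gt;

```python
import math

# Back-of-envelope size of a Bloom filter summarizing a PQ preload list.
# The entry counts and false-positive rate are assumed for illustration.


def bloom_filter_megabytes(entries, false_positive_rate):
    """Optimal Bloom filter size: -n * ln(p) / (ln 2)^2 bits."""
    bits = -entries * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8 / 1_000_000


for entries in (160_000, 10_000_000, 50_000_000):
    size = bloom_filter_megabytes(entries, 0.001)
    print(f"{entries:,} entries: about {size:.1f} MB")
```

&lt;p&gt;At that false-positive rate the filter costs roughly 14 bits per entry, so 10 million entries come to around 18 MB: noticeable next to a 134 MB browser download, but not prohibitive. A false positive here means the client has to ask the server to confirm, which is where the privacy cost comes in.&lt;/p&gt;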
&lt;p&gt;Of course, browser vendors don&#39;t need to wait for servers
to ask to be preloaded; they could just add them proactively,
for instance by scanning to see which sites advertise
the PQ-only header, or even which sites just
support PQ algorithms. Obviously there&#39;s some risk of prematurely
recording a site as PQ-only, but there&#39;s also a risk in allowing non-PQ
connections in this situation. The higher the proportion of servers
that support these algorithms, the more aggressive browser
vendors can be about requiring PQ support, and the more
readily they can add servers to the list, even if the
server hasn&#39;t really directly signaled that it wants
to be included.&lt;/p&gt;
&lt;h3 id=&quot;site-categorization&quot;&gt;Site Categorization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#site-categorization&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are other indicators that can be used to determine
whether a site is especially sensitive and so needs to
be reached over a PQ-secure connection or not at all.
This could happen either browser side or server side
based on a variety of indicia such as
requiring a password or being a medical or financial site.
One could even imagine building some kind of statistical or
machine learning model to determine whether sites were
sensitive. This doesn&#39;t have to be perfect as long as
it&#39;s significantly better than static configuration.&lt;/p&gt;
&lt;h3 id=&quot;reducing-overhead&quot;&gt;Reducing overhead &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#reducing-overhead&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Obviously, we would be in a better position if it weren&#39;t
so expensive to use PQ signature algorithms. Mostly, this
is about the size of the signatures. As noted in Bas&#39;s
post, there are a number of possible options for
reducing the size overhead. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-kampanakis-tls-scas-latest/&quot;&gt;Removing known intermediate and root certificates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-jackson-tls-cert-abridge/&quot;&gt;Smart compression of certificates based on a database of known certificates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-davidben-tls-merkle-tree-certs/&quot;&gt;Completely reworking the entire structure of certificates&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these mechanisms are designed to be backward compatible,
meaning that the client and the server can detect if they both support
the optimization and use it, but can fall back to the more
traditional mechanisms if not. The first two mechanisms work
with existing WebPKI certificates, and would work with PQ certificates
as well, requiring only that the client and server software be
updated to support the optimization.&lt;/p&gt;
&lt;p&gt;The last mechanism (&amp;quot;Merkle tree certificates&amp;quot;) replaces existing
WebPKI certificates, and so would require servers to get &lt;em&gt;both&lt;/em&gt; a PQ
WebPKI certificate and a PQ Merkle tree certificate, and conditionally
serve the right one depending on the client&#39;s capabilities.
This is obviously more work for the server operator (the same for
the browser user). On the other hand, if server operators are already
going to have to change their processes to get both PQ and classical
certificates, it would be a convenient time to also change to get
a Merkle tree certificate.&lt;/p&gt;
&lt;h3 id=&quot;http-public-key-pinning&quot;&gt;HTTP Public Key Pinning &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#http-public-key-pinning&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Obviously, in addition to recording that the server supported PQ
algorithms you could remember the server&#39;s PQ signature key and insist
that the server present that in the future (this is how SSH works). In
the past the TLS community explored more flexible versions of this
approach with a technique called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=HTTP_Public_Key_Pinning&amp;amp;oldid=1213982013&quot;&gt;HTTP Public Key
Pinning&lt;/a&gt;.
HPKP was eventually retired, in part due to concerns about how easy it
was to render your site totally unusable by pinning the wrong key and
in part because mechanisms like &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2&quot;&gt;Certificate
Transparency&lt;/a&gt; seemed to make it less
important.&lt;/p&gt;
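&lt;p&gt;The SSH-style version of this idea is simple enough to sketch in a few lines. This is a minimal trust-on-first-use store, not HPKP itself (which had backup pins, lifetimes, and reporting); &lt;code&gt;PinStore&lt;/code&gt; and its &lt;code&gt;check&lt;/code&gt; method are hypothetical names.&lt;/p&gt;

```python
import hashlib

# Minimal trust-on-first-use pin store, in the spirit of SSH known_hosts.
# PinStore and its method names are hypothetical.


class PinStore:
    def __init__(self):
        self._pins = {}  # domain -> SHA-256 fingerprint of the pinned key

    def check(self, domain, public_key_bytes):
        """Pin on first contact; afterwards accept only the pinned key."""
        fingerprint = hashlib.sha256(public_key_bytes).hexdigest()
        pinned = self._pins.setdefault(domain, fingerprint)
        return pinned == fingerprint


pins = PinStore()
print(pins.check("example.com", b"original PQ key"))  # first use: pinned
print(pins.check("example.com", b"some other key"))   # mismatch: rejected
```

&lt;p&gt;The second call is both the security benefit and the operational hazard: the store cannot tell an attacker&#39;s key from a legitimate replacement, so a site that loses its pinned key is locked out.&lt;/p&gt;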
&lt;p&gt;One might imagine resurrecting some variant of HPKP for a PQ transition as a stopgap
during a period where &lt;em&gt;sites&lt;/em&gt; are prepared to deploy PQ but &lt;em&gt;CAs&lt;/em&gt;
can&#39;t issue PQ certificates yet. This wouldn&#39;t be quite the same because
the server would have to authenticate with its classical certificate
but then pin the PQ key, which would be accepted without a certificate
chain, which HPKP doesn&#39;t support. My sense is that we could probably
manage to get &lt;em&gt;some&lt;/em&gt; issuance of PQ certificates faster than we could
design a new HPKP type mechanism and get it widely deployed, but
it&#39;s probably still an option worth remembering in case we need
it.&lt;/p&gt;
&lt;h2 id=&quot;what-about-tls-1.2%3F&quot;&gt;What about TLS 1.2? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#what-about-tls-1.2%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One challenge with the story I told above is that PQ support is only
available in TLS 1.3, not TLS 1.2.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
This means that anyone who wants
to add PQ support will &lt;em&gt;also&lt;/em&gt; have to upgrade to TLS 1.3. On the
one hand, people will obviously have to upgrade anyway to add the PQ algorithms,
so what&#39;s the big deal? On the other hand, upgrading more stuff
is always harder than upgrading less. After all, the TLS
working group &lt;em&gt;could&lt;/em&gt; define new PQ cipher suites for TLS 1.2,
and it&#39;s an emergency, so why not just let people use TLS 1.2 with PQ
rather than trying to force people to move to TLS 1.3?
On the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=The_Mote_in_God%27s_Eye&amp;amp;action=info&quot;&gt;gripping hand&lt;/a&gt;,
TLS 1.3 is very nearly a drop-in replacement
for TLS 1.2. There is one TLS 1.2 use case that TLS 1.3
didn&#39;t cover (by design), namely the ability to passively decrypt
connections if you have the server&#39;s private key (sometimes called
&amp;quot;&lt;a href=&quot;https://www.nccoe.nist.gov/addressing-visibility-challenges-tls-13&quot;&gt;visibility&lt;/a&gt;&amp;quot;), which is used for
server side monitoring in some networks. However, this technique won&#39;t work
with PQ key establishment either, so it&#39;s not a regression if you convert
to TLS 1.3.&lt;/p&gt;
&lt;h2 id=&quot;non-tls-systems&quot;&gt;Non-TLS systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#non-tls-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Much of what I&#39;ve written above applies just as well to many other interactive
security protocols such as IPsec or SSH,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
which are designed along essentially the same pattern. Any non-Web interactive
protocol is likely to have an easier time because there will be a fairly
limited number of endpoints you need to connect to, so you can more
readily determine whether the other side has upgraded or not. As a concrete
example, SSH depends on manual configuration of the keys
(the server&#39;s key is usually done on a &amp;quot;trust on first use&amp;quot; basis when the
client initially connects). Once that setup is done, you don&#39;t need
to discover the peer&#39;s capabilities.
By contrast, a Web browser has to be able to connect to any server, including ones it
has no prior information about.&lt;/p&gt;
&lt;p&gt;There is a huge variety of other cryptographic protocols and our ability
to recover from a CRQC would vary a lot. Especially impacted will be anything
which relies on long-term digital signatures, as they are hard to replace.
A good example here is cryptocurrency systems like Bitcoin which rely on
signatures to effect the transfer of tokens: if I can forge a signature
from you then I can steal your money. The right defense against this is to
replace your classical key with a PQ key (effectively to transfer money
to yourself), but we can assume that a lot of people won&#39;t do that in
time, and as soon as a CRQC is available, any future transaction becomes
questionable.&lt;/p&gt;
&lt;p&gt;The situation around Bitcoin seems to actually be pretty interesting. The modern way to do Bitcoin
transfers is to transfer them not to a public key but the hash of a public
key (called &lt;em&gt;pay to public key hash (p2pkh)&lt;/em&gt;). As long as the public
key isn&#39;t revealed, then you can&#39;t use a quantum computer to forge a signature.
The public key has to be revealed in order to transfer the coin, but if you
don&#39;t reuse the key, then there is only a narrow window of vulnerability
between the signature and when the payment is incorporated into the blockchain
(which doesn&#39;t depend on public key cryptography).
However, according to this &lt;a href=&quot;https://www2.deloitte.com/nl/nl/pages/innovatie/artikelen/quantum-computers-and-the-bitcoin-blockchain.html&quot;&gt;study by Deloitte&lt;/a&gt;, about 25% of Bitcoins are vulnerable to a CRQC,
so that&#39;s not a great situation.&lt;/p&gt;
&lt;h2 id=&quot;what-if-the-pq-algorithms-aren&#39;t-secure%3F&quot;&gt;What if the PQ algorithms aren&#39;t secure? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#what-if-the-pq-algorithms-aren&#39;t-secure%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;All of the above assumes that we have public key algorithms that
are in fact secure against both classical and quantum computers.
In that case, our problem is &amp;quot;just&amp;quot; transitioning from our
insecure classical algorithms to their more-or-less interface
compatible PQ replacements. But what happens if those algorithms
turn out to be
&lt;strike&gt;secure&lt;/strike&gt;
insecure &lt;em&gt;[Corrected 2024-04-15]&lt;/em&gt;
after all? In that case we are in truly
deep trouble. Obviously the world got on OK for centuries without
public key cryptography, but now we have an enormous ecosystem
based on public key cryptography that would be rendered insecure.&lt;/p&gt;
&lt;p&gt;Some of those applications may just get abandoned (maybe we don&#39;t
&lt;em&gt;really&lt;/em&gt; need cryptocurrencies...) but it would obviously be very bad if
nobody was able to safely buy anything on Amazon or use Google Docs, or
if your health care records couldn&#39;t be transmitted securely,
so there&#39;s obviously going to be a lot of incentive to do &lt;em&gt;something&lt;/em&gt;.
The options are pretty thin, though.&lt;/p&gt;
&lt;h3 id=&quot;signature&quot;&gt;Signature &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#signature&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We &lt;em&gt;do&lt;/em&gt; have at least one signature algorithm which we have reasonably
high confidence is secure: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hash-based_cryptography&amp;amp;oldid=1207336500&quot;&gt;hash signatures&lt;/a&gt;,
which NIST is standardizing as &amp;quot;SLH-DSA&amp;quot;. Unfortunately,
the performance is extremely bad (we&#39;re talking 8KB
signatures). On the other hand, slow and big signature
algorithms are better than no signature algorithms at
all, so there are probably some applications where
we&#39;d see some use of SLH-DSA.&lt;/p&gt;
&lt;h3 id=&quot;key-establishment-2&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#key-establishment-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;While the signature story is bad, the key establishment story is
really dire. The main option people seem to be considering is some
variant of what I&#39;ve been calling &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/&quot;&gt;intergalactic Kerberos&lt;/a&gt;.
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Kerberos_(protocol)&amp;amp;oldid=1218948629&quot;&gt;Kerberos&lt;/a&gt; is
a security protocol designed at MIT back in the 80s and in its
original form works by having each endpoint (user, server) share
a pairwise &lt;em&gt;symmetric&lt;/em&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
key with a &lt;em&gt;key distribution center (KDC)&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/kerberos.png&quot; alt=&quot;Kerberos sketch&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A high level view of Kerberos
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At a high level, when Alice wants to talk to Bob, she contacts the KDC using a message
encrypted with her pairwise key K_a and tells it that she wants to
contact Bob. The KDC creates a new random key R_ab and then sends
Alice two values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;R_ab&lt;/li&gt;
&lt;li&gt;A copy of R_ab encrypted under Bob&#39;s key (K_b), i.e.,
E(K_b, {Alice, R_ab}). In Kerberos terms this is called a &amp;quot;ticket&amp;quot;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Alice can then contact Bob and present the ticket. Bob decrypts the ticket
and recovers R_ab. Now Alice and Bob share a key they can use to
communicate. Note that this all uses symmetric cryptography, so it&#39;s
not vulnerable to attacks on our PQ algorithms.
You can wire up this kind of key establishment mechanism into protocols
like TLS (TLS 1.2 actually has Kerberos integration, but it
wasn&#39;t ported into TLS 1.3) and use them in something approximating the
usual fashion, albeit in a much clunkier fashion.&lt;/p&gt;
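&lt;p&gt;The flow above can be modeled in a few dozen lines. This is a toy: the &lt;code&gt;seal&lt;/code&gt; and &lt;code&gt;unseal&lt;/code&gt; helpers are a stand-in for real authenticated encryption (a hash-based keystream plus HMAC, built only from the Python standard library) and should not be used for anything real. The &lt;code&gt;KDC&lt;/code&gt; class, the JSON ticket format, and sealing R_ab under K_a on the way back to Alice are likewise assumptions for illustration, not the actual Kerberos message formats.&lt;/p&gt;

```python
import hashlib
import hmac
import json
import secrets

# Toy model of the Kerberos-style flow described above. seal()/unseal() are
# an illustrative stand-in for real authenticated encryption; not for real use.


def _subkeys(key):
    """Derive separate encryption and MAC keys from one pairwise key."""
    return (hashlib.sha256(b"enc" + key).digest(),
            hashlib.sha256(b"mac" + key).digest())


def _keystream(key, nonce, length):
    blocks = [hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
              for i in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def seal(key, plaintext):
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key, sealed):
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KDC:
    def __init__(self):
        self._keys = {}  # principal name -> long-term pairwise key

    def register(self, name):
        self._keys[name] = secrets.token_bytes(32)
        return self._keys[name]

    def issue(self, client, server):
        """Return (R_ab sealed for the client, ticket sealed for the server)."""
        r_ab = secrets.token_bytes(32)
        payload = json.dumps({"client": client, "key": r_ab.hex()}).encode()
        return seal(self._keys[client], r_ab), seal(self._keys[server], payload)


kdc = KDC()
k_a = kdc.register("alice")
k_b = kdc.register("bob")

for_alice, ticket = kdc.issue("alice", "bob")
alice_key = unseal(k_a, for_alice)          # Alice recovers R_ab
contents = json.loads(unseal(k_b, ticket))  # Bob opens the ticket with K_b
bob_key = bytes.fromhex(contents["key"])
print(contents["client"], alice_key == bob_key)
```

&lt;p&gt;Notice that &lt;code&gt;KDC.issue&lt;/code&gt; generates R_ab itself, so the KDC necessarily knows the key Alice and Bob end up sharing.&lt;/p&gt;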
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;merkle-puzzle-boxes&quot;&gt;Merkle Puzzle Boxes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#merkle-puzzle-boxes&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It turns out that there actually sort of is a public key system
that doesn&#39;t depend on any fancy math and so we can have
reasonable confidence in how secure it is.
In fact, it&#39;s the original
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Merkle%27s_Puzzles&amp;amp;oldid=1208543827&quot;&gt;public key system, invented by Ralph Merkle&lt;/a&gt;.
This post is already pretty long, so if you&#39;re interested
check out the Wikipedia page. The TL;DR is that it&#39;s probably
not that practical because (1) public key sizes are enormous
and (2) it only offers the defender a quadratic level of security (if
the defender does work &lt;em&gt;N&lt;/em&gt; the attacker does work &lt;em&gt;N&lt;sup&gt;2&lt;/sup&gt;&lt;/em&gt; to break it),
which isn&#39;t anywhere near as good as other algorithms. There
seem to be some quantum attacks on puzzle boxes (though I&#39;m not
sure how good they are in practice), but there is also a &lt;a href=&quot;https://arxiv.org/abs/1108.2316v1&quot;&gt;PQ variant&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This kind of design has a number of challenges. First, it&#39;s
much harder to manage. In a public-key based system clients
don&#39;t need to have any direct relationship with the CA,
because they just need the CA&#39;s public key. In a symmetric
key system, however, each client needs a relationship with
the KDC in order to establish the shared key. This is obviously
a huge operational challenge.&lt;/p&gt;
&lt;p&gt;The more basic challenge with this kind of design is that the KDC
knows R_ab and hence can decrypt any traffic between Alice
and Bob. This is because the KDC is providing &lt;em&gt;both&lt;/em&gt; authentication
and key establishment, unlike with a public key system like the
WebPKI where the CA provides authentication but the endpoints
perform key establishment using asymmetric algorithms. This
is just an inherent property of symmetric-only systems, and
it&#39;s what we&#39;re reduced to if we don&#39;t have any CRQC-safe
asymmetric algorithms.&lt;/p&gt;
&lt;p&gt;One potential mitigation is to have multiple KDCs and then
Alice and Bob use a key derived from exchanges with those
KDCs. In such a system, the attacker would need to compromise
all of the KDCs in use for a connection in order to either
(1) impersonate one of the endpoints or (2) decrypt traffic.
Recently we&#39;ve started to see some interest in symmetric key type
solutions along these lines, including a
&lt;a href=&quot;https://datatracker.ietf.org/doc/bofreq-aelmans-symmetric-key-exchange-skex/&quot;&gt;draft&lt;/a&gt;
at the IETF and a recent &lt;a href=&quot;https://www.imperialviolet.org/2024/04/07/letskerberos.html&quot;&gt;blog
post&lt;/a&gt; by
Adam Langley.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
My sense is that due to the drawbacks mentioned above,
this kind of system isn&#39;t likely to take off as long as we have
PQ algorithms, even if they&#39;re not that efficient. However, if
the worst happens and we don&#39;t have asymmetric PQ algorithms at all,
we&#39;re going to have to do something, and symmetric-based systems
will be one of the options on the table.&lt;/p&gt;
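&lt;p&gt;To make the multiple-KDC mitigation concrete, here is a minimal sketch of how Alice and Bob might combine the keys they get from several KDCs. The function names and the XOR-of-HMACs combiner are my own assumptions for illustration, not taken from the IETF draft or from Langley&#39;s post; the point is just that the session key stays unknown unless the attacker has every KDC&#39;s key.&lt;/p&gt;

```python
import hashlib
import hmac


def kdc_contribution(kdc_key, context):
    # PRF output under the key that one KDC handed to both Alice and Bob.
    return hmac.new(kdc_key, context, hashlib.sha256).digest()


def derive_session_key(kdc_keys, context):
    # Hypothetical combiner: XOR the per-KDC contributions. The result stays
    # unpredictable as long as at least one kdc_key is unknown to the attacker
    # (assuming HMAC-SHA256 is a PRF and the KDC keys are independent).
    if not kdc_keys:
        raise ValueError("need at least one KDC key")
    session_key = bytes(32)
    for kdc_key in kdc_keys:
        part = kdc_contribution(kdc_key, context)
        session_key = bytes(a ^ b for a, b in zip(session_key, part))
    return session_key
```

&lt;p&gt;Here &lt;code&gt;context&lt;/code&gt; would be something like the two endpoint identities, so that the same KDC keys yield different session keys for different pairs of endpoints.&lt;/p&gt;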
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-emergency/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I mentioned in the previous post, we shouldn&#39;t expect the PQ
transition to happen very quickly, both because the algorithms
aren&#39;t all that we&#39;d like and because even with better algorithms
the transition is very disruptive. However,
because the Internet is so dependent on cryptography and in particular
public key cryptography, there would be enormous demand to do &lt;em&gt;something&lt;/em&gt;
if a CRQC were to be developed any time soon.
When compared to the alternative of no secure
communications at all, a lot of options that we would have
previously considered unattractive or even totally non-viable
would suddenly look a lot better, and I would expect the
industry to have to make a lot of tough choices to get anything
at all to work while we worked out what to do in the long term.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This distinction does matter for some attacks, but even if it&#39;s days,
the situation is really bad. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Unless the server decides to defer to the client, of course. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Thanks to David Benjamin for help with the history of this
technique. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a new feature of TLS 1.3. In TLS 1.2, the security
of the handshake depended on the weakest common key establishment
algorithm, which left it vulnerable to attacks if the
weakest algorithm was breakable in real-time. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Again
with the caveats above about preferences &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The worst case is when about 1/2 of the sites want to
be preloaded; once you get to well over 50%, you can
instead publish the list of non-preloaded sites, though
this is logistically a bit trickier, as you&#39;d need a
list of every site. You can get this list from Certificate
Transparency, though, which is what &lt;a href=&quot;https://ieeexplore.ieee.org/document/7958597&quot;&gt;CRLite&lt;/a&gt; does. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Obviously, there&#39;s an element of &amp;quot;we&#39;re trying to avoid maintaining
TLS 1.2 and we want people to upgrade&amp;quot; going on here, but there&#39;s
also a small technical advantage here: although TLS
1.2 and TLS 1.3 both authenticate the server by having the server sign
something, in TLS 1.2 the signature only covers part of the handshake
(specifically, the random nonces and the server&#39;s key), which means
that the signature doesn&#39;t cover the key establishment algorithm
negotiation. This means that an attacker who can break the weakest
joint key establishment algorithm can mount a downgrade attack,
forcing you back to that weakest algorithm. However, we could
presumably address this by remembering that both key establishment
and authentication are &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#pq-lock&quot;&gt;PQ only&lt;/a&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note: QUIC uses the TLS 1.3 handshake under the hood, so it has roughly the
same properties as TLS 1.3 &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In original Kerberos, a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Data_Encryption_Standard&amp;amp;oldid=1218933490&quot;&gt;DES&lt;/a&gt; key. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Langley&#39;s design actually assumes that PQ algorithms work
but are too inefficient to use all the time, so you use it
to bootstrap the symmetric keys with the KDC. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-emergency/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Design choices for post-quantum TLS</title>
		<link href="https://educatedguesswork.org/posts/pq-rollout/"/>
		<updated>2024-03-30T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pq-rollout/</id>
		<content type="html">&lt;p&gt;It&#39;s a cruel irony that just as encryption is finally becoming ubiquitous,
quantum computers threaten to tear it all down.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/firefox-https-usage.png&quot; alt=&quot;Firefox HTTPS deployment&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Firefox HTTPS usage
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The technical details aren&#39;t that important (see &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security&quot;&gt;here&lt;/a&gt; for
some background), but the TL;DR version is that many of our cryptographic algorithms
are designed to be difficult to break using &amp;quot;classical&amp;quot; computers
(which is to say the kind we have now) but may not be difficult to
break with a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Quantum_computing&amp;amp;oldid=1213895774&quot;&gt;quantum computer&lt;/a&gt;,
which takes advantage of quantum mechanical effects.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
If someone builds one, it might be possible to efficiently break these algorithms.&lt;/p&gt;
&lt;p&gt;I say &lt;em&gt;might&lt;/em&gt; because the situation is somewhat uncertain in that
while people have built quantum computers, they are currently quite
small, nowhere near what you would need to mount an attack on
a modern cryptographic algorithm. There&#39;s a lot of money
being invested in developing quantum computers, but nobody
really knows when we&#39;ll have what&#39;s called a &lt;em&gt;cryptographically
relevant quantum computer (CRQC)&lt;/em&gt;, which is to say one which
could mount practical attacks on the cryptosystems in wide use,
or whether it&#39;s possible to build one at all.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;There was the time Blueshell had a humor fit at Pham’s faith in public key encryption, and Ravna knew some stories of her own to illustrate the Rider’s opinion.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;— Vernor Vinge, &amp;quot;A Fire Upon The Deep&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;However, if a CRQC were to exist, the impact would be catastrophic,
potentially rendering nearly every existing use of cryptography
insecure. Specifically, it would break the &amp;quot;asymmetric&amp;quot; algorithms
we use to authenticate other Internet users and to establish
cryptographic keys, so an attacker would be able to impersonate
anyone and/or recover the keys used to encrypt data. A CRQC
probably won&#39;t have that big an impact on the actual &amp;quot;symmetric&amp;quot;
encryption used to encrypt the data itself, but if you have the
key you can just decrypt it with a regular computer, so that&#39;s
not much in the way of comfort.&lt;/p&gt;
&lt;p&gt;For that reason, researchers have started developing what&#39;s
often called &lt;em&gt;post-quantum (PQ)&lt;/em&gt; cryptographic algorithms which are
designed to resist attack by quantum computers, or more properly,
for which there are no known quantum algorithms which would allow
you to break them (which isn&#39;t to say that those algorithms don&#39;t
exist). After a fairly long competition, NIST published new
standards for post-quantum key establishment (&lt;a href=&quot;https://csrc.nist.gov/pubs/fips/203/ipd&quot;&gt;ML-KEM&lt;/a&gt;)
and digital signature (&lt;a href=&quot;https://csrc.nist.gov/pubs/fips/204/ipd&quot;&gt;ML-DSA&lt;/a&gt;)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
and protocol designers and implementors are starting to look at how to
adapt their protocols to use them.&lt;/p&gt;
&lt;p&gt;In this post, I want to look at the challenges around that transition,
focusing on the situation for TLS and the WebPKI, though some of the
same concerns apply to other settings.&lt;/p&gt;
&lt;h2 id=&quot;why-not-just-convert-right-now%3F&quot;&gt;Why not just convert right now? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#why-not-just-convert-right-now%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The obvious question is why not just convert now, as we did when changing
from older algorithms like RSA to newer ones based on elliptic curves
(EC). The reason is that the new PQ algorithms are not clearly better
than the EC algorithms that dominate the space now. Specifically:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;In many cases, performance is worse,&lt;/dt&gt;
&lt;dd&gt;in terms of CPU,
key, ciphertext, or signature size. For instance, ML-KEM is faster
than X25519 (the most popular current EC key establishment
algorithm) but the keys are much bigger, over 1000 bytes compared
to 32 bytes. The situation is much worse for signatures, where
there really isn&#39;t any standardized
algorithm which isn&#39;t a big regression
from EC-based signatures in one way or another, and due to the
large number of signatures that need to be carried
in a typical protocol exchange, the size issue is a big deal,
especially as there appear to be compatibility issues.
These posts
by &lt;a href=&quot;https://blog.cloudflare.com/pq-2024&quot;&gt;Bas Westerbaan from
Cloudflare&lt;/a&gt; and &lt;a href=&quot;https://dadrian.io/blog/posts/pqc-signatures-2024/&quot;&gt;David Adrian
from Chrome&lt;/a&gt;
do a good job of covering
the state of play of the various algorithms, but in general
none of them has a better overall performance profile than EC.&lt;/dd&gt;
&lt;dt&gt;We&#39;re not sure that they&#39;re secure.&lt;/dt&gt;
&lt;dd&gt;There has been quite a bit of security analysis on the particular EC
variants that are in wide use and while there has been a lot of work
on the problems underlying ML-KEM and ML-DSA, my understanding is that
there is still real uncertainty about how secure these systems are
against classical computers. Daniel J. Bernstein (DJB) has been one of
the biggest &lt;a href=&quot;https://blog.cr.yp.to/20240102-hybrid.html&quot;&gt;advocates for this
view&lt;/a&gt;, but much of the
industry is sort of antsy about the PQ algorithms.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/dd&gt;
&lt;dt&gt;We don&#39;t know when or even if we&#39;ll get a CRQC.&lt;/dt&gt;
&lt;dd&gt;Current quantum computers are very far away from being
cryptographically relevant and of course progress is hard
to predict. The Global Risk Institute has produced a &lt;a href=&quot;https://globalriskinstitute.org/publication/2023-quantum-threat-timeline-report/&quot;&gt;report&lt;/a&gt;
with estimates on when we will have a CRQC, with the results
shown below:&lt;/dd&gt;
&lt;/dl&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://storage.googleapis.com/bughunters-article-images/blogs/pqc_estimate_01.png&quot; alt=&quot;Global Risk Estimates for a CRQC&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Global Risk Institute Estimates for a CRQC
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For these reasons, industry has generally been pretty cautious about
rolling out PQ algorithms.&lt;/p&gt;
&lt;h2 id=&quot;threat-model&quot;&gt;Threat Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#threat-model&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because classical key establishment and digital signature are based on the
same underlying math problems, the impact of a CRQC on these
algorithms is also the same—which is to say very bad. However,
the security impact is very different.&lt;/p&gt;
&lt;h3 id=&quot;key-establishment&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#key-establishment&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When you encrypt traffic, you want that traffic to remain secret
for the valuable lifetime of the data. For instance, if you
are encrypting your credit card number, you want it to remain secret
as long as that credit card is still valid. Lots of information
has very long lifetimes during which people want it to remain
secret; presumably you wouldn&#39;t be happy with people learning
your medical history 6 months from now.&lt;/p&gt;
&lt;p&gt;When you encrypt traffic using keys derived via an asymmetric key-based
establishment protocol—as with TLS—this means that you
need that key establishment algorithm to also be secure for the lifetime
of the data. In this context, that means that data that is being
sent now using keys established with EC algorithms—which is
to say most of it—might be revealed in the future if someone
develops a CRQC. An attacker might even deliberately capture
a lot of traffic on the Internet, betting that eventually a
CRQC will be developed and they can decrypt it (this is called
a &amp;quot;harvest now, decrypt later&amp;quot; attack).&lt;/p&gt;
&lt;p&gt;For this reason, doing something about the threat of a CRQC
to the security of key establishment is a fairly high priority,
because every day that you use non-PQ algorithms you&#39;re adding
to the pile of data that might eventually be decryptable. This is
especially true because transitions can take a really long time
even in the best case. For example,
TLS 1.2 first added support for modern AEAD algorithms such
as AES-GCM in 2008, but Firefox and Chrome didn&#39;t even
add support for TLS 1.2 until &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=861266&quot;&gt;2013&lt;/a&gt;,
and AEAD cipher suites didn&#39;t outnumber the older CBC-based
ciphers until 2015. So, even in the best case, we&#39;re still
going to be sending a lot of non-quantum-safe traffic
for years to come.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-cipher-usage.png&quot; alt=&quot;TLS cipher usage&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Negotiated cipher suites over time. From &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3278532.3278568&quot;&gt;Kotzias et al., 2018&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id=&quot;digital-signature&quot;&gt;Digital Signature &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#digital-signature&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;By contrast, a digital signature algorithm only needs to be secure at
the time you make decisions based on the validity of the signature; in
TLS this is at the time the connection is established. If a CRQC that
breaks your signature algorithm is developed 30 seconds after your TLS
connection is established, your data remains secure as long as you
established a key using some non-vulnerable method (of course, your
next connection won&#39;t be secure, so you&#39;ll want to do something
about that).&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;signatures-for-object-security&quot;&gt;Signatures for Object Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#signatures-for-object-security&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Note that the situation is different for signatures in object-based
protocols like e-mail, because people want to be able to validate the
signature long after the message was sent. Thus, having a PQ signature
does help, even if paired with a classical signature, because it
allows the signature to survive subsequent development of a CRQC.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible to allow a classical algorithm to survive the
development of a CRQC by &lt;em&gt;timestamping&lt;/em&gt; the signature to demonstrate
that the classical signature was created prior to the development
of the CRQC. For instance, you could arrange to register a hash
of the signed document with some blockchain type system. You
can then present the signed document paired with the timestamp
proof (note that the timestamp service doesn&#39;t need to verify
the signature itself; it&#39;s just vouching that it saw the document at
time X).
The relying party can verify that the signature was made
prior to the development of the CRQC, in which case it is presumably
trustworthy.&lt;/p&gt;
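&lt;p&gt;As a rough sketch of the relying party&#39;s check, assuming a hypothetical timestamp service that just records a SHA-256 hash and the time it saw it (the names &lt;code&gt;TimestampProof&lt;/code&gt;, &lt;code&gt;register&lt;/code&gt;, and &lt;code&gt;predates_crqc&lt;/code&gt; are made up for illustration, and a real system would also have to authenticate the proof itself, e.g., via the blockchain):&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimestampProof:
    document_hash: bytes   # SHA-256 of the signed document as registered
    seen_at: datetime      # when the (hypothetical) service saw that hash


def register(signed_document, now):
    # The service never checks the signature; it only records hash and time.
    return TimestampProof(hashlib.sha256(signed_document).digest(), now)


def predates_crqc(signed_document, proof, crqc_date):
    # True only if the proof covers this exact document and was issued
    # before the (assumed) date at which a CRQC became available.
    matches = hashlib.sha256(signed_document).digest() == proof.document_hash
    return matches and crqc_date > proof.seen_at
```

&lt;p&gt;If this check passes, the relying party then verifies the classical signature as usual; the timestamp only establishes when the signed document existed, not that the signature is valid.&lt;/p&gt;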
&lt;/div&gt;
&lt;p&gt;For this reason, doing something about digital signatures is generally
considered to be a lower priority, although of course it will be
really inconvenient if a CRQC is built and we have no deployment of any PQ signature
algorithms, as everyone will be scrambling to catch up. It&#39;s
of course possible that someone—most likely some sort of
nation state intelligence agency—already has a CRQC and isn&#39;t
telling, but even then that&#39;s a lot less bad than having your
communications be vulnerable to anyone who can get a QC shipped to them
overnight as long as they have Amazon Prime.&lt;/p&gt;
&lt;p&gt;This asymmetry in the threat model is convenient, because, as noted
above, nobody is that excited about the PQ signature algorithms,
whereas the PQ key establishment algorithms seem fairly
reasonable—assuming of course that they&#39;re secure. As a result,
people are focusing on key establishment and mostly keeping their
fingers crossed that the signature situation will improve before
it becomes an emergency.&lt;/p&gt;
&lt;h2 id=&quot;cryptographic-algorithms-in-transport-security-protocols&quot;&gt;Cryptographic Algorithms in Transport Security Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#cryptographic-algorithms-in-transport-security-protocols&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For the purpose of this post, I want to focus on transport security
protocols like TLS. These aren&#39;t the only kind of cryptographic
protocols in the world, but they illustrate a lot of the issues
at play, in particular how we transition from one set of algorithms
to another.&lt;/p&gt;
&lt;p&gt;It&#39;s clearly impractical to just wholesale switch over from
the old algorithms to the new algorithms at some point in time
(what&#39;s often called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Flag_day_(computing)&amp;amp;oldid=1195237556&quot;&gt;&amp;quot;flag day&amp;quot;&lt;/a&gt;).
It took years (decades, really) to deploy everything we
have in the ecosystem and any big change will also take time.
Instead, TLS—and most similar protocols—are explicitly designed
to have what&#39;s called &lt;em&gt;algorithm agility&lt;/em&gt;, the ability
to support more than one algorithm at once so that
endpoints can talk to both old and new peers, thus facilitating
a gradual transition from old to new.&lt;/p&gt;
&lt;p&gt;The diagram below provides a stylized version of the TLS handshake.
The client sends the first message (&lt;code&gt;ClientHello&lt;/code&gt;), which contains a
set of &amp;quot;key shares&amp;quot;, one for each key establishment algorithm that it
supports.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
For Elliptic Curve algorithms, this means one key share for
each curve. When the server responds with its &lt;code&gt;ServerHello&lt;/code&gt; message,
it will pick one of those groups and send its own key share
with a key from the same group. Each side can then combine its
key share with the other side&#39;s key share to produce a
secret key that both sides know.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
This shared key can then be used to derive keys to protect
the application data traffic.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tls-hs-sketch.png&quot; alt=&quot;TLS Handshake Sketch&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Something kind of like the TLS handshake
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Of course, we also need to authenticate the server.
This happens by having the server present a certificate and then
signing the handshake transcript (the messages sent by each
side) using the private key corresponding to the public key
in its certificate. But as noted above, there are multiple signature
algorithms, so the &lt;code&gt;ClientHello&lt;/code&gt; tells the server which signature
algorithms the client supports so that it can pick an appropriate
certificate. Of course, if the server doesn&#39;t have a certificate
that matches any of the client&#39;s algorithms, then the client
and server will not be able to communicate.&lt;/p&gt;
&lt;p&gt;Note that there are actually several signatures here because the
certificate both has a key for the server and is signed by
some key owned by the CA. These keys may have different algorithms,
and both have to be in the list advertised by the client. Moreover,
the CA may have its own certificate and that signature also has
to use an appropriate algorithm and then there are
&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2#signed-certificate-timestamps&quot;&gt;CT SCTs&lt;/a&gt;
(I refer you again to &lt;a href=&quot;https://dadrian.io/blog/posts/pqc-signatures-2024/&quot;&gt;David Adrian&#39;s post&lt;/a&gt;,
which quantifies these).&lt;/p&gt;
&lt;p&gt;Post-quantum algorithms fit neatly into this structure.
Each PQ algorithm is treated like a new elliptic curve (even though
they really don&#39;t have anything in common cryptographically)
and signature algorithms just act the same (although, as noted
above, the result is a lot larger). Even better, all of the
generation and selection of key shares is done internally to
the TLS stack,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
so it&#39;s possible to roll out new key establishment algorithms
just by updating your software without any action on the user&#39;s
part (this is how EC was deployed in the first place). Of course,
this is a lot easier if your software is remotely updatable
or at least updates regularly; if we&#39;re talking about the
software in a lightbulb, the situation might be a &lt;strong&gt;lot&lt;/strong&gt; worse.&lt;/p&gt;
&lt;p&gt;By contrast,
in order to deploy a new signature algorithm you need a new
certificate, and even though certificate deployment is partly
automated now, it&#39;s not so automated that people expect new
signature algorithms and the corresponding certificates to just
pop up in their servers. Moreover, some servers are not
set up to have multiple certificates in parallel.
Given these deployment realities,
the performance gap, and the
threat model difference mentioned above, it shouldn&#39;t be surprising
that there&#39;s a lot more activity around deploying PQ key
establishment than around signatures.&lt;/p&gt;
&lt;h2 id=&quot;the-current-deployment-situation&quot;&gt;The Current Deployment Situation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#the-current-deployment-situation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the past few years, we have seen a number of experimental
deployments of PQ algorithms, primarily for key establishment.&lt;/p&gt;
&lt;h3 id=&quot;key-establishment-2&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#key-establishment-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Most of the key establishment deployment has been in what&#39;s called a &amp;quot;hybrid&amp;quot; mode,
which is to say using two key establishment algorithms in parallel.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A classical EC algorithm like X25519&lt;/li&gt;
&lt;li&gt;A PQ algorithm like ML-KEM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, Chrome recently &lt;a href=&quot;https://groups.google.com/a/chromium.org/g/blink-dev/c/6xfaov3Z4yo&quot;&gt;announced&lt;/a&gt;
shipping an X25519/Kyber-768 (Kyber is the original name for what
&lt;strike&gt;is now ML-KEM&lt;/strike&gt; became ML-KEM after some modifications
&lt;em&gt;[Updated 2024-03-30]&lt;/em&gt;) hybrid and Firefox is working on it &lt;a href=&quot;https://github.com/mozilla/standards-positions/issues/874&quot;&gt;as well&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The way that these hybrid schemes work is that you send key shares
for &lt;em&gt;both&lt;/em&gt; algorithms, then compute shared keys for both, and
finally combine the shared keys into the overall cryptographic
key schedule that you use to derive the keys used to encrypt
the traffic.
There are a number of ways to do this, but the
way it&#39;s done in TLS 1.3 is simple: you just invent a new
algorithm identifier for the pair of classical and post-quantum
algorithms and the key share is the pair of keys. Similarly,
the combined algorithm emits a new secret that is formed
by combining the secrets from the individual algorithms.
This works well with the modular design of TLS, because it
just looks like you&#39;ve defined a new elliptic curve
algorithm, and the rest of the TLS stack doesn&#39;t need to
know any better.&lt;/p&gt;
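&lt;p&gt;Here is a minimal sketch of that combination step. It uses random bytes as stand-ins for the real X25519 and ML-KEM outputs (the Python standard library implements neither), the sizes are illustrative, and the all-zero salt is a placeholder rather than what the TLS 1.3 key schedule actually uses, so treat it as an illustration of the shape of the design rather than the actual TLS code.&lt;/p&gt;

```python
import hashlib
import hmac
import os

CLASSICAL_SHARE_LEN = 32   # X25519 public value
PQ_SHARE_LEN = 1184        # illustrative size for the "over 1000 bytes" PQ key


def hybrid_key_share(classical_share, pq_share):
    # The key share for the combined algorithm is just the pair of shares.
    return classical_share + pq_share


def hybrid_secret(classical_secret, pq_secret):
    # Both component secrets are fixed length, so plain concatenation
    # is unambiguous.
    return classical_secret + pq_secret


def extract(salt, input_secret):
    # HKDF-Extract with SHA-256: the step that feeds a secret into a
    # key schedule.
    return hmac.new(salt, input_secret, hashlib.sha256).digest()


# Stand-ins for real key shares and shared secrets.
share = hybrid_key_share(os.urandom(CLASSICAL_SHARE_LEN), os.urandom(PQ_SHARE_LEN))
classical_secret, pq_secret = os.urandom(32), os.urandom(32)
traffic_secret = extract(bytes(32), hybrid_secret(classical_secret, pq_secret))
print(len(share), traffic_secret.hex())
```

&lt;p&gt;Everything downstream of &lt;code&gt;extract&lt;/code&gt; sees a single input secret, which is why the rest of the stack can treat the hybrid as just another group.&lt;/p&gt;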
&lt;p&gt;The advantage of a hybrid design like this is that—assuming
it&#39;s done right—it is resistant to a failure of either
algorithm; as long as one of the two algorithms is secure
then the resulting key will be secret from the attacker and
the resulting protocol will be secure. This allows you
to buy some fairly cheap insurance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If someone develops a CRQC the connection is still protected
by the PQ algorithm.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If it turns out that the PQ algorithm is weak after all,
then the traffic is still protected with the classical
algorithm.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, if the PQ algorithm &lt;em&gt;is&lt;/em&gt; broken, then the traffic isn&#39;t
protected in the event that someone develops a CRQC, but at least
we&#39;re not in any worse shape than we were before, except for the
additional cost of the &lt;strike&gt;PQ&lt;/strike&gt; classical &lt;em&gt;[Updated 2024-03-30. oops.]&lt;/em&gt; algorithm, which, as noted above, is
comparatively low.&lt;/p&gt;
&lt;p&gt;All of this makes a rollout fairly easy: clients and servers
can independently add support for PQ hybrids to their implementations
and configure them to prefer the hybrids to the classical
algorithms. When two PQ-supporting implementations try to
connect to each other, they&#39;ll negotiate the hybrid algorithm and
otherwise you just get the classical algorithm. Initially,
this means that there will be very little use of hybrid algorithms,
but as the updated implementations are more widely deployed, you&#39;ll
have more and more use of hybrid algorithms until eventually
most traffic will be protected against a CRQC. This is the same
process we historically used to roll out new TLS cipher suites as
well as new versions of TLS, like TLS 1.3.&lt;/p&gt;
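&lt;p&gt;The negotiation logic behind this kind of rollout is simple enough to sketch. The group names below are made-up identifiers and the selection rule (the server picks its most preferred group that the client offered) is a simplification of what real TLS stacks do, but it shows why upgraded and non-upgraded peers can coexist.&lt;/p&gt;

```python
HYBRID = "x25519_mlkem_hybrid"   # hypothetical identifier for the hybrid
CLASSICAL = "x25519"


def negotiate(client_offers, server_preferences):
    # The server walks its own preference order and picks the first group
    # that the client also offered; None means there is no overlap, so the
    # two sides cannot connect.
    for group in server_preferences:
        if group in client_offers:
            return group
    return None


upgraded = [HYBRID, CLASSICAL]   # prefers the hybrid, still allows classical
legacy = [CLASSICAL]             # has not been updated yet

print(negotiate(upgraded, upgraded))   # both sides upgraded
print(negotiate(upgraded, legacy))     # upgraded client, old server
print(negotiate(legacy, upgraded))     # old client, upgraded server
```

&lt;p&gt;Only the first case gets the hybrid; the other two silently fall back to the classical algorithm, so nothing breaks during the transition.&lt;/p&gt;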
&lt;p&gt;Of course, it won&#39;t be safe for clients or servers to disable
support for the classical algorithms until effectively all
peers have support for the PQ hybrids; if you disable support
for them too early, then you won&#39;t be able to talk to anyone
who hasn&#39;t upgraded, which is obviously bad. For many applications,
this is a well-contained problem: for instance you can disable
classical algorithms in your mail client as soon as your mail
server supports the PQ hybrids.
However, the Web is a special case because a browser has to be able to
talk to any server and a server needs to be able to talk to any
browser, so Web clients and servers are typically very conservative
about when they disable algorithms. The standard procedure is to offer
both new and old concurrently and then measure the level of deployment
of the new algorithm and only disable the old algorithm when there are
almost no peers who won&#39;t support the new algorithm.  Unless there is
some strong sign that a CRQC is imminent, I would expect there to be a
very long tail of clients and servers—especially
servers—that don&#39;t support PQ hybrids, in part because PQ hybrid
support is not present in TLS 1.2 but only TLS 1.3, and there are
still quite a few TLS 1.2 only servers. This will also make it
hard for browsers to disable the classical algorithms, even if they
want to.&lt;/p&gt;
&lt;p&gt;If a viable CRQC is developed, then it will be necessary
for everyone else to switch over to post-quantum key
establishment algorithms
on an expedited basis, but that&#39;s not enough. If you accept classical algorithms for authentication,
the attacker will be able to impersonate the server. This means
that &lt;em&gt;after&lt;/em&gt; the CRQC exists, you will also need
to have everyone switch to PQ signature algorithms.&lt;/p&gt;
&lt;h3 id=&quot;signature&quot;&gt;Signature &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#signature&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;By contrast, there has been very little deployment of PQ algorithms
for signature, largely for the reasons listed above, namely that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s a lot harder to deploy a new signature algorithm than
a new key establishment algorithm.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It feels less urgent because a future CRQC mostly affects future
connections rather than current ones.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The signature algorithms aren&#39;t that great. And by &amp;quot;not that great&amp;quot;
I mean that replacing our current algorithms with
ML-DSA would result in adding over
&lt;a href=&quot;https://dadrian.io/blog/posts/pqc-signatures-2024/&quot;&gt;14K of signatures and public keys&lt;/a&gt;
to the TLS handshake. As a comparison point, I just tried
a TLS connection to &lt;code&gt;google.com&lt;/code&gt; and the server sent
4297 bytes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
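&lt;p&gt;To see roughly where a number like that comes from, here is a back-of-the-envelope sketch. The ML-DSA-44 sizes are the ones in FIPS 204; the count of signatures and public keys per handshake is an assumption about a typical WebPKI connection (two certificates, two SCTs, and the handshake signature), not a measurement.&lt;/p&gt;

```python
# Sizes in bytes for ML-DSA-44, as specified in FIPS 204.
MLDSA44_PUBLIC_KEY = 1312
MLDSA44_SIGNATURE = 2420

# Assumed contents of a typical WebPKI handshake (illustrative, not measured):
# leaf cert signature, intermediate cert signature, two SCTs, CertificateVerify.
SIGNATURES_PER_HANDSHAKE = 5
# Leaf public key plus intermediate public key.
PUBLIC_KEYS_PER_HANDSHAKE = 2

# The google.com server byte count quoted in the text.
CLASSICAL_SERVER_BYTES = 4297


def pq_auth_bytes(signatures=SIGNATURES_PER_HANDSHAKE,
                  public_keys=PUBLIC_KEYS_PER_HANDSHAKE):
    """Bytes of ML-DSA-44 signatures and public keys in one handshake."""
    return signatures * MLDSA44_SIGNATURE + public_keys * MLDSA44_PUBLIC_KEY


total = pq_auth_bytes()
print(total)                                     # 14724
print(round(total / CLASSICAL_SERVER_BYTES, 1))  # 3.4
```

&lt;p&gt;Under these assumptions the authentication data alone is more than three times the size of the entire classical server flight quoted above.&lt;/p&gt;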
&lt;p&gt;Before we can have any deployment, we first need to update the
standards for signature algorithms for WebPKI certificates.
From a technical perspective, this is fairly straightforward
(aside from the performance and size issues associated with
the certificates) in that you just assign code points
for the signature algorithms. However, unlike the situation
with key establishment, this is just the start of the process.&lt;/p&gt;
&lt;p&gt;On the Web, certificate authority practices are in part governed by a set of rules
(the &lt;a href=&quot;https://cabforum.org/working-groups/server/baseline-requirements/documents/&quot;&gt;&lt;em&gt;baseline
requirements (BRs)&lt;/em&gt;&lt;/a&gt;)
managed by the &lt;a href=&quot;https://cabforum.org/&quot;&gt;CA/Browser Forum&lt;/a&gt;, which has
historically been quite conservative about adding
new algorithms. For instance, although much of the TLS
ecosystem has shifted to new modern elliptic curves
in the form of X25519, the BRs still do not support
those curves for digital signature. So, the first
thing that would have to happen is that CABF adds
support for some kind of PQ algorithm or a PQ hybrid
(more on this below). This probably won&#39;t happen
until there are commercial hardware security modules that can do
the PQ signatures.&lt;/p&gt;
&lt;p&gt;Once the new algorithms are standardized, then:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The CAs have to generate new keys that they will
use to sign end-entity certificates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Those keys (embedded in CA certificates) need to
be provided to vendors so they can distribute them
to their users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Certificate transparency logs need to also get PQ
certificates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Servers need to generate their own PQ keys and
acquire new certificates signed
by the PQ keys at CAs and CT logs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that this transition is much worse than adding
a new signature algorithm would ordinarily be. For
instance, servers who wanted to use EC keys to
authenticate themselves didn&#39;t necessarily need to
wait for CAs to have EC keys themselves, because the
CA could sign a certificate for an EC key with an
RSA key, as RSA was still secure, just slower. This
meant you could have a gradual rollout, and things
got gradually better as you replaced the algorithms.
But
the whole premise of the PQ transition is that we
don&#39;t trust the classical algorithms, so eventually
you need to have the whole cert chain use the new
algorithms.
It&#39;s of course possible to have a mixed
chain, but that&#39;s more useful for experimenting
with deployment than providing actual security
against a CRQC.
In fact, as you gradually roll out, things get
slower, but you don&#39;t get the security benefit until
much later, which is actually the wrong set of incentives.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Once all this happens, when an updated client meets
an updated server, then the updated server can provide
its new PQ-only or PQ-hybrid certificate. Just
as with key establishment, the client and server both
need to support the classical algorithms until effectively
every endpoint they might come into contact with has
PQ support. This isn&#39;t a big deal for the client,
but for the server it means that it needs to have both
a regular certificate and a PQ certificate for a very
long time.&lt;/p&gt;
&lt;p&gt;However, unlike with key establishment, during this
transition period neither client nor server is getting any
security benefit from using PQ algorithms. This follows
from the fact that the security of the signature algorithm
in TLS is only relevant at connection establishment
time. There are two main possibilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Nobody with a CRQC is trying to attack your connections,
in which case the classical algorithm was just fine&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Somebody with a CRQC is trying to attack your connections,
in which case they will just attack the classical key
rather than the PQ key.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In order to get security benefit from PQ signatures in
this context, relying parties need to stop trusting
the classical algorithms, thus preventing attackers
from attacking those keys. In the Web context, this
means that Web browsers need to disable those algorithms;
until that happens PQ certificates don&#39;t make anything
more secure, but do make it more expensive, which is not
a very good selling proposition.&lt;/p&gt;
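&lt;p&gt;The argument above can be written down as a tiny model. This is only a sketch: the algorithm labels are illustrative strings, and the set of &quot;classical&quot; algorithms is an input assumption rather than anything a real TLS stack exposes this way.&lt;/p&gt;

```python
# Illustrative labels for algorithms a CRQC is assumed to break.
CLASSICAL = {"rsa", "ecdsa"}


def forgeable_with_crqc(trusted_algorithms):
    """Return True if an attacker with a CRQC can impersonate a server.

    The attacker only needs one trusted algorithm that they can break, so the
    answer depends on what the relying party accepts, not on which
    certificate the honest server happens to present.
    """
    return any(alg in CLASSICAL for alg in trusted_algorithms)


# Transition period: the browser trusts both classical and PQ certificates.
print(forgeable_with_crqc({"ecdsa", "rsa", "ml-dsa"}))  # True
# Only once classical algorithms are disabled does the PQ certificate help.
print(forgeable_with_crqc({"ml-dsa"}))                  # False
```

&lt;p&gt;Note that the server&#39;s own certificate never appears as an input: deploying a PQ certificate changes nothing until the trusted set shrinks.&lt;/p&gt;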
&lt;p&gt;For this reason, what I would expect to happen is
wide deployment of client side support for PQ signatures
but much less wide deployment of PQ certificates.
The vast majority of clients are produced by a small
number of vendors (the four major browser vendors)
and this is a fairly easy change for them to make.
By contrast, while servers are to some extent centralized
on big sites like Google or Facebook or big CDNs, there
are a lot of long tail servers who will not be motivated
to go to the trouble. In particular, I would be very
surprised if anywhere near enough servers adopted PQ-based
signatures to make it practical to disable classical
signatures absent very strong pressure from the client
vendors.&lt;/p&gt;
&lt;p&gt;As a reference point, the first good attacks on SHA-1 were published in
2004, and SHA-1 wasn&#39;t deprecated in certificates until 2017. Moreover,
even after Chrome
&lt;a href=&quot;https://security.googleblog.com/2014/09/gradually-sunsetting-sha-1.html&quot;&gt;announced&lt;/a&gt; that they would deprecate SHA-1, it still took three years to
actually happen. The difference between SHA-1 and SHA-2
had no meaningful impact on performance or
on certificate size, so this was really just a matter of
transition friction. This isn&#39;t an atypical example: the vast
majority of certificates &lt;a href=&quot;https://ct.cloudflare.com/&quot;&gt;contain RSA keys and are signed with RSA keys&lt;/a&gt;
even though ECDSA is faster (for the server) and has smaller keys and signatures.&lt;/p&gt;
&lt;p&gt;There have been some recent changes
to the WebPKI ecosystem to make transitions easier (e.g., shortening
certificate lifetimes), but transitioning to PQ certificates
has much worse performance consequences, so we should definitely expect
the PQ transition to be a slow process.&lt;/p&gt;
&lt;h2 id=&quot;hybrids-vs.-pure-pq&quot;&gt;Hybrids vs. pure PQ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#hybrids-vs.-pure-pq&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the big points of controversy is whether to mostly support
hybrid systems that combine both classical and PQ algorithms or
pure PQ algorithms. As noted above, the industry
seems to be trending towards hybrids for key establishment, but the
question of signatures is more uncertain.&lt;/p&gt;
&lt;p&gt;Looming over all of this is the fact that the US National Security
Agency and the UK GCHQ are strongly in favor of pure PQ algorithms
rather than hybrids. In November 2023, GCHQ put out a
&lt;a href=&quot;https://www.ncsc.gov.uk/whitepaper/next-steps-preparing-for-post-quantum-cryptography&quot;&gt;white paper&lt;/a&gt; arguing for pure PQ schemes rather than hybrid:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the future, if a CRQC exists, traditional PKC algorithms will
provide no additional protection against an attacker with a
CRQC. At this point, a PQ/T hybrid scheme will provide no more
security than a single post-quantum algorithm but with
significantly more complexity and overhead. If a PQ/T hybrid scheme
is chosen, the NCSC recommends it is used as an interim measure,
and it should be used within a flexible framework that enables a
straightforward migration to PQC-only in the future.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Similarly, the NSA&#39;s
&lt;a href=&quot;https://media.defense.gov/2022/Sep/07/2003071834/-1/-1/0/CSA_CNSA_2.0_ALGORITHMS_.PDF&quot;&gt;Commercial National Security Algorithms 2.0 (CNSA 2.0)&lt;/a&gt; guidance
contains some text that many read as saying they will eventually
not permit hybrid schemes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Even though hybrid solutions may be allowed or required due to
protocol standards, product availability, or interoperability
requirements, CNSA 2.0 algorithms will become mandatory to select at
the given date, and selecting CNSA 1.0 algorithms alone will no
longer be approved.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This isn&#39;t the clearest language in the world, but it seems
like the best reading is they don&#39;t want to allow hybrids.
On the other hand, at IETF 119 last week, NIST&#39;s Quynh Dang
&lt;a href=&quot;https://youtu.be/pTUvyVxPGYw?t=3931&quot;&gt;stated that NIST was fine with hybrids&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The specific timeline varies by product, but most relevant for
this post, they say they want to have Web browsers and servers
be CNSA 2.0 only by 2033:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/cnsa-timeline.png&quot; alt=&quot;CNSA 2.0 timeline&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
CNSA 2.0 timeline
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It&#39;s a bit unclear what this means in practice for the Web, even if you
read it as &amp;quot;pure PQ only&amp;quot;. Recall that the way that TLS works is
that the client offers some algorithms and the server selects one;
this means that it should be possible for servers constrained by
CNSA 2.0 (&amp;quot;national security systems and related assets&amp;quot;) to
select pure PQ algorithms as long as enough browsers support them,
which seems somewhat likely, even though AFAICT no browser currently
supports them. However, it&#39;s much less viable for a browser to
only support PQ modes unless you never want to connect to
servers on the Internet which, as noted above, are not likely
to all support pure PQ. Are even government systems going to
be configured to disable hybrids in 2035?&lt;/p&gt;
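&lt;p&gt;Here is a minimal sketch of that offer/select logic, to make the asymmetry concrete. The group labels are illustrative strings rather than registry names, and real TLS 1.3 negotiation also involves key shares and retries that this ignores.&lt;/p&gt;

```python
# Illustrative labels, not the names used in the IANA registry.
CLASSICAL = "x25519"
HYBRID = "x25519+mlkem768"
PURE_PQ = "mlkem768"


def select_group(client_offers, server_preferences):
    """Return the first server-preferred group the client also offered, else None."""
    offered = set(client_offers)
    for group in server_preferences:
        if group in offered:
            return group
    return None


# A server restricted to pure PQ works as long as the browser offers it.
print(select_group([HYBRID, PURE_PQ, CLASSICAL], [PURE_PQ]))    # mlkem768
# An ordinary server talking to the same browser picks the hybrid.
print(select_group([HYBRID, PURE_PQ, CLASSICAL], [HYBRID, CLASSICAL]))  # x25519+mlkem768
# A browser that only offers pure PQ cannot talk to a classical-only server.
print(select_group([PURE_PQ], [CLASSICAL]))                     # None
```

&lt;p&gt;The restricted server only needs clients to add one more offer, whereas the restricted browser needs every server it talks to to have upgraded.&lt;/p&gt;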
&lt;p&gt;The CNSA 2.0 guidance is relevant for two reasons. First, there
are likely to be a number of applications which are going
to feel strong pressure to comply with CNSA 2.0. It&#39;s of course
possible that if vendors just decide to use hybrids, that NSA
ends up giving in and approving that, but people are understandably
reluctant to find out.
Second,
&lt;a href=&quot;https://www.ncsc.gov.uk/whitepaper/next-steps-preparing-for-post-quantum-cryptography&quot;&gt;GCHQ&lt;/a&gt;
and
&lt;a href=&quot;https://media.defense.gov/2022/Sep/07/2003071836/-1/-1/0/CSI_CNSA_2.0_FAQ_.PDF&quot;&gt;NSA&lt;/a&gt;
offer a number of arguments for why to prefer pure PQ algorithms as opposed to
hybrids. This post is already getting quite long, so I don&#39;t want to
go through them in too much detail, but they mostly come down to it&#39;s
more moving parts to have a hybrid (hence more complexity, cost, etc.)
and if there is a good CRQC, then the classical part of the system
isn&#39;t adding much if anything in the way of security.&lt;/p&gt;
&lt;p&gt;Another concern about hybrids is performance. Obviously,
hybrids are more expensive than pure PQ, but the difference isn&#39;t
likely to be a big factor. PQ keys and signatures are much bigger,
so the incremental size impact of having the classical algorithm
is trivial. ML-KEM is quite a bit faster than X25519, but X25519
is already so fast that my sense is that people aren&#39;t worried
about this. Similarly, ML-DSA is about twice as fast as EC for verification
but looks to be about 4x slower for signing.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
The faster verification is a bit misleading because in most uses of TLS it&#39;s the server
that has to worry about performance and that&#39;s where the
signature happens, so again the incremental cost of EC isn&#39;t that big a deal.&lt;/p&gt;
&lt;p&gt;I&#39;m not sure how persuaded I am by these arguments, but I think
at best they are arguments at the margin. In particular, there&#39;s
no real reason to believe that deploying hybrids is inherently
unsafe, even if the classical algorithm is trivially broken.
Assuming that we&#39;ve designed things correctly, the resulting
system should just have the security of the PQ part of the hybrid.
I&#39;ve seen suggestions that severe enough implementation defects against
the classical part of the system (e.g., memory corruption) could
compromise the PQ part. This isn&#39;t out of the question, of course,
but modern software has a pretty big surface area of vulnerable
code, so it&#39;s hard to see this as dispositive.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;inside-baseball%3A-code-point-edition&quot;&gt;Inside Baseball: Code point edition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#inside-baseball%3A-code-point-edition&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For a long time, the IETF used to make it quite hard to get
code point assignments, for instance requiring that you
have an RFC. The idea was that we didn&#39;t want people using
stuff that hadn&#39;t been reviewed and that the IETF didn&#39;t
think was at least OKish. The inevitable result was that
a lot of time was spent reviewing documents (for instance
national cryptography standards) which the
IETF didn&#39;t care about but were just needed to get code
point assignments.
Worse yet, some people would just use as-yet unassigned code points—this
was easy because they&#39;re generally just integers—and
if there was any real level of deployment, that code point
became unusable whether it was officially registered or not.&lt;/p&gt;
&lt;p&gt;The more modern approach is to make code point assignment
super easy (effectively &amp;quot;write a document of some kind
which describes what it&#39;s for&amp;quot;) but to mark which code points are &amp;quot;Recommended&amp;quot;
by the IETF and which are not. The &amp;quot;Recommended=Y(es)&amp;quot; ones need
to go through the IETF process, but &amp;quot;Recommended=N(o)&amp;quot; code points
are free for the asking. This has significantly reduced the amount
of time that WGs spend reviewing documents for bespoke crypto
and has generally worked pretty well. More recently
the WG is &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-tls-rfc8447bis/&quot;&gt;adding&lt;/a&gt;
a &amp;quot;Recommended=D(iscouraged)&amp;quot; for algorithms which the
WG has looked at and thinks are bad.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;key-establishment-3&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#key-establishment-3&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, most of the energy in key establishment is in hybrid
modes. They&#39;re easy to deploy now and seem safer than pure PQ
algorithms, at least for now. In TLS in particular, what seems
likely to happen is the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The TLS WG will &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-tls-hybrid-design-09.html&quot;&gt;standardize&lt;/a&gt;
a set of hybrid algorithms based on ML-KEM and recommend
that people use them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The IETF will assign a code point (algorithm identifier) for
&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-connolly-tls-mlkem-key-agreement/&quot;&gt;pure ML-KEM&lt;/a&gt;,
but it won&#39;t be a standard and the IETF won&#39;t recommend
(or disrecommend) its use.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The likely result is that there will be a lot of use of hybrids
but people will be able to use pure ML-KEM if they want it.
At some point, sentiment may shift towards
pure ML-KEM, in which case the TLS WG will be able to take
that document off the shelf and standardize it. However, as noted
above, that isn&#39;t urgent even if there is a working CRQC: people
can just burn a little more CPU and bandwidth and do hybrids
while the hybrid → pure PQ transition happens.&lt;/p&gt;
&lt;h3 id=&quot;signatures&quot;&gt;Signatures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#signatures&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The question of whether to use hybrids versus pure PQ for signature is
still being hotly contested. As I mentioned above, it seems clear
that servers will need both classical and PQ signatures for some
time. The relevant question is exactly how they will be put
together.&lt;/p&gt;
&lt;p&gt;It seems likely that servers will have one certificate with a classical
algorithm (e.g., ECDSA) as they do today and then have another
certificate with a post-quantum algorithm. This could be in one
of two flavors:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;For a pure PQ algorithm (ML-DSA)&lt;/li&gt;
&lt;li&gt;For both a classical (e.g., ECDSA) and a PQ algorithm (ML-DSA).
As with key establishment, these would be packaged into a single
key and a single signature that was the combination of the two
algorithms, with the semantics being that both signatures have
to be valid.&lt;/li&gt;
&lt;/ol&gt;
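&lt;p&gt;The &quot;both signatures have to be valid&quot; semantics of the second flavor can be sketched as follows. The two verifier arguments are hypothetical callables standing in for real ECDSA and ML-DSA verification, and the toy verifiers below do no cryptography at all; the actual encoding of a combined key and signature is not shown.&lt;/p&gt;

```python
def verify_composite(message, signature_pair, verify_classical, verify_pq):
    """Accept a hybrid signature only if both component signatures verify.

    verify_classical and verify_pq are hypothetical callables taking
    (message, signature) and returning a bool.
    """
    classical_sig, pq_sig = signature_pair
    # Check both components unconditionally, then require both to pass.
    classical_ok = verify_classical(message, classical_sig)
    pq_ok = verify_pq(message, pq_sig)
    return classical_ok and pq_ok


def toy_verifier(expected_sig):
    """Stand-in verifier for the demo: compares bytes, no real cryptography."""
    return lambda message, sig: sig == expected_sig


ecdsa_ok = toy_verifier(b"classical-sig")
mldsa_ok = toy_verifier(b"pq-sig")

print(verify_composite(b"hello", (b"classical-sig", b"pq-sig"), ecdsa_ok, mldsa_ok))  # True
# Forging either half is not enough on its own.
print(verify_composite(b"hello", (b"forged", b"pq-sig"), ecdsa_ok, mldsa_ok))         # False
print(verify_composite(b"hello", (b"classical-sig", b"forged"), ecdsa_ok, mldsa_ok))  # False
```

&lt;p&gt;Because acceptance requires both checks to pass, an attacker has to break both algorithms to forge a hybrid signature.&lt;/p&gt;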
&lt;p&gt;For a while my
&lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/secdispatch/2Q6aYKi2u0ope-YHcRs684GBng8/&quot;&gt;intuition&lt;/a&gt; was
that it was easier to just do PQ: because the PQ algorithms were so
inefficient, clients and servers would largely favor the classical
algorithms unless it became clear that the classical algorithms were
insecure, and so it wouldn&#39;t matter much what was in the PQ
certificates. And if it became clear that the implementations had to
distrust the classical algorithms—which is going to be a super
rocky transition anyway given the likely level of deployment of PQ certificates—then the classical part of the
hybrid isn&#39;t doing much for you.&lt;/p&gt;
&lt;p&gt;Now, consider the opposite case where instead the PQ algorithm is
what&#39;s broken. At this point, you want to distrust that algorithm and
fall back to classical algorithms. By contrast to distrusting the
classical algorithms, distrusting the PQ algorithms is comparatively
easy because everyone is going to still have classical certificates
for a long time, so relying parties (e.g., browsers) will probably be
able to just turn off the PQ algorithm, in which case you don&#39;t
really need a hybrid certificate for continuity.&lt;/p&gt;
&lt;p&gt;This is all true as far as it goes, but it&#39;s also kind of browser
vendor thinking because they have really good support for remotely
configuring their clients, so it really is practical to turn off an
algorithm within days for most users. However, this isn&#39;t true
for all pieces of software, many of which take much longer to update,
and for those clients and servers the world will be much more secure
if the only two credentials they trust are classical (still OK)
and PQ hybrid (now just as secure as the classical credential).
Moreover, it&#39;s also possible that there will be a secret break
of the PQ algorithm, in which case even browsers won&#39;t update (the only
thing we can do for a secret CRQC is to stop trusting the classical
algorithms). For these
reasons, I&#39;ve come around to thinking that hybrids are the best
choice for PQ credentials in the short term.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-rollout/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Getting through this transition is going to put a lot of stress
on the agility mechanisms built into our cryptographic protocols.
In many ways, TLS is better positioned than many of the protocols in
common use, both because interactive protocols are inherently able to
negotiate algorithms and because TLS 1.3 was designed to make this
kind of transition practical. Even so, the transition is likely
to be very difficult. While TLS itself is designed to be
algorithm agile, it is often embedded in systems which themselves
are not set up to move quickly.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Many proprietary uses of TLS—such as applications talking back
to the vendor—should be able to switch pretty quickly and
seamlessly. For instance, Facebook can just update their app
in the app store and their server and they&#39;re done.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Web is going to be a lot harder because it&#39;s such a diverse
system and there isn&#39;t much in the way of central control on the
server side. On the other hand, the browsers are generally
centrally controlled by the vendors, which means that most
of the browser user base can change quickly. There is of
course a long tail of browsers in embedded devices (TVs, Kindles,
etc.) which may be much harder to update.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Beyond these two cases, there is going to be a long tail of
TLS deployments which are in much worse shape and which can&#39;t
be easily remotely updated (e.g., many IoT devices). Depending
on how the clients or servers these devices need to talk to
behave, they may either be stuck in a vulnerable state
(if the peers don&#39;t enforce PQ algorithms) or just unable to
communicate entirely.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, a rocky transition is actually the best case
scenario. The most likely outcome is that absent some strong evidence
of weakening of classical algorithms as a forcing function,
we have a long period of fairly wide deployment
of PQ or hybrid key establishment and very little deployment of PQ
signatures, especially if the PQ signature algorithms don&#39;t get any
better. Even worse would be if someone developed a CRQC in the next
few years—long before there is any real chance we will be ready
to just pull the plug on classical algorithms—and we have to
scramble to somehow replace everything on an emergency basis.
Fingers crossed.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Acknowledgement:&lt;/em&gt; Thanks to &lt;a href=&quot;https://twitter.com/rmhrisk/&quot;&gt;Ryan Hurst&lt;/a&gt;
for helpful comments on
this post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Lots of stuff in your computer (the transistors, LEDs, etc.)
are based on quantum effects, but fundamentally there&#39;s nothing
that your computer does that couldn&#39;t be done by clockwork.
This is something different. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The ML stands for &amp;quot;module-lattice&amp;quot;, which refers to the mathematical
problem that the algorithms are based on. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The situation here is a bit complicated.
NIST is standardizing
three schemes: ML-KEM, ML-DSA, and SLH-DSA. ML-KEM and
ML-DSA are based on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Lattice-based_cryptography&amp;amp;oldid=1209231233&quot;&gt;lattices&lt;/a&gt;,
which have a fairly long history of use in cryptography.
SLH-DSA is based on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Merkle_signature_scheme&amp;amp;oldid=1163482673&quot;&gt;hash signatures&lt;/a&gt; which are also quite old, but has unsuitable characteristics
for a protocol like TLS. Quite a few of the initial
inputs to the NIST PQ competition have subsequently been
broken (see this &lt;a href=&quot;https://cr.yp.to/papers/qrcsp-20231202.pdf&quot;&gt;summary&lt;/a&gt;
by Bernstein), including SIKE, which turns out to be
&lt;a href=&quot;https://eprint.iacr.org/2022/975.pdf&quot;&gt;totally insecure&lt;/a&gt;,
which is disappointing because it had some favorable
properties in terms of key size. There have also been
some improvements in attacking lattices in the past few
years, though they are not known to break either
ML-DSA or ML-KEM. In addition to algorithmic vulnerabilities,
some of the implementations of Kyber (the predecessor to
ML-KEM) had a timing side channel, dubbed &amp;quot;&lt;a href=&quot;https://research.kudelskisecurity.com/2024/02/01/the-kyberslash-vulnerability-and-the-crystals-go-library-a-retrospective-story/&quot;&gt;KyberSlash&lt;/a&gt;&amp;quot;.
All in all, you can see why people might want to engage
in some &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Defence_in_depth&amp;amp;oldid=1206121995&quot;&gt;defense in depth&lt;/a&gt;.
 &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m simplifying here a bit, in that the client can actually
advertise curves it doesn&#39;t send key shares for, but we can
ignore that for the moment. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Simplifying again. Each side actually generates a secret
value and then computes their key share from that secret
value. The shared secret is computed from the local secret
and the remote key share. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Although of course users can reconfigure it, at least in
some systems. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See my post on how to successfully deploy
new protocols, coming soon. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The numbers I have here are from Westerbaan and are for
Ed25519, which isn&#39;t in wide use on the Web,
but, at least in OpenSSL, EdDSA and ECDSA seem to have
&lt;a href=&quot;https://asecuritysite.com/openssl/openssl3_b2&quot;&gt;similar performance&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is actually another option in which you have a single
certificate with the classical key in the normal place
(&lt;code&gt;subjectPublicKeyInfo&lt;/code&gt;) and the PQ key in an extension.
This certificate will be usable with both old and new clients,
with new clients signaling that they supported PQ and then
the server signing with both algorithms. This has the advantage
of only needing a single certificate but otherwise is kind of
a pain because it requires a lot more changes to TLS.
In the naive way I&#39;ve described it, it also involves sending
a lot more data for &lt;em&gt;every&lt;/em&gt; client, but there are ways around
that. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-rollout/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Sean O&#39;Brien 100K Race Report (2024)</title>
		<link href="https://educatedguesswork.org/posts/sob100k-2024/"/>
		<updated>2024-03-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/sob100k-2024/</id>
		<content type="html">&lt;p&gt;On Saturday 1/27 I ran the
&lt;a href=&quot;https://www.khraces.com/series/sean-o-brien-50-50&quot;&gt;Sean O&#39;Brien (SOB) 100K&lt;/a&gt;
in Southern California.
I ran this same race back in 2021 and got my 100K PR, so I knew the
course and felt like it was an opportunity to do better.
My training had been going well and I was dropping
PRs on my local courses, so I was looking forward to a strong
race and taking off a bunch of time, with an overall target of 12:00 to 12:25,
so ~30-50 minutes off of 2021. This did not happen, though I did PR slightly.&lt;/p&gt;
&lt;p&gt;It actually turned out to be a bit of a mixed result. On one hand, I finished about 7
minutes faster than last time (more on the &amp;quot;about&amp;quot; later), and much
higher up in the standings (8th overall out of a starting field of
96) but all of the improvement was being more efficient at aid
stations and I actually was a little over 2 minutes slower in the
running part. My working theory is that it was warmer this year, and so
times were slower, but this is a bit harder to verify than one might
like.&lt;/p&gt;
&lt;p&gt;To orient yourself, here is the course and the hill profile. The
circles on the course are mile markers, so you start at the far
right, go all the way to the left, around the loop counter-clockwise,
then backtrack. There&#39;s an out-and-back down to Bulldog
and then you backtrack to the finish. The circles on the profile
are &amp;quot;climb score&amp;quot;, Runalyze&#39;s estimate of how hard the climb was.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sob-map.png&quot; alt=&quot;Map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/sob100k-profile.png&quot; alt=&quot;Profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Screenshots from &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;, 2021 data]&lt;/p&gt;
&lt;h2 id=&quot;overall-logistics&quot;&gt;Overall Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#overall-logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;ve been doing most of my training with Tailwind and Maurten drink mix,
but KH races uses Gu Roctane, which I don&#39;t particularly like—especially because
races have a tendency to offer the caffeinated version—so
I decided to use drop bags extensively. To make this easier to manage
I mapped out a regular eating schedule that targeted 320-360 cal/hr, effectively:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 500 ml bottle of Maurten 160 drink every hour, with 250ml
each 30 min&lt;/li&gt;
&lt;li&gt;Some mix of Maurten solid and Maurten gel aiming for ~100-200
cal/hr.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I use a 30 minute timer to manage all this, so I have to do &lt;em&gt;something&lt;/em&gt;
every 30 minutes. I started with Maurten solid and then moved onto a
mix of regular gel and the caffeinated gels. This got a little
complicated to manage due to Maurten&#39;s non-orthogonal lineup:
Maurten&#39;s solid bar is 225 cal, so effectively I was eating 1/3 bar
when the timer went off, which is fine. Ideally I would have just used
Maurten 160 gels every hour for 320 cal/hr, but I wanted to take caffeine every
2 hrs after 6 hrs and Maurten&#39;s caffeinated gel is only 100 cal, so I decided to
aspirationally add a Maurten 100 at the 30 minute mark, though
I wasn&#39;t sure I could reliably do 360 cal/hr. This mostly worked
out, especially once I got past the solid phase.&lt;/p&gt;
&lt;figure&gt;
&lt;div class=&quot;img-flex-equal&quot;&gt;
  &lt;div&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/sob2024-nutrition.jpeg&quot; /&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/sob2024-gear.jpeg&quot; /&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;figcaption&gt;
Everything laid out
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;To make all this easier I bagged up what I needed for each aid station
in a ziploc (with two bags for Kanan, because you hit it twice). The
way this works is you get to the AS, you (theoretically) dump out everything from
your pack, and then shove in whatever is in the ziploc.
I labeled
the ziploc both with where it was needed and my 2021 time for
the AS, so as soon as I picked up the bag I could see if I was
ahead or behind schedule. This part worked well and was a lot
easier than a pace sheet.&lt;/p&gt;
&lt;h2 id=&quot;start-to-corral-canyon-%5B7.3-mi%2C-%2B2270%2F-846-ft%2C-1%3A20%3A43%2C--2%3A25%5D&quot;&gt;Start to Corral Canyon [7.3 mi, +2270/-846 ft, 1:20:43, -2:25] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#start-to-corral-canyon-%5B7.3-mi%2C-%2B2270%2F-846-ft%2C-1%3A20%3A43%2C--2%3A25%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Race start was at 5:30 AM and sunrise at a bit after 7 so I expected
to run the first 60-90 minutes in the dark.
I got to the start with plenty of time and was able to drop off
my drop bags and then just chilled in the car for a while, before
heading over to the start line about 10 minutes early, planning
to use the bathroom.&lt;/p&gt;
&lt;p&gt;This is where things started to go wrong because there was a much
longer than expected bathroom line: apparently
the portapotties just never got delivered so we just had the park&#39;s
bathrooms, which really weren&#39;t enough for 100 runners. For some
reason, the RD decided not to delay the start even though a number of
people—including me—were still waiting. I decided that it
was better to use the bathroom than to be right at the start, and
ended up missing the start by about a minute (not the first time this
has happened to me, TBH).
I think this was the right decision overall: a minute isn&#39;t much for a
race this long, and I wasn&#39;t expecting to win, but the result is that
I started essentially at the back of the race. There are some early
sections of single track and so I spent quite a bit of time trying to
get past people who were going a lot more slowly than me. It&#39;s
important to conserve energy early, so I tried not to get too aggro,
but it still slows you down.&lt;/p&gt;
&lt;p&gt;New for this year there was a real water crossing 2 miles in
(I had heard rumors about this but no details because I missed
the briefing at the start), where you actually had to wade through
almost knee deep water with a rope for stabilization. I&#39;m never
a huge fan of this, but it was already fairly warm (never a good
sign) and my shoes dry quickly, so it wasn&#39;t uncomfortable.
Eventually I made it past most of the people slower than me and
then things opened up into fire roads so it wasn&#39;t a problem
to get past people any more. I felt like I was running pretty comfortably,
and, as with last time, opted to run as much as I could.&lt;/p&gt;
&lt;p&gt;I finally hit Corral Canyon at 1:21, about 3 minutes ahead of 2021
(all times here are from my watch, not gun time), which seemed pretty
good considering the start. I was trying to be conscious of aid
station time, and was in and out in 1:05. This is about the
best you can do if you&#39;re drinking regularly and using your own
nutrition because you have to pour the powder into the bottles and
then add water, but I see now it was 40s slower than in
2021, so I think that&#39;s just the price you pay for bringing your
own nutrition.&lt;/p&gt;
&lt;h2 id=&quot;kanan-road-%5B6.3-mi%2C-%2B1010%2F-1444-ft%2C-1%3A06%3A55%2C--2%3A13%5D&quot;&gt;Kanan Road [6.3 mi, +1010/-1444 ft, 1:06:55, -2:13] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#kanan-road-%5B6.3-mi%2C-%2B1010%2F-1444-ft%2C-1%3A06%3A55%2C--2%3A13%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section is mostly rolling single track and fire road.
I was feeling reasonably good on this section, but it was a bit
hard to get into my rhythm, as there were a lot of rocky
sections and stream crossings, and I actually tripped
a couple of times, which wasn&#39;t great. Fortunately, the dirt
was soft, so I didn&#39;t get hurt, but it&#39;s kind of discouraging.
Other than that, this section went reasonably fast.&lt;/p&gt;
&lt;p&gt;The first drop bag is at Kanan Road, so I was able to grab
my nutrition refill and check my time (about 4 minutes ahead).
I lost some time here because I&#39;d taped up the bag too much and had trouble
getting it open, and then had to refill my nutrition, but I still got out reasonably quickly (2:37).
Only after I left did I realize I still had my headlamp in my pack, but
no way was I going back to drop it off. It&#39;s not that heavy, right?&lt;/p&gt;
&lt;h2 id=&quot;zuma-edison-ridge-1-%5B5.4-mi%2C-%2B1260%2F-997-ft%2C-1%3A00%3A12%2C--0%3A58%5D&quot;&gt;Zuma Edison Ridge 1 [5.4 mi, +1260/-997 ft, 1:00:12, -0:58] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#zuma-edison-ridge-1-%5B5.4-mi%2C-%2B1260%2F-997-ft%2C-1%3A00%3A12%2C--0%3A58%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section is a rolling descent on single track followed by a
moderate climb on fire road to the top of the ridge line. The fire
road was pretty smooth and as with last time, I felt good and
pretty much ran this whole section. There is a nice moderate
descent that was longer than I remembered and a bit rocky but
I felt really comfortable on. By this point in 2021 my knee
had already started to hurt, but everything was still good,
so that felt pretty promising. The way to the next aid station (Bonsall) is all downhill
so I chugged some water, refilled my bottle, and just headed out. I forgot
to hit my lap timer on this one, but I know that the aid was pretty fast.&lt;/p&gt;
&lt;h2 id=&quot;bonsall-%5B3.4-mi%2C-%2B0%2F-1706-ft%2C-26%3A34%2C--2%3A49%5D&quot;&gt;Bonsall [3.4 mi, +0/-1706 ft, 26:34, -2:49] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#bonsall-%5B3.4-mi%2C-%2B0%2F-1706-ft%2C-26%3A34%2C--2%3A49%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, this next section is a 3.4 mile descent down to the
Bonsall aid station. Pretty much this whole thing is on fire road so I
was able to take it pretty fast (~7:46/mi, 50s/mile faster than
2021). With that said, I was apparently overcompensating for it
feeling short before, because I expected it to go really fast, and,
well, it kind of didn&#39;t; I kept thinking &amp;quot;OK, we must be at the bottom&amp;quot;,
but I wasn&#39;t. On the plus side, I was passing people, which doesn&#39;t
usually happen for me on the descent, so I was feeling like all that
training for downhill was paying off.&lt;/p&gt;
&lt;p&gt;I hit the aid station (second drop bag), swapped out my food, and filled
my bottles. It was only at this point that it started to sink in that I
had nearly 2 hrs of exposed, mostly uphill trail ahead, it was starting to get hot, and I
only had two bottles. I compensated by chugging some water and salt
caps and crossing my fingers. A while after I left the AS I realized I
was still carrying my headlamp, but once again, I wasn&#39;t going back.&lt;/p&gt;
&lt;h2 id=&quot;zuma-edison-ridge-2-%5B7.76%2C-%2B2910%2C-1184-ft%2C-1%3A56%3A17%2C-%2B4%3A43%5D&quot;&gt;Zuma Edison Ridge 2 [7.76, +2910,-1184 ft, 1:56:17, +4:43] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#zuma-edison-ridge-2-%5B7.76%2C-%2B2910%2C-1184-ft%2C-1%3A56%3A17%2C-%2B4%3A43%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s a long climb out of Bonsall back to Zuma Edison Ridge. This is
actually two climbs: ~1600 ft, followed by a descent of ~1000
ft, and then another climb of ~1300 ft. As I rolled out of the aid
station, someone came by me with 3 bottles and one bouncing in his
pack and I started to think I had made a serious mistake in terms
of fluid but it was too late to fix it.&lt;/p&gt;
&lt;p&gt;This section is mostly hiking and there were 3-4 people ahead of me,
including a guy named Colton who I&#39;d run part of the way with earlier
and I&#39;d been sort of going back and forth with (he eventually finished
one place behind me). I was able to mostly keep them in sight, but not
make much progress. This section is super exposed and I was really
starting to feel the heat and actually worried that I wouldn&#39;t
have enough water. I didn&#39;t really think it would take me more than two
hours (two bottles by my drinking schedule) but in the heat I really
needed to be drinking more water than dictated by my calorie needs.
Worse yet, my knee started to hurt (same place as last time!) whenever
I ran, but as I wasn&#39;t doing much running, I just tried to ignore it.&lt;/p&gt;
&lt;p&gt;I would say this section was harder than 2021: I felt like it
was hotter and I felt like I was struggling more. Partway through
the second climb, the eventual first woman passed me and she
just looked a lot lighter on her feet, running parts that I only
barely had enough energy to hike. So, I was pretty glad to finally
get to the Zuma aid station, but this leg was about 5 minutes
slower than 2021. I burned through the aid station this time
and just kept going.&lt;/p&gt;
&lt;h2 id=&quot;kanan-road-%5B5.4-mi%2C-%2B1037%2F-1283-ft%2C-1%3A03%3A45%2C--2%3A40%5D&quot;&gt;Kanan Road [5.4 mi, +1037/-1283 ft, 1:03:45, -2:40] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#kanan-road-%5B5.4-mi%2C-%2B1037%2F-1283-ft%2C-1%3A03%3A45%2C--2%3A40%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From here we&#39;re just backtracking down the Backbone Trail to a
previous aid station. This means a ~600ft climb followed by a steep
descent and some rolling terrain. I started to feel somewhat better
here and was trying to focus on moving well on the downhill. Along the
way I passed Colton again, for the last time, and just kept moving.
By now I figured I was probably around top 15. I made it to
Kanan OK, grabbed my next nutrition refill, and &lt;em&gt;finally&lt;/em&gt;,
remembered to drop my headlamp into my drop bag.&lt;/p&gt;
&lt;p&gt;Whatever was wrong with my knee seemed to have fixed itself, so
I was less worried about not being able to finish, and
I had a pacer meeting me at Bulldog (mile 50), so my approach was
just to treat this like a 50 miler and figure the last 12 would
take care of themselves. This really meant one more modestly
hard segment back to Corral Canyon and then the long downhill
to Bulldog which was pretty runnable, so I was really just
counting down to Corral Canyon at this point.&lt;/p&gt;
&lt;h2 id=&quot;corral-canyon-%5B6.4-mi%2C-%2B1453%2F-974-ft%2C-1%3A30%3A55%2C-%2B1%3A41%5D&quot;&gt;Corral Canyon [6.4 mi, +1453/-974 ft, 1:30:55, +1:41] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#corral-canyon-%5B6.4-mi%2C-%2B1453%2F-974-ft%2C-1%3A30%3A55%2C-%2B1%3A41%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We&#39;re still retracing our steps back to the first aid station, so this
is mostly on single track and generally uphill. There was definitely
a fair amount of hiking here, but I was really trying to keep solid
running where I could. By this point in the race I was starting
to pass people doing the 50K (almost nobody seemed to be doing
the 50 mile), which is kind of nice, but I imagine pretty unpleasant
for them, given that I was running a lot faster despite being a lot further in.
This part didn&#39;t feel that bad, but nevertheless I was glad to
hit the aid station, and was looking forward to the long
downhill to Bulldog.&lt;/p&gt;
&lt;h2 id=&quot;bulldog-%5B5.9-mi%2C-%2B486%2F-1946-ft%2C-1%3A03%3A19%2C--0%3A07%5D&quot;&gt;Bulldog [5.9 mi, +486/-1946 ft, 1:03:19, -0:07] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#bulldog-%5B5.9-mi%2C-%2B486%2F-1946-ft%2C-1%3A03%3A19%2C--0%3A07%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This section is a long out and back, with the aid station being at the
bottom. Fortunately, this time I had a better picture of the course
and I was prepared for the mile long climb to the downhill, so it
wasn&#39;t as demoralizing this time. I was almost to the top of the climb
when someone came tearing the other way. I asked him if he knew what
place he was in and he said first, which was reassuring in terms of
where I was at in the standings but also meant I could just count
off people going the other way to see where I was.&lt;/p&gt;
&lt;p&gt;I tried to push this downhill a bit within the limits of not falling,
and felt more in control than last time, though actually the overall
pace for this leg was nearly identical to 2021. I was most of the way down
before I saw #2, who turned out to be &lt;a href=&quot;https://www.sharmanultra.com/coaches/iansharman&quot;&gt;Ian Sharman&lt;/a&gt;,
who has 9 Western States Top 10 finishes, so I felt like things were
going pretty well, even if he was probably having a bad day
(I eventually finished around 83 minutes behind him).&lt;/p&gt;
&lt;p&gt;Eventually I hit the bottom of the hill and it was onto the
flat/rolling section, which I&#39;d remembered as ~1 mile but is actually
more like 2 miles. About a mile from the turnaround there is a
concrete bridge/overpass over a small river, which you have to get
over somehow. It&#39;s maybe 3 ft above the trail and someone had put a
small stepladder so you could get onto it, but even so it was a bit of
a struggle, which wasn&#39;t a really good sign in terms of my legs being
fresh. By the time I had made it to the aid station, I counted off 6 men
and 1 woman before me, which seemed pretty good. I grabbed my
last nutrition bag, my headlamp, and headed back out.&lt;/p&gt;
&lt;h2 id=&quot;corral-canyon-%5B5.8%2C-%2B1906%2F-495-ft%2C-1%3A32%3A06%2C-%2B6%3A42%5D&quot;&gt;Corral Canyon [5.8, +1906/-495 ft, 1:32:06, +6:42] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#corral-canyon-%5B5.8%2C-%2B1906%2F-495-ft%2C-1%3A32%3A06%2C-%2B6%3A42%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My pacer Kate and I ran the flat mile or two modestly hard—the
bridge was even worse on the way back because I sort of had to scoot
down the two whole feet onto the ladder—and then just settled in for the long hike
up to the top. I was trying to push this pretty hard but definitely
wasn&#39;t feeling amazing. Still, it was pretty nice to see everyone
behind me going the other direction.&lt;/p&gt;
&lt;p&gt;I&#39;d hoped to make up time on this segment, but actually I was almost 7
minutes down for this leg (still about 8 minutes ahead overall) by the
time I hit the aid station. I actually thought I was more like 14 minutes
ahead because I misremembered my target time (note to self: also do a pace
sheet). It didn&#39;t really matter, though, because my plan was just to
push the pace as much as I could on the way down.&lt;/p&gt;
&lt;h2 id=&quot;finish-%5B7.3-mi%2C-%2B833%2F-2277-ft%2C-1%3A27%3A27%2C-%2B0%3A30%5D&quot;&gt;Finish [7.3 mi, +833/-2277 ft, 1:27:27, +0:30] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#finish-%5B7.3-mi%2C-%2B833%2F-2277-ft%2C-1%3A27%3A27%2C-%2B0%3A30%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The way to the finish is some rolling single track followed by a
really long descent, first on fire roads (remember, we&#39;re
backtracking again, though I&#39;d done this section entirely in the dark
on the way out) and then on single track. At this point, I was hiking
most of the climbs but trying to run the downhill as much as I could.&lt;/p&gt;
&lt;p&gt;Unfortunately, due to the shorter day and the later start, I had
to run a lot of this in the dark, unlike 2021, when I finished
in the light. I did have a headlamp (Petzl &lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/ACTIK-CORE&quot;&gt;Actik
Core&lt;/a&gt;),
but I was really wishing I had something brighter, especially when
we got to the single track. If I&#39;d just carried my Lupine another 10 miles
or so, I could have had it with me for this, which might have
made a difference, as I wasn&#39;t able to go as fast as my legs
would have supported because I couldn&#39;t see very well.&lt;/p&gt;
&lt;p&gt;After a long downhill there is a mile or so of uphill, which I knew
about this time (pretty much right after the water crossing) and was
actually looking forward to, both as a break from having to pick
my way through things and an opportunity to push the pace some.
I did that and was rewarded by getting to listen to Kate breathing a bit harder
behind me. This felt a little longer than I expected, but I&#39;d been
doing plenty of climbing in training so I was comfortable with it.&lt;/p&gt;
&lt;p&gt;After the peak of the hill, it&#39;s back to the single track descent
followed by about a half mile of nice flat fire road, which gave
me an opportunity to open up a little bit towards the finish.
We were still passing people but they were not in the 100K so it doesn&#39;t
really count.&lt;/p&gt;
&lt;h2 id=&quot;analysis&quot;&gt;Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#analysis&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I mentioned at the top, it&#39;s hard to compare year to year, so this
section is mostly me thrashing around trying to get a better sense of
it. The table below shows my performance against 2021 (watch time, not gun
time):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Leg&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Distance&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Vert&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Time&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;vs 2021&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;vs 2021 (cum)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.29 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+2,270/-846 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:20:43&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:25&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:05&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+0:41&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-1:44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Kanan&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6.34 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,010/-1,444 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:06:55&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:13&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-3:57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:37&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-1:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-5:17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Zuma&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.42 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,260/-997 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:00:12&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-0:58&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-6:15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bonsall&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3.43 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+0/-1,706 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;26:34&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:49&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-9:04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:56&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-1:30&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-10:34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Zuma&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.76 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+2,910/-1,184 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:56:17&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+4:43&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-5:51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:05&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-3:22&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-9:13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Kanan&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.40 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,037/-1,283 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:03:45&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-11:53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3:26&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+0:23&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-11:30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6.37 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,453/-974 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:30:55&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1:41&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-9:49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;?&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:13&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-12:02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bulldog&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.91 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+486/-1,946 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:03:19&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-0:07&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-12:09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3:22&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-0:29&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-12:38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.84 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,906/-495 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:32:06&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+6:42&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-5:56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:51&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-2:11&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-8:07&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Finish&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.32 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+833/-2,277 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:27:27&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+0:30&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-7:37&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As seems pretty clear here, I was just faster all the way to Bonsall,
both on the running legs and in the aid stations. I then
lost a lot of time on the climb out of Bonsall and again
on the climb out of Bulldog, but was about the same
as 2021 on the rest of the legs and better in the aid stations
throughout.&lt;/p&gt;
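&lt;p&gt;As a sanity check on the cumulative column, here is a minimal Python sketch that re-adds the per-row deltas from the table. The helper names (&lt;code&gt;to_seconds&lt;/code&gt;, &lt;code&gt;to_text&lt;/code&gt;, &lt;code&gt;running_totals&lt;/code&gt;) are hypothetical and only there for illustration; the numbers are copied straight from the &quot;vs 2021&quot; column.&lt;/p&gt;

```python
# Per-row "vs 2021" deltas copied from the table, in row order.
# Negative means faster than 2021. Sub-minute deltas are written as 0:ss.
DELTAS = [
    ("Corral", "-2:25"), ("Aid", "+0:41"),
    ("Kanan", "-2:13"), ("Aid", "-1:20"),
    ("Zuma", "-0:58"),
    ("Bonsall", "-2:49"), ("Aid", "-1:30"),
    ("Zuma", "+4:43"), ("Aid", "-3:22"),
    ("Kanan", "-2:40"), ("Aid", "+0:23"),
    ("Corral", "+1:41"), ("Aid", "-2:13"),
    ("Bulldog", "-0:07"), ("Aid", "-0:29"),
    ("Corral", "+6:42"), ("Aid", "-2:11"),
    ("Finish", "+0:30"),
]


def to_seconds(delta):
    """Convert a signed "m:ss" string such as "-2:25" to seconds."""
    sign = -1 if delta.startswith("-") else 1
    minutes, seconds = delta.lstrip("+-").split(":")
    return sign * (int(minutes) * 60 + int(seconds))


def to_text(total):
    """Format signed seconds back into "m:ss"."""
    sign = "-" if total != abs(total) else "+"
    minutes, seconds = divmod(abs(total), 60)
    return f"{sign}{minutes}:{seconds:02d}"


def running_totals(deltas):
    """Return the cumulative delta (in seconds) after each row."""
    totals, running = [], 0
    for _, delta in deltas:
        running += to_seconds(delta)
        totals.append(running)
    return totals


totals = running_totals(DELTAS)
for (name, delta), total in zip(DELTAS, totals):
    print(f"{name:8s} {delta:6s} {to_text(total)}")
```

&lt;p&gt;Each printed running total should line up with the &quot;vs 2021 (cum)&quot; column, which makes it easy to spot a transcription slip in either column.&lt;/p&gt;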
&lt;p&gt;The graph below compares my paces on each grade from 2021 to
this year with one graph for each hour.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/speed-vs-grade-sob.png&quot; alt=&quot;Speed versus grade&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Speed versus grade, faceted by hour.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For the first 5 hours, I was just plain faster both on the climbs and
the descents. In hours 5 and 6 (the climb out of Bonsall) I started to
slow down, especially on the climbs. I recovered again on 7 and 8 when
it was just straight running, and then struggled again on the climb
out of Bulldog but was pretty solid towards the finish.&lt;/p&gt;
&lt;p&gt;It&#39;s a bit hard to know exactly what to make of this, but my
working theory is that it was hotter this year and so when I had
to exert a lot of effort on the climbs, I slowed down but when
I was able to just run comfortably, I was still faster because
heat wasn&#39;t as much of a factor. It&#39;s of course possible I have
gotten worse at climbing or I wasn&#39;t pushing as hard, but I don&#39;t
think that&#39;s true. I was definitely pushing pretty hard on the
climb out of Bonsall and I felt like I was pushing on the climb
out of Bulldog and that was Kate&#39;s impression as well. I&#39;ve generally
been hiking pretty well this season, and as noted above, I was
doing well on the climbs early in the race, so I don&#39;t think I&#39;ve
just suddenly gotten a lot worse in this area.&lt;/p&gt;
&lt;p&gt;Beyond my own performance, there are some other reasons that suggest
that this year was harder and that it was at least in part due to heat:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runalyze&#39;s estimate of the temperature is 72° this year versus 63° for 2021 (though
it was more humid in 2021) and Garmin&#39;s somewhat confusing sensor (which seems to integrate skin and
air) also shows things 5-10° hotter in 2024.&lt;/li&gt;
&lt;li&gt;The drop rate in 2021 was 3/33 (9%), whereas this year it was 23/96 (24%).&lt;/li&gt;
&lt;li&gt;In 2021 there were 5 people under 12:00 and this year there were 4 even
with a much larger field.&lt;/li&gt;
&lt;li&gt;While Kate was waiting at the aid station, she kept hearing how people were
underperforming because it was hot.&lt;/li&gt;
&lt;li&gt;While the winner&#39;s time was the same, the median times were a lot worse (~28 minutes overall,
~47 minutes excluding DNFs), as shown below:&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Year&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Notes&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Mean Time&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Median&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Median excl DNFs&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;DNF rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2024&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Same as 2021, 2020 course&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:45:48&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15:29:40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15:08:40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;23/96 (24%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2023&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Short course (~2-3 miles)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:44:00&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:04:51&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:42:59&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4/94 (4%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2022&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Short course (~3-4 miles): reroute due to rockslide&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:37:04&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:01:46&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:45:56&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7/69 (10%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2021&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;In October instead of January&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:04:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15:01:14&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:21:13&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3/33 (9%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2020&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:41:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:11:57&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:43:33&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;24/154 (16%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2018&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:43:56&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:09:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:09:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;131 finishers, no DNFs listed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2017&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Short course due to weather (~2 miles)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:44:49&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:46:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:46:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;137 finishers, no DNFs listed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption&gt;
Figure thanks to Kate Hudson
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It may also be the case that I and others aren&#39;t as heat adapted because
the race was in the winter rather than the fall.&lt;/p&gt;
&lt;p&gt;I do think I faded a bit in the last 13 miles or so. I don&#39;t have splits, but
I estimated that the female winner was maybe 1-1.5 miles ahead of me at
Bulldog and she finished 45 minutes ahead, so she must have put at least
20 minutes on me from there. That&#39;s consistent with how fresh she looked
when I saw her earlier: I definitely think those miles would have been
a lot faster if I had been fresh and running more than hiking (they would
also have been faster in the light!).&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Times aside, I felt like I followed the game plan pretty well. I ran
when I could and hiked when I felt like I had to. I think there were
maybe a few places towards the end that I could have run if I had to,
specifically the up part of the rollers at the beginning of Bulldog
and after Corral Canyon, but I felt like I was hiking pretty fast,
so I&#39;m not sure I would have run it much faster; I think I was in part
just limited by what I had left in the tank.
I&#39;m quite pleased that I was legit faster on the downhills most of the
race. This is something I was working on and so it&#39;s nice to see that
pay off. I&#39;m not sure why I kept tripping, but I guess I still have more
agility work to do.&lt;/p&gt;
&lt;p&gt;Missing the start really sucked because of having to work my
way through everyone. I think this was the right decision,
as I definitely had to go and made it through the race without
issue but I wish I&#39;d made it to the toilets earlier, so I could
have started with everyone else. I might have pushed a bit too hard
at the start, but I think I did a reasonable job of holding back.&lt;/p&gt;
&lt;p&gt;My nutrition strategy worked well. It was pretty easy to stick to
an every 30 minutes schedule and I didn&#39;t have any major GI issues:
I felt fine until after Corral Canyon and then just a little nauseated
afterwards, and even then I was still able to eat, just not as many
calories per hour as I wanted (mostly I ditched the extra Maurten
100 in the hour when I had caffeine). Having a caffeinated gel on
the half hour was easy to manage. The two things I might change here are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I want to try to just do 360 cal/hr, so I could do a Maurten every 30 minutes&lt;/li&gt;
&lt;li&gt;I should have brought an extra bottle for the Bonsall climb and just
had electrolyte or swapped out another Maurten 160 bottle for a gel,
because I think I did get dehydrated there.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As noted above, I wish I&#39;d had a better light for the finish. I think
I got optimistic because I finished in the light in 2021 and didn&#39;t
properly account for the later start and earlier sunset.&lt;/p&gt;
&lt;h2 id=&quot;overall&quot;&gt;Overall &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k-2024/#overall&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;12:46:25 (gun time), 12:45:37 (hand time). 8th/73 overall, 7th/59 (male), 1st 50-59&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>A hard look at Certificate Transparency: CT in Reality</title>
		<link href="https://educatedguesswork.org/posts/transparency-part-2/"/>
		<updated>2023-12-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/transparency-part-2/</id>
		<content type="html">&lt;p&gt;This is part II in my series about Certificate Transparency (CT) and
transparency systems. In &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1&quot;&gt;part I&lt;/a&gt;,
we looked at how to build a simple transparency system
that guaranteed that each certificate was published and
that each participant in the system has the same view of the
list of certificates. This prevents covert misissuance of
certificates and makes it possible—at least in principle—to detect
when misissuance has occurred. In this post, I want
to look at CT as it is actually deployed on the Internet.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/moonlaser-phones.jpg&quot; alt=&quot;A laser writing on the face of the moon&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Writing on the face of the moon, but nobody&#39;s looking. Image by Kate Hudson with components from Midjourney and Adobe AI.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;[Update: 2023-12-25.  After I posted this, I had a long
&lt;a href=&quot;https://twitter.com/estark37/status/1739395235837510035&quot;&gt;discussion&lt;/a&gt;
with Chrome&#39;s Emily Stark and Ryan Hurst (formerly Google Core
Security and Google Cloud) on X/Twitter.  I&#39;ve made some revisions below in light of that
discussion. Big thanks to Emily and Ryan for the critique
and detailed discussion.]&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;deployment-compromises&quot;&gt;Deployment Compromises &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#deployment-compromises&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the previous post, we designed a greenfield system without
worrying too much about deployment. Unfortunately for CT,
the WebPKI was already well established—with all
its faults—by the time CT was developed.
You run into a number of challenges
when you go to retrofit it to the existing WebPKI, starting with
the fact that it was a lot of work for CAs and didn&#39;t bring them
any value. Importantly, deploying CT doesn&#39;t make a CA&#39;s customers
any more secure because the attacker can just try to get a certificate
for those customers from another CA. What it mostly does is make
it harder for your CA to misbehave, but that&#39;s not really a
selling point, and after all, mistakes are something that happen
to other people!&lt;/p&gt;
&lt;p&gt;Google&#39;s plan for overcoming these deployment hurdles came in two parts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;(Eventually) Require CAs to use CT in order to be trusted by
Chrome, thus forcing universal deployment of CT.&lt;/li&gt;
&lt;li&gt;Make a bunch of technical compromises designed to make CT easier
for CAs to deploy.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Obviously, part (1) of this plan kind of involved playing &lt;a href=&quot;https://en.wikipedia.org/wiki/Chicken_(game)&quot;&gt;chicken&lt;/a&gt;
with the CAs. Chrome is by far the most popular browser, but it
wouldn&#39;t be for long if it didn&#39;t work with a lot of Web sites. In order
to make requiring CT a credible threat, Google
needed to get enough CAs onboard that the number of sites with
certificates not published in CT was very small, thus making
it possible to break them without making Chrome useless,
hence the need for the technical compromises
to make it more palatable. The remainder of this section talks
about some of those compromises.&lt;/p&gt;
&lt;h3 id=&quot;transparency-logs&quot;&gt;Transparency Logs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#transparency-logs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Previously I talked about the CA publishing the Merkle tree of
certificates, but there&#39;s no technical reason the CAs have to do it
themselves; the
certificates just have to be published &lt;em&gt;somewhere&lt;/em&gt;. CT separates the job of running
the CA from the job of publishing the certificates by creating the
role of a transparency &lt;em&gt;log&lt;/em&gt;, which is responsible for building the
tree. The CAs don&#39;t have to operate a log (though some do); they just
register their certificates with the log.&lt;/p&gt;
&lt;p&gt;This design has several advantages. First, it makes life easier
for the CAs, who don&#39;t have to run logs. This may not seem like
a big deal, but it turns out that running a log is a lot of work
for reasons we&#39;ll get into below, and indeed very few CAs actually
run their own logs today. Instead, some entity with
a lot of operational resources and experience (i.e., Google) could
run a log that supports multiple CAs, hopefully making it easier
for the CAs to deploy.&lt;/p&gt;
&lt;p&gt;Second, having a relatively small number of logs improves the
scaling properties of the system somewhat: much of the overhead
for the clients comes in the form of getting an authentic copy
of the signed root (what CT calls a &lt;em&gt;signed tree head (STH)&lt;/em&gt;),
and if each CA has its own tree, that means
one root for each CA. If there&#39;s just a small number of logs
then you need a correspondingly smaller number of roots. Similarly,
in order to ensure that no certificates have been misissued,
sites need to have a copy of the database for every CA; it&#39;s
easier if those databases are all aggregated into a small number
of logs than to have to retrieve them independently.&lt;/p&gt;
&lt;p&gt;Finally, the log design makes it possible to publish certificates
even for CAs which don&#39;t participate because the log can just
unilaterally ingest those certificates. Consider what happens if
most CAs publish their certificates in CT but some don&#39;t, but Chrome
wants to require CT. They could use the Google crawler to collect
certificates for non-cooperating CAs and put them in the log,
thus potentially making it easier to require CT. This doesn&#39;t help
as much as you&#39;d think because you still have the problem of how
the client gets the inclusion proof for the certificate, but
there are some (not great) options here.&lt;/p&gt;
&lt;h3 id=&quot;signed-certificate-timestamps&quot;&gt;Signed Certificate Timestamps &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#signed-certificate-timestamps&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The big problem with the design as I described it in part I is that it
inserts a delay in the certificate issuance process:
if you are going to provide the inclusion proof at the time
of certificate issuance, then you need to collect all the
certificates that go into the Merkle tree &lt;em&gt;before&lt;/em&gt; you can
issue the certificates to the site. If you publish one
signed tree a day, this means that on average it will take
12 hrs between the certificate request and issuance, which
also means that it takes on average 12 hours and up to a day
in the worst case to bring a site online. This might have
been acceptable if we were starting from scratch, but
certificate issuance times are measured in &lt;strike&gt;minutes&lt;/strike&gt;seconds &lt;em&gt;[Updated 2023-12-25. Per Ryan Hurst]&lt;/em&gt; and so
this would have represented an unacceptable regression,
especially for sites which didn&#39;t have a valid certificate
and so would have to wait up to 24 hours to deploy
(not such a big deal the first time, but an absolute
emergency if you had a live site and you let your certificate
expire).&lt;/p&gt;
&lt;p&gt;In order to address this issue, Google introduced a new
concept, the &lt;em&gt;signed certificate timestamp (SCT)&lt;/em&gt;. An SCT
is a signed &lt;em&gt;promise&lt;/em&gt; that the log will add the certificate to their
tree soon, even though they haven&#39;t yet.
The figure below shows the issuance flow with SCTs.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ct-issuance.png&quot; alt=&quot;Certificate transparency issuance with SCTs&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Certificate issuance with SCTs
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The way this works is that the CA produces what&#39;s called a &amp;quot;pre-certificate&amp;quot;,
which is a data structure that has all the information that would be
in a real certificate. It then sends that to the log, which returns an
SCT that covers the pre-certificate. The CA then takes the SCT
and adds it to the certificate before issuing it to the site.
This has the big advantage that the site doesn&#39;t
need to know about CT; because the SCT is part of the certificate,
it can use the certificate as before without changing anything, which
is obviously a big deal for incremental deployment. In fact, the
CA can deploy CT entirely on its own one day and sites will just
automatically have CT-enabled certificates.&lt;/p&gt;
&lt;p&gt;Because SCTs can be generated immediately by the log, CAs can deploy CT without
significantly slowing down their issuance process; they just retrieve
the SCT and it&#39;s the log&#39;s responsibility to eventually publish the
pre-certificate in its own Merkle tree (&amp;quot;eventually&amp;quot; is doing a lot
of work here, as we&#39;ll see below). The resulting certificate is immediately
usable because the client checks for the SCT rather than checking
the Merkle tree.&lt;/p&gt;
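&lt;p&gt;The flow above can be sketched as a toy model. This is only an illustration under loud assumptions: the names &lt;code&gt;ToyLog&lt;/code&gt;, &lt;code&gt;issue_certificate&lt;/code&gt;, and &lt;code&gt;client_accepts&lt;/code&gt; are hypothetical, certificates are plain byte strings, and an HMAC stands in for the log&#39;s public-key signature (so the client shares the log&#39;s key here, which a real deployment would not do).&lt;/p&gt;

```python
import hashlib
import hmac
import time


class ToyLog:
    """Toy log: hands out SCTs immediately and merges entries into its tree later."""

    def __init__(self, key):
        self.key = key        # stand-in for the log's signing key (assumption)
        self.pending = []     # promised but not yet published
        self.published = []   # entries visible in the public tree

    def submit_precert(self, precert):
        timestamp = int(time.time())
        message = timestamp.to_bytes(8, "big") + hashlib.sha256(precert).digest()
        # HMAC stands in for a real public-key signature in this sketch.
        signature = hmac.new(self.key, message, hashlib.sha256).digest()
        self.pending.append(precert)
        return {"timestamp": timestamp, "signature": signature}

    def merge(self):
        """Publish everything that was promised."""
        self.published.extend(self.pending)
        self.pending.clear()


def issue_certificate(log, precert):
    """CA side: get an SCT for the pre-certificate and embed it in the certificate."""
    sct = log.submit_precert(precert)
    return {"body": precert, "sct": sct}


def client_accepts(log_key, cert):
    """Client side: check the SCT only, never the Merkle tree."""
    sct = cert["sct"]
    message = sct["timestamp"].to_bytes(8, "big") + hashlib.sha256(cert["body"]).digest()
    expected = hmac.new(log_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, sct["signature"])
```

&lt;p&gt;Note that in this sketch &lt;code&gt;client_accepts&lt;/code&gt; returns true as soon as the SCT exists, whether or not &lt;code&gt;merge&lt;/code&gt; has ever been called: the client is checking the promise, not the publication.&lt;/p&gt;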
&lt;h2 id=&quot;trust-is-a-bad-word&quot;&gt;Trust is a bad word &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#trust-is-a-bad-word&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The good news is that CT with SCTs is minimally disruptive while
also allowing the browser to enforce the use of CT. The bad
news is that it has totally different and much weaker
security properties from
the system we started with. The problem is that the SCT is just
a promise that the log will incorporate the certificate into
their Merkle tree, rather than a proof that it actually did,
so you&#39;re reduced to trusting the log not to lie.&lt;/p&gt;
&lt;p&gt;Recall the security logic of a transparency system, as described
in &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-ideal&quot;&gt;Part I&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The CA publishes every certificate (i.e.,
identity/public key pair) that it issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The owner of a given identity—and potentially other
people—ensures that it recognizes every certificate that was
published.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Relying parties check that a certificate is in the log before
accepting it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The use of SCTs breaks part (3) of this
system, because the client is just checking that the log &lt;em&gt;promised&lt;/em&gt;
to incorporate the certificate, rather than that it actually did.
Consider what happens if you have a malicious CA that colludes
with a malicious log. The CA would misissue a certificate for
&lt;code&gt;example.com&lt;/code&gt;, along with an SCT from the malicious log,
but the log would omit the certificate
from its published tree. The client will accept the certificate because
it has the SCT, but because the log never publishes the certificate,
&lt;code&gt;example.com&lt;/code&gt; has no opportunity to detect the misissuance.&lt;/p&gt;
&lt;p&gt;What&#39;s happened here is that we&#39;ve taken a system which was publicly
verifiable and turned it into a system in which we have to trust
the logs not to cheat by issuing SCTs for certificates they don&#39;t
actually publish, &lt;em&gt;potentially with some double checking, as described
&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#chrome-ct-auditing&quot;&gt;below&lt;/a&gt; [Updated 2023-12-25]&lt;/em&gt;.
This is still better than where we started because
a successful attack requires that both the log and the CA be malicious, but it&#39;s
a much weaker set of properties than not having to trust the
log at all.&lt;/p&gt;
&lt;p&gt;This design also means that not just anyone can run a log; instead,
logs have to be vetted to be trustworthy and to conform
to browser &lt;a href=&quot;https://googlechrome.github.io/CertificateTransparency/ct_policy.html&quot;&gt;policy&lt;/a&gt;.
This trust decision has to be encoded into the browser which decides whether to
accept a given SCT. At present,
Chrome &lt;a href=&quot;https://www.gstatic.com/ct/log_list/v3/log_list.json&quot;&gt;accepts logs&lt;/a&gt;
from only six operators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Google itself&lt;/li&gt;
&lt;li&gt;Cloudflare&lt;/li&gt;
&lt;li&gt;DigiCert&lt;/li&gt;
&lt;li&gt;Sectigo&lt;/li&gt;
&lt;li&gt;Let&#39;s Encrypt&lt;/li&gt;
&lt;li&gt;TrustAsia&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When Google originally launched the CT requirement in Chrome, they actually
required that at least one of the logs be Google&#39;s log, which meant that
the policy effectively came down to &amp;quot;we (Chrome) trust Google&#39;s log not to
lie&amp;quot;, but this had some obvious problems from an openness perspective, as
it meant that realistically CAs had to use Google&#39;s log. They have since
&lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/507lPdbbwSk&quot;&gt;changed the policy&lt;/a&gt;
and now you can use any two accepted logs (for certificates
valid for 180 days or less) or three logs (for certificates valid for more
than 180 days). This means that in order to covertly misissue you need
a malicious CA and two malicious logs to collude.&lt;/p&gt;
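&lt;p&gt;As a rough illustration, the log-count rule just described can be written as a small check. This sketch covers only the two-versus-three rule stated above; the function names are hypothetical, and the real Chrome policy has further conditions that are not modeled here.&lt;/p&gt;

```python
def required_log_count(validity_days):
    """Number of accepted logs needed under the rule described above."""
    # More than 180 days of validity needs three logs; otherwise two.
    return 3 if validity_days > 180 else 2


def satisfies_log_count(validity_days, sct_log_ids, accepted_logs):
    """True if the SCTs come from enough distinct accepted logs."""
    distinct = set(sct_log_ids).intersection(accepted_logs)
    return len(distinct) >= required_log_count(validity_days)
```

&lt;p&gt;For example, a 90-day certificate with SCTs from two different accepted logs passes, while a 398-day certificate with the same two SCTs does not.&lt;/p&gt;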
&lt;p&gt;&lt;em&gt;Update: 2023-12-25:&lt;/em&gt; Ryan Hurst &lt;a href=&quot;https://twitter.com/rmhrisk/status/1739380947307651386&quot;&gt;argues&lt;/a&gt;
that the requirement for policy compliance is more about ecosystem health
than about the need to trust the logs (assuming I understand him correctly)
and that Chrome&#39;s &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#chrome-ct-auditing&quot;&gt;auditing&lt;/a&gt; allowed them to verify
inclusion, and thus to relax their log policy. As noted below, I think
this has some force for Chrome, but mainly because it&#39;s effectively
making Google the guarantor that a certificate has actually been published.&lt;/p&gt;
&lt;h3 id=&quot;closing-the-loop&quot;&gt;Closing the Loop &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#closing-the-loop&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because the source of the problem is that the client isn&#39;t verifying inclusion
of the certificate (by checking the inclusion proof)
but only that the log says it would include it (by checking the SCT),
the obvious fix is to have the client somehow verify that the certificate
actually was included. This turns out to be somewhat challenging
and there have been a number of attempts, none of which really work.&lt;/p&gt;
&lt;p&gt;The first problem is that we will not always be able to enforce inclusion
in real time for the same reason that we need SCTs in the first place:
the certificate might have just been issued very recently. For these
certificates the client has to trust the SCT to establish the
connection and at best can check that the certificate was subsequently
included by the logs. This is actually worse than it sounds because
the CA has complete freedom about what timestamp to put in its certificates,
and so—assuming it can collude with two logs—it can always
have a misissued certificate appear to be recent. The result is that the
attacker will succeed in impersonating the server and at best the
client will be able to detect the cheating at some later time when it
determines that the certificate was never logged.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;verifying-inclusion&quot;&gt;Verifying Inclusion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#verifying-inclusion&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even once you are past the time when the certificate should have been
logged, verifying that it actually was is tricky. For obvious performance
reasons we don&#39;t want to have to download the entire database.
The inclusion proof is nicely compact, but when the client contacts
the log and asks for the inclusion proof, that tells the log which
certificate the client is checking and hence which site the client
is visiting; together with the client&#39;s IP address, this allows the
log to track the client&#39;s activity. Obviously, this problem is worse
if there are only a small number of logs and was even worse when
Google had to be one of them.&lt;/p&gt;
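&lt;p&gt;To make &amp;quot;nicely compact&amp;quot; concrete, here is a minimal sketch of computing and checking an inclusion proof. The hashing rules (a 0x00 prefix for leaves, a 0x01 prefix for interior nodes, and splitting at the largest power of two smaller than the tree size) follow RFC 6962; the function names are hypothetical, and entries are raw byte strings rather than the real CT leaf structure.&lt;/p&gt;

```python
import hashlib


def leaf_hash(data):
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n):
    """Largest power of two strictly smaller than n (for n of 2 or more)."""
    k = 1
    while n > k * 2:
        k *= 2
    return k


def tree_root(leaves):
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(tree_root(leaves[:k]), tree_root(leaves[k:]))


def inclusion_path(index, leaves):
    """Sibling hashes from the leaf up to the root (what the log would send)."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index >= k:
        return inclusion_path(index - k, leaves[k:]) + [tree_root(leaves[:k])]
    return inclusion_path(index, leaves[:k]) + [tree_root(leaves[k:])]


def root_from_path(entry_hash, index, size, path):
    """Recompute the root from one leaf hash and its path (what the client does)."""
    if size == 1:
        if path:
            raise ValueError("path too long")
        return entry_hash
    if not path:
        raise ValueError("path too short")
    k = split_point(size)
    if index >= k:
        lower = root_from_path(entry_hash, index - k, size - k, path[:-1])
        return node_hash(path[-1], lower)
    lower = root_from_path(entry_hash, index, k, path[:-1])
    return node_hash(lower, path[-1])
```

&lt;p&gt;The client compares the output of &lt;code&gt;root_from_path&lt;/code&gt; with the root in an STH it already trusts. The path grows only logarithmically with the size of the tree, but requesting it still reveals the leaf index, which is the privacy problem described above.&lt;/p&gt;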
&lt;p&gt;In order to prevent this form of tracking, we need some way for the client
to retrieve the inclusion proof anonymously. There are a number of
possible options here (&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/&quot;&gt;VPNs or proxies&lt;/a&gt;
or &lt;a href=&quot;https://educatedguesswork.org/posts/pir&quot;&gt;Private Information Retrieval&lt;/a&gt;). As far as I know,
no log deploys any kind of PIR—it would probably be quite
expensive—and while proxies or VPNs are technically feasible,
they&#39;re not free to run. There are similar problems with clients
reporting certificates which are not included but should have been.
I&#39;m not aware of any major browser which verifies
certificate inclusion &lt;em&gt;proofs [Update 2023-12-25]&lt;/em&gt; by default (Chrome had some ideas about using
DNS,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
but seems to have &lt;a href=&quot;https://bugs.chromium.org/p/chromium/issues/detail?id=506227#c59&quot;&gt;abandoned them&lt;/a&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;),
though see &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#chrome-ct-auditing&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;distributing-inclusion-proofs&quot;&gt;Distributing Inclusion Proofs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#distributing-inclusion-proofs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One way to minimize the privacy risk of retrieving the inclusion proofs
is to have the server distribute them to the client. Of course, if you&#39;re not willing
to wait for the next STH, then you still have to deal with SCTs, but
at least after the STH was issued the server could somehow get a copy
of the inclusion proof and send that to the client, thus preventing
the client from having to retrieve the inclusion proof for older
certificates. This seems like a good idea in principle but ran into
several problems.&lt;/p&gt;
&lt;p&gt;First, it was never really clear how you would distribute the STH
to the server, which, after all, already has the certificate. One
possibility is to incorporate the STH into a new certificate, which
the server would then retrieve a day or two later and thereafter
serve to the client; this seemed
kind of impractical when CT was originally designed, but in the
intervening 10 years, automatic certificate issuance has become
far more common (specifically, a protocol called &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8555&quot;&gt;ACME&lt;/a&gt;,
originally developed for Let&#39;s Encrypt), and so it wouldn&#39;t
be that hard to imagine modifying ACME to send an updated
certificate. Importantly, this is something that could be
deployed incrementally, because clients have to be able to
fall back to SCTs anyway. However, it doesn&#39;t seem to be something
that&#39;s happening.&lt;/p&gt;
&lt;p&gt;There were also ideas about using what&#39;s called OCSP stapling.
Because certificates have a long lifespan, they might be revoked while
still otherwise valid. The
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6960&quot;&gt;OCSP&lt;/a&gt; protocol allows
clients to check whether a certificate is still valid, but introduces
latency and has its own privacy problems. For a while, there was
interest in having servers pre-retrieve OCSP responses (they&#39;re
signed by the CA) and give them to clients proactively, thus
letting them skip the OCSP checks, and it would be straightforward
for the CA to put the inclusion proof in the OCSP response.
This has similar deployment properties to the new certificate
idea, except that it requires servers to actually do OCSP
stapling. However, at the end of the day browsers adopted
a different set of mechanisms for handling revocation, centered
around centrally distributed revocation lists, so OCSP
stapling never really took off.&lt;/p&gt;
&lt;p&gt;All of these ideas about providing inclusion proofs to the
client were made more complicated by ambiguity about which
STH the inclusion proof was supposed to apply to. In the system
I described in part I, there was a new Merkle tree every day,
but the way CT is actually designed is that there is an ever-growing
Merkle tree and STHs are issued at whatever intervals are
convenient for the log, as long as they aren&#39;t too far
apart. This means that it&#39;s possible for the browser to have
an STH for 5 PM but the server to have an inclusion proof for 4 PM.
CT has a way of handling this with a mechanism called a &amp;quot;consistency
proof&amp;quot; that bridges between these two versions of the tree, but
retrieving the consistency proof requires contacting the log,
which creates new privacy problems.&lt;/p&gt;
&lt;p&gt;This is actually a solvable problem if the logs provide a more
predictable mapping from certificates to STHs (a technique
called &lt;em&gt;STH discipline&lt;/em&gt; which Richard Barnes and I worked on),
but by the time this was all worked out, there wasn&#39;t that much
energy for changes to CT.&lt;/p&gt;
&lt;h3 id=&quot;gossip-doesn&#39;t-work&quot;&gt;Gossip Doesn&#39;t Work &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#gossip-doesn&#39;t-work&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even if we did have some mechanism for verifying the inclusion
proof, we still have the problem of getting consensus on the STHs. The original
CT design assumed a flood fill technique (what they called
&amp;quot;gossip&amp;quot;) like I described in part I,
but was frustratingly short on specifics:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All clients should gossip with each other, exchanging STHs at least;
this is all that is required to ensure that they all have a
consistent view.  The exact mechanism for gossip will be described in
a separate document, but it is expected there will be a variety.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Needless to say, this is some vigorous handwaving, and actually
building a system like this is fairly hard. In particular, there&#39;s
no obvious way for browser clients to discover and communicate with each
other (see my post on &lt;a href=&quot;https://educatedguesswork.org/post/nat-part-3.md&quot;&gt;ICE&lt;/a&gt; to see some of
the challenges here), as this isn&#39;t something they otherwise normally do.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Eventually the IETF did try to produce a
&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-trans-gossip/&quot;&gt;document&lt;/a&gt; with some ideas, but it was quite complicated and the IETF abandoned it;
as far as I know, no browser ever implemented gossip.&lt;/p&gt;
&lt;p&gt;An alternative to gossip is to have the software vendor just
provide the STHs. This arguably is less secure than gossip
because the vendor can lie, but as I noted previously, the vendor
also controls software updates and the trust anchor list,
so browser vendors are reasonably comfortable with designs that
require trusting them, at least for now. This is something Richard
Barnes and I looked at in concert with STH discipline, but ultimately
it wasn&#39;t worth it without some way to actually get the inclusion
proofs on the servers, which remained largely an unsolved problem.
As things stand today, clients don&#39;t really do anything to retrieve
or double-check STHs.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update: 2023-12-25&lt;/em&gt; Note that what I&#39;m referring to here is
that it&#39;s hard for clients to gossip. It&#39;s obviously not a problem
for services which are verifying each certificate that was issued
(monitors) to gossip, as discussed below.&lt;/p&gt;
&lt;h3 id=&quot;chrome-ct-auditing&quot;&gt;Chrome CT Auditing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#chrome-ct-auditing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Added 2023-12-25&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As Emily Stark pointed out to me on X/Twitter, Chrome actually
does some auditing, which I had somehow managed to miss. Specifically,
it checks to see if Google is aware of a given SCT. Joe
DeBlasio has a summary &lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/FddjjCNIrLo&quot;&gt;here&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;No Safe Browsing protections -&amp;gt; no SCT auditing&lt;/li&gt;
&lt;li&gt;Default Safe Browsing protections -&amp;gt; SCT auditing logic selects a
small proportion of TLS connections and performs a k-anonymous
lookup on an SCT. If that privacy-preserving SCT lookup reveals
that the SCT is not known to Google but should be, the client
uploads the certificate, SCTs, and hostname to Google (but no
other information).&lt;/li&gt;
&lt;li&gt;Enhanced Safe Browsing protections -&amp;gt; SCT auditing logic selects a
small proportion of TLS connections and uploads the certificate,
SCTs, and hostname to Google (but no other information).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is an interesting design and gets around some of the problems
that I&#39;ve discussed above.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
The security properties it provides are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Google can learn which certificates have been issued by other
logs and do whatever checks it wants on whether they should
have been issued.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Google can check that other monitors are seeing the same thing
as it does (by gossiping between monitors, as in the previous
section), thus allowing them to independently check for
misissuance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Under certain assumptions about the attacker&#39;s capabilities, Google
will eventually learn about any certificate which wasn&#39;t
logged. What I mean by &amp;quot;certain assumptions&amp;quot; is that (1) the
attacker has to use the certificate reasonably often to have a high
probability of report and (2) a powerful attacker might be able to
impersonate the server to a client and then block the client&#39;s
subsequent network access to Google so that it can&#39;t make the
report.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This isn&#39;t nothing, but I think it also falls short of public
verifiability in several respects. First, it still leaves clients
vulnerable to accepting certificates which were never published;
it just makes it possible—modulo the caveats in point (3) above—to
detect the compromise after the fact. Second, it fundamentally
depends on Google acting as the guarantor that certificates
were published because they&#39;re the ones who run the auditing
service.&lt;/p&gt;
&lt;h2 id=&quot;overengineering&quot;&gt;Overengineering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#overengineering&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strike&gt;As a result of all this, CT has more or less given up on
public verifiability. As soon as you allow for SCTs, clients have no way of ensuring that
certificates have been logged before accepting them, and without
some mechanism for verifying retrospectively that certificates were
logged, there&#39;s not even any way for clients to detect that they
accepted an unlogged certificate, and CT just reduces to a system
where the clients trust the logs not to lie about whether they
are going to publish a given certificate.&lt;/strike&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Updated 2023-12-25, in light of conversation with Emily and Ryan&lt;/em&gt;
As a result of all this, CT provides fairly limited public
verifiability. At the time of acceptance, clients have no way
of ensuring that certificates have been logged before accepting them,
because the certificate might have just been issued and not yet
incorporated into a log. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#chrome-ct-auditing&quot;&gt;Chrome&#39;s CT auditing&lt;/a&gt;
provides a partial mechanism for retrospectively detecting that
an unlogged certificate was accepted, but this really depends on trusting
Google, because Google has to see a copy of every certificate to
make this work.&lt;/p&gt;
&lt;p&gt;&lt;strike&gt;If we&#39;re just trusting the logs, though&lt;/strike&gt; Why then do we need all the machinery
of Merkle trees? The logs could just take in pre-certificates, issue SCTs,
and publish the certificates on their sites as soon as possible
(effectively immediately). This doesn&#39;t provide public verifiability,
of course; instead the logs act as what&#39;s called a &amp;quot;countersignature&amp;quot;,
in which the signature from the logs isn&#39;t attesting that they verified the certificate&#39;s
trustworthiness themselves, just that they&#39;ve seen it.
To a first order, the answer is that what we actually
have is a countersignature scheme and that the Merkle tree machinery
is unnecessary overhead, or, perhaps,
more charitably, futureproofing against some future world where we
solve the engineering problems described above.&lt;/p&gt;
&lt;p&gt;The problem is that it&#39;s expensive futureproofing, both in
terms of protocol complexity and in terms of operational brittleness.
A fairly large fraction of the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6962&quot;&gt;CT RFC&lt;/a&gt;
is concerned with specifying the Merkle trees, the machinery of
Merkle tree proofs, and the like. All of this could just go away
if we were to just treat CT as a &amp;quot;countersign + publish&amp;quot; protocol,
leaving a dramatically simpler protocol that would be a thin
layer on top of HTTP.&lt;/p&gt;
&lt;p&gt;Worse yet, CT logs turn out to be hugely operationally complex to run
correctly. I haven&#39;t personally operated one, but the basic problem
seems to be tight timing requirements combined with the immutability
of the Merkle tree structure. Recall that an SCT is a promise to
include the certificate into the Merkle tree, which has to happen
within a finite period of time called the &lt;em&gt;maximum merge delay (MMD)&lt;/em&gt;
(which Chrome requires to be no more than 24 hours). The reason for
this is so that the clients can check that the log fulfilled its
promise in the SCT to actually put the certificate in the log. If the
log just had to eventually put it in, then whenever the client checked
it could just say &amp;quot;not right now&amp;quot;, hence the MMD.  But this means that
if you have any kind of glitch (say a precertificate gets lost in some
queue or you have an outage of more than 24 hours), you&#39;re
suddenly out of compliance. Running a big production service with no
glitches is no easy task and it shouldn&#39;t be surprising that we&#39;ve
seen issues.&lt;/p&gt;
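&lt;p&gt;As a rough illustration of why the MMD is so unforgiving, here is a Python sketch of the compliance check. The 24-hour value is the Chrome limit mentioned above; the function name and the timestamps are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

MMD = timedelta(hours=24)  # Chrome's maximum, per the text above


def violates_mmd(sct_time: datetime, merged_time: datetime) -> bool:
    """True if the entry was merged into the tree later than the SCT promised."""
    return merged_time - sct_time > MMD


# Hypothetical SCT timestamp, used only for this example.
sct_issued = datetime(2023, 11, 1, 12, 0, tzinfo=timezone.utc)

# Merged three hours after the SCT was issued: within the MMD.
on_time = violates_mmd(sct_issued, sct_issued + timedelta(hours=3))

# Merged 25 hours later, e.g. after an outage: out of compliance.
after_outage = violates_mmd(sct_issued, sct_issued + timedelta(hours=25))
```

&lt;p&gt;A single entry merged an hour late is enough to put the log out of compliance.&lt;/p&gt;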
&lt;p&gt;Some examples:
In August, DigiCert&#39;s log was
&lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/R27Zy9U5NjM&quot;&gt;retired&lt;/a&gt;
because they had a bit flip in one of the entries in the tree
and just in November, Cloudflare&#39;s log had an &lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/eUGfneBSwls/m/IIu6xtMmBQAJ&quot;&gt;outage&lt;/a&gt;
in which they failed to include thousands of certificates within the
MMD. Even Google has had &lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/S-8lbl2nZeA&quot;&gt;outages&lt;/a&gt; and at least &lt;a href=&quot;https://groups.google.com/a/chromium.org/g/ct-policy/c/ZZf3iryLgCo/m/mi-4ViMiCAAJ&quot;&gt;one&lt;/a&gt; resulted in an MMD violation.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The difficulty of running a log is a direct result of the
requirements introduced by the combination of SCTs and trying
to maintain the infrastructure that would support public verifiability,
even though public verifiability doesn&#39;t exist in practice. Running
them would be far simpler if those requirements were relaxed, and,
as far as I can tell, it would have no material impact on user
security.&lt;/p&gt;
&lt;p&gt;Why then, do we have this overengineered design? The history is a
little fuzzy, and I wasn&#39;t there at the beginning, but my sense is
that when CT was originally designed the intention was &lt;em&gt;not&lt;/em&gt; to
have SCTs and instead to have just Merkle trees and inclusion proofs
delivered with certificates (more or less the design I described in
&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-ideal&quot;&gt;Part I&lt;/a&gt;). Despite some challenges, this
design probably could have been made to work in a greenfield setting,
albeit at the cost of
high issuance latency, but eventually the designers
were forced to add SCTs for deployability reasons. By the time
it was clear we would be stuck with SCTs indefinitely,
there was a huge amount of inertia behind the Merkle tree
design, which was widely deployed, and people were reluctant to climb down from it and from
the hope of future public verifiability. So, instead we have a
system with the complexity of public verifiability but only the security
of countersignatures.&lt;/p&gt;
&lt;p&gt;Despite all this, the CT RFC (both the original 2013 version and the
2021 update) still claims that logs don&#39;t need to be trusted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Certificate transparency aims to mitigate the problem of misissued
certificates by providing publicly auditable, append-only, untrusted
logs of all issued certificates.  The logs are publicly auditable so
that it is possible for anyone to verify the correctness of each log
and to monitor when new certificates are added to it.  The logs do
not themselves prevent misissue, but they ensure that interested
parties (particularly those named in certificates) can detect such
misissuance.  Note that this is a general mechanism, but in this
document, we only describe its use for public TLS server certificates
issued by public certificate authorities (CAs).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I suppose at the time it
was written (2013) this could be read as aspirational language in the hope
that some way could be found to deal with the issues described above.
From the perspective of 2023, however, it looks more like wishful
thinking.&lt;/p&gt;
&lt;h2 id=&quot;ct%3A-still-useful&quot;&gt;CT: Still Useful &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#ct%3A-still-useful&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Despite everything I&#39;ve said above about the limitations of CT verifiability,
it has still proven to be exceedingly useful. There is a robust set of
logs and quite a few &lt;a href=&quot;https://certificate.transparency.dev/monitors/&quot;&gt;services&lt;/a&gt;,
&lt;em&gt;and CT has &lt;a href=&quot;https://certificate.transparency.dev/community/#successes-grid&quot;&gt;helped detect&lt;/a&gt;
a number of serious incidents, in several cases leading to CAs being
distrusted. [Updated: 2023-12-25]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;First, a lot of CA issues are simple mistakes rather than intentional
misbehavior that the CA is trying to conceal. Forcing CAs to publish
all of their certificates makes this kind of error easier for
third parties to detect, which happens with some frequency. This
benefit doesn&#39;t require browsers to check SCTs at all, just that
CAs be required to log certificates.
In addition, the requirement to log certificates means that it&#39;s possible
to construct a database of all the valid certificates, which is a very
useful research tool.&lt;/p&gt;
&lt;p&gt;Second, CT requirements make it harder to cheat because not only does
the CA have to intentionally misbehave, it has to collude with logs
to do so. Obviously, finding one or more malicious logs is harder than
just having the CA be malicious, especially given the relatively small
number of logs, so CT provides a real security benefit
even with no public verifiability.&lt;/p&gt;
&lt;p&gt;Finally, CT is a really useful tool for gaining visibility into the
overall state of the WebPKI ecosystem; because every certificate
has to be published, CT makes it much easier to understand the
system as a whole.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What we have here is yet another case of how the Internet is built on
&amp;quot;good enough&amp;quot;.&lt;/p&gt;
&lt;p&gt;It&#39;s a commonplace that the WebPKI is a cobbled together mess and at
the time that CT was designed, it was even more so. At roughly the same
time CT was published there was a fair amount of interest in replacing
the WebPKI with something based on
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/&quot;&gt;DNSSEC/DANE&lt;/a&gt; which looked like it might
have a better attack profile, in particular because there weren&#39;t
a large number of actors able to attest to a given name.
In practice, though, DANE deployment for the Web totally stalled,
largely because it was basically a forklift upgrade.&lt;/p&gt;
&lt;p&gt;By contrast, CT is yet another patch on top of
the WebPKI, but was incrementally deployable.
Imperfect though it is, it has gone a long way towards
improving the system, both by making undetected misissuance harder and
by making simple misbehavior easier to spot and address.
I know there are still people who want to replace the WebPKI with
something based on totally different principles, but in 2023, that looks
fairly implausible.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Similarly, while CT is overcomplicated, hard to operate,
and a lot more than we really needed, it&#39;s also what&#39;s deployed
and people aren&#39;t really excited about changing it. In fact, while
there was an extensive effort to produce a revision of CT
(&amp;quot;Certificate Transparency v2&amp;quot;), eventually everyone just kind
of ran out of energy and while it did get published as an
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9162&quot;&gt;RFC&lt;/a&gt;, as far as
I know nobody implements it.
If we were starting
from scratch, we&#39;d probably do it differently (see &amp;quot;good enough&amp;quot;, supra),
but that&#39;s not where we are, and it&#39;s easier to just stick
with what we have.&lt;/p&gt;
&lt;p&gt;None of this is to say that transparency and public verifiability aren&#39;t
good ideas, and now that end-to-end encrypted messaging has become
so popular there is increased interest in transparency for those
systems. The requirements here are somewhat different and the result
is a rather fancier system called &amp;quot;key transparency&amp;quot;, which
will be the subject of the next post in this series.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is also the reason why clients requiring that servers
provide inclusion proofs for sufficiently old certificates doesn&#39;t
help. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The reasoning here is that your DNS server already knows what sites
you are visiting and so if you could also retrieve the STH over
DNS, this would provide privacy. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This state of knowledge &lt;a href=&quot;https://petsymposium.org/popets/2022/popets-2022-0075.pdf&quot;&gt;paper&lt;/a&gt;
by Meiklejohn, DeBlasio, O&#39;Brien, Thompson, Yeo, and Stark provides
a good survey of the alternatives and the present situation. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Apple&#39;s recent deployment of Key Transparency for iMessage does
&lt;a href=&quot;https://security.apple.com/blog/imessage-contact-key-verification/&quot;&gt;gossip&lt;/a&gt;
but this is much more natural because iMessage clients already talk to each other. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, this has some undesirable privacy properties,
similar to those of &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/&quot;&gt;Safe Browsing&lt;/a&gt;,
and worse if the client actually reports a suspicious certificate. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See Andrew Ayer&#39;s excellent &lt;a href=&quot;https://www.agwa.name/blog/post/how_ct_logs_fail&quot;&gt;writeup&lt;/a&gt;
of CT log failures, though Ayer is a bit more sanguine about failures
than I am. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Benjamin, O&#39;Brien, and Westerbaan have a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-davidben-tls-merkle-tree-certs-01&quot;&gt;proposal&lt;/a&gt;
to replace the combination of X.509 and CT with something called &amp;quot;Merkle Tree Certificates&amp;quot;,
but conceptually this is the same trust architecture as the WebPKI. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-2/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A hard look at Certificate Transparency, Part I: Transparency Systems</title>
		<link href="https://educatedguesswork.org/posts/transparency-part-1/"/>
		<updated>2023-12-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/transparency-part-1/</id>
		<content type="html">&lt;p&gt;Identifying the communicating endpoints is a key requirement for
nearly every security protocol. You can have the best crypto in the
world, but if you aren&#39;t able to authenticate your peer, then you are
vulnerable to impersonation attacks.  If the peers have communicated
before, it is sometimes possible to authenticate directly, but this
doesn&#39;t work in many common situations, such as when you are given the
address of a Web site and need to connect to it securely.&lt;/p&gt;
&lt;p&gt;Nearly every major communications security protocol has the
same basic authentication design:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Endpoints have human-readable identities (e.g., domain names,
e-mail addresses, phone numbers, etc.)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;trusted&lt;/strong&gt; authentication service attests to the &lt;em&gt;binding&lt;/em&gt; between an
identity and the endpoint&#39;s public key.&lt;/li&gt;
&lt;li&gt;The endpoint uses its private key to prove its identity.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example, in the HTTPS/Web context, sites are authenticated by having
&lt;em&gt;certificates&lt;/em&gt; which are issued by a &lt;em&gt;certificate authority (CA).&lt;/em&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
These CAs are in turn vetted by browser vendors, who decide which
CAs their browsers will trust. This entire system is called
the &amp;quot;WebPKI&amp;quot; (see &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#background%3A-https-and-the-webpki&quot;&gt;here&lt;/a&gt;
for more background on this.)&lt;/p&gt;
&lt;p&gt;The key word in this system is &lt;strong&gt;trust&lt;/strong&gt;: the endpoints need to
trust that the authentication service doesn&#39;t falsely attest to a
binding for the wrong person (technical term: &amp;quot;misissuing&amp;quot;). If an
authentication service makes a
mistake or deliberately cheats, then this could allow the attacker to
impersonate a valid user of the system, which is obviously bad.
This is not merely a hypothetical issue. In the WebPKI alone, there
have been a
&lt;a href=&quot;https://sslmate.com/resources/certificate_authority_failures&quot;&gt;series&lt;/a&gt;
of high profile certificate authority failures, perhaps most famously
in 2011 when the Dutch CA
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DigiNotar&amp;amp;oldid=1162693847&quot;&gt;DigiNotar&lt;/a&gt;
was subverted and issued a series of bogus certificates, including one
for Google. The bottom line is that an authentication service of this
type represents a single point of failure for the system as a whole.
The WebPKI is especially bad here because there are a large number
of CAs, nearly all of which can attest to any domain name, so there
are multiple entities, each of which is a single point of failure.&lt;/p&gt;
&lt;p&gt;There are a number of potential approaches for defending against
this problem but the one that the community seems to
have settled on is what&#39;s called a &lt;em&gt;transparency&lt;/em&gt; system.
The basic concept of such a system is that you retain the
idea of a trusted authentication service but add on a layer
in which it publishes the bindings it is attesting to so
that anyone can check that it&#39;s not misissuing.
The first transparency system, and still the most widely deployed, is
&lt;a href=&quot;https://certificate.transparency.dev/&quot;&gt;Certificate Transparency (CT)&lt;/a&gt;,
designed by Ben Laurie, Adam Langley, and Emilia Kasper (all at Google
at the time) in the wake of the DigiNotar incident. CT was designed
to bring transparency to the famously mismanaged WebPKI.
More recently, there has also been a lot of interest in CT-like
(but fancier) systems for non-WebPKI applications, such
as &amp;quot;key transparency&amp;quot; for messaging systems, but in this post
I want to focus on CT.&lt;/p&gt;
&lt;p&gt;As you can see from the diagram below, CT is a very complicated system,
in part because it had to be
retrofitted onto the existing WebPKI design and in part due to some
technical decisions which in retrospect look like they were
mistakes (I&#39;ll get into those in the next post in the series).&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/with-ct-mix.png&quot; /&gt;
&lt;p&gt;[Overview of Certificate Transparency from &lt;a href=&quot;https://certificate.transparency.dev/howctworks/&quot;&gt;transparency.dev&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;What I want to do in the rest of this post is to try
to gradually build up to a sort of idealized version of CT from
first principles. In a future post, I&#39;ll look at actually
existing CT, some of the compromises that it made in the
name of deployment, and the implications of those compromises.&lt;/p&gt;
&lt;h2 id=&quot;transparency-systems&quot;&gt;Transparency Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#transparency-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind a transparency system is not to &lt;em&gt;prevent&lt;/em&gt;
misissuance but to detect it. At a high level, this works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The CA publishes every certificate that it issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The owner of a given identity—and potentially other
people—ensures that it recognizes every certificate that was
published.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Relying parties check that a certificate is in the log before
accepting it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The figure below provides an overview of the verification pieces
of this process in the Web context:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TransparencyOverview.png&quot; alt=&quot;Transparency Overview&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Conceptual overview of a transparency system
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At some point, &lt;code&gt;example.com&lt;/code&gt; gets a certificate (&lt;code&gt;1234&lt;/code&gt;) from the
CA, which publishes that certificate. Then, when Alice wants to connect to &lt;code&gt;example.com&lt;/code&gt;,
it presents that certificate (step 1). Alice then checks with the
published certificate list to verify that the certificate is
actually on the list (step 2).  Separately, &lt;code&gt;example.com&lt;/code&gt; periodically
checks the list to be sure that only certificates it knows
about are on the list.&lt;/p&gt;
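&lt;p&gt;To make the roles concrete, here is a minimal Python sketch of the checks just described. Everything in it (the &lt;code&gt;PublishedList&lt;/code&gt; class, the function names, and the certificate IDs &lt;code&gt;ABCD&lt;/code&gt; and &lt;code&gt;FFFF&lt;/code&gt;) is hypothetical and chosen for illustration; a real system deals in signed X.509 certificates, not short strings.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class PublishedList:
    """Toy stand-in for the CA's published certificate list."""
    entries: dict = field(default_factory=dict)  # cert_id -> identity

    def publish(self, cert_id: str, identity: str) -> None:
        self.entries[cert_id] = identity

    def certs_for(self, identity: str) -> set:
        return {c for c, name in self.entries.items() if name == identity}


def relying_party_accepts(log: PublishedList, cert_id: str, identity: str) -> bool:
    # Step 2 in the figure: Alice only accepts certificates that are on the list.
    return log.entries.get(cert_id) == identity


def owner_audit(log: PublishedList, identity: str, known: set) -> set:
    # The site owner flags any published certificate it did not request.
    return log.certs_for(identity).difference(known)


log = PublishedList()
log.publish("1234", "example.com")   # legitimate issuance
log.publish("ABCD", "example.com")   # hypothetical misissued certificate

accepted = relying_party_accepts(log, "1234", "example.com")
unpublished_ok = relying_party_accepts(log, "FFFF", "example.com")
suspicious = owner_audit(log, "example.com", known={"1234"})
```

&lt;p&gt;In this toy run the published certificate is accepted, the unpublished one is rejected, and the owner&#39;s audit surfaces the certificate it never requested.&lt;/p&gt;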
&lt;p&gt;There are a lot of moving pieces, so it&#39;s worthwhile working through
the logic here for why this works.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;is-it-possible-to-prevent-misissuance%3F&quot;&gt;Is it possible to prevent misissuance? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#is-it-possible-to-prevent-misissuance%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;While detecting misissuance is good, it would be better to prevent
it entirely. Unfortunately, this turns out to be a very challenging
problem because the authentication service has to determine who
owns a given name (e.g., &lt;code&gt;example.com&lt;/code&gt;), and that determination
isn&#39;t directly verifiable by third parties. There are designs
which bind name issuance to authentication (often using some
kind of blockchain), but the problem with these systems is
that they don&#39;t allow for any discretion on the part of the
authentication service, so, for instance, if I register
&lt;code&gt;example.com&lt;/code&gt; and then lose my keys I still want to be able
to reclaim it. This may require some kind of
manual intervention. More on this &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/&quot;&gt;here&lt;/a&gt;.
If you&#39;re going to allow for discretion to handle this kind of
case, then you need to worry about that discretion being
abused.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;misissuance-detection&quot;&gt;Misissuance Detection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#misissuance-detection&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because every issued certificate is published, if
the CA misissues a certificate, then it will
also be published and can then be detected, either by the
true owner of the identity or by a third party who notices
something fishy (why is some CA I&#39;ve never heard of issuing
a certificate for Google?).&lt;/p&gt;
&lt;p&gt;In the Web context, this is all somewhat harder than it sounds: if
you&#39;re a big and well-operated site, then you may well know every
certificate that you have requested, but that&#39;s not necessarily true
for smaller sites.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Similarly, third party verifiers won&#39;t necessarily be able
to check that the issued certificates are what is expected.
The result is that while you should expect that misissuance
of high profile sites will likely be detected, misissuance of
smaller sites could easily go unnoticed.&lt;/p&gt;
&lt;h3 id=&quot;managing-misissuance&quot;&gt;Managing Misissuance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#managing-misissuance&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;OK, so you&#39;ve detected a certificate that was misissued, now what?
The general story is that you report it. What happens then depends
on how the certificate was misissued.
In the simple case of unintentional misissuance—which
definitely happens—you would expect the CA to revoke
the certificate, investigate what happened, and if possible address whatever issue led to
the misissuance.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;However, it&#39;s also possible that the CA is not well
operated or the misissuance is more than a simple mistake.
In this case, browsers might decide to distrust the CA,
with the effect that &lt;em&gt;all&lt;/em&gt; certificates issued by the
CA stop being accepted. This is a disruptive step, but it does happen, even
to large CAs. For instance, in response to a series of
&lt;a href=&quot;https://wiki.mozilla.org/CA/Symantec_Issues&quot;&gt;operational issues&lt;/a&gt; the browsers
distrusted Symantec (very gradually) between 2016 and 2018.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Much of the value of a transparency system like this is that
it works together with the threat of distrust as an incentive to
good behavior. As noted above, it&#39;s possible for misissuance
for the names of small sites to go undetected, but once there
is some evidence of some misbehavior—perhaps of a single
site—the transparency system
allows for easier investigation of the other certificates issued
by the CA. It is also possible to use the transparency system
to detect other kinds of CA misbehavior than misissuance
which can then prompt further investigation.&lt;/p&gt;
&lt;h3 id=&quot;incompetence-versus-malice&quot;&gt;Incompetence versus Malice &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#incompetence-versus-malice&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If all we are worried about is mistakes by the authentication
service, then just publishing all the certificates is mostly enough;
even if the CA inadvertently issues a certificate to the wrong
person, it will still be published and so the mistake can potentially be
detected. But what if the CA is intentionally misissuing? In this
case, it can just provide the certificate to the attacker without
publishing it, in which case the fraud isn&#39;t readily detectable.&lt;/p&gt;
&lt;p&gt;This is the reason for requiring the relying parties (clients) to
enforce that the certificate has been published (point 3 above). This
prevents attacks where the authentication service doesn&#39;t publish the certificate because
the relying parties just won&#39;t accept it, making the attack pointless.
If relying parties don&#39;t check for the presence of the certificate
on the published list then nothing requires the CA to publish every certificate.&lt;/p&gt;
&lt;h3 id=&quot;partitioned-views&quot;&gt;Partitioned Views &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#partitioned-views&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The description above just covers the logic of a transparency
system but doesn&#39;t tell you how one actually works and in fact I&#39;ve
glossed over an important technical problem, which is how to
ensure that the published list of certificates is the same for
everyone. The obvious thing to do is for the authentication service to just
publish the list of certificates it has issued on its
Web site, but this isn&#39;t secure. Consider what happens if the authentication service gives different
answers to different people, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TransparencyOverviewPartition.png&quot; alt=&quot;Partitioning in a transparency system&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Partitioning attacks
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this scenario the attacker has obtained a misissued certificate
from the CA (not shown), which creates two lists of certificates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;List 1, which has the attacker&#39;s certificate&lt;/li&gt;
&lt;li&gt;List 2, which has the legitimate certificate&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When &lt;code&gt;example.com&lt;/code&gt; goes to check the list of certificates,
the CA provides List 2, containing the correct certificate (&lt;code&gt;1234&lt;/code&gt;) so everything looks OK.
On the other hand, when Alice connects to the attacker (impersonating &lt;code&gt;example.com&lt;/code&gt;),
it presents the fake certificate (&lt;code&gt;ABCD&lt;/code&gt;). Alice then
connects to the CA, which provides List 1, containing &lt;code&gt;ABCD&lt;/code&gt;, so
everything looks OK here too, and the attack
goes undetected.&lt;/p&gt;
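&lt;p&gt;The partitioning attack is easy to reproduce in a toy model. The Python sketch below is illustrative only: the &lt;code&gt;EquivocatingCA&lt;/code&gt; class and the requester names are made up, and the lists are plain sets of certificate IDs.&lt;/p&gt;

```python
class EquivocatingCA:
    """Toy CA that shows different lists to different requesters."""

    def __init__(self):
        self.list_1 = {"ABCD"}  # shown to relying parties; has the attacker's certificate
        self.list_2 = {"1234"}  # shown to the site owner; has the legitimate certificate

    def get_list(self, requester: str) -> set:
        return self.list_2 if requester == "example.com" else self.list_1


ca = EquivocatingCA()

# example.com audits the list and sees only the certificate it requested.
owner_view = ca.get_list("example.com")
owner_sees_problem = owner_view != {"1234"}

# Alice is handed ABCD by the attacker and checks it against her view.
alice_view = ca.get_list("alice")
alice_accepts = "ABCD" in alice_view

# Only comparing the two views exposes the inconsistency.
views_match = owner_view == alice_view
```

&lt;p&gt;Each party&#39;s own check passes; only the comparison of the two views fails.&lt;/p&gt;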
&lt;p&gt;The point here is that the authentication server needs to publish
the certificate list in some way that everyone has the same view
and &lt;em&gt;that they can verify that they have the same view&lt;/em&gt; (technical
term: &lt;em&gt;consensus&lt;/em&gt;). As long as this is true, then we know
that the owner of the identity has had a chance to check any
certificate which the relying party might treat as valid.&lt;/p&gt;
&lt;p&gt;The analogy I like to use for this kind of consensus
(I&#39;m not sure who originated it) is that
the authentication server publishes each binding by using a giant
laser to inscribe each binding onto the face of the moon. This
allows anyone with a telescope to look up—at least during the
night—and see what bindings have been created.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/moonlaser.jpg&quot; alt=&quot;A laser writing on the face of the moon&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Writing on the face of the moon. Image by Kate Hudson with components from Midjourney and Adobe AI.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is what is known as a &amp;quot;publicly verifiable&amp;quot; system
in that it doesn&#39;t require trust. Anyone can see for themselves
what is written on the face of the moon, so you aren&#39;t
depending on the CA not to cheat.&lt;/p&gt;
&lt;p&gt;Unfortunately, the giant laser is physically impractical,
and so we need some other technology for providing consensus.
Much of the complexity in transparency systems derives from
this requirement.&lt;/p&gt;
&lt;h2 id=&quot;manufacturing-consensus&quot;&gt;Manufacturing Consensus &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#manufacturing-consensus&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, the basic challenge we have here is ensuring
that every client has the same view of the certificate database.&lt;/p&gt;
&lt;p&gt;The obvious thing to do is for people—really client
software—to share copies of the database with each other so that
you effectively flood fill the database to everyone and eventually
everyone has a copy of the whole database. Alternately, if you
have a piece of software like a browser which has an update
channel, the vendor can send a copy of the database
to all its users.  Of course in this case you&#39;re trusting the browser
vendor not to send a fake database, but as a practical
matter you&#39;re also trusting them not to send you malicious
updates anyway, so it&#39;s not clear how much worse this makes
the situation. More on this in a future post.
Whichever design you are using, if the attacker has mounted
a partitioning attack as described above, then the site
will eventually get a copy of the correct database from some other
element, thus allowing for detection of misissuance when it
sees a certificate it doesn&#39;t recognize.&lt;/p&gt;
&lt;p&gt;One thing that&#39;s very important to realize is that it doesn&#39;t
matter if some—or even most—of the endpoints in the system
are malicious; if the flood fill system is working, then eventually&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
each endpoint will talk to someone who isn&#39;t malicious,
so they will eventually get a copy of every certificate. And
because certificates are publicly verifiable (you just check
the signature), it&#39;s easy to store every certificate that
is valid and discard the ones that aren&#39;t. A malicious node can
remove certificates from the database they send you, but they
can&#39;t insert certificates that don&#39;t exist or prevent other
endpoints from sending you valid certificates.&lt;/p&gt;
&lt;p&gt;Moreover, it&#39;s not really required that everyone get a full
copy of the database: consider the case where we have a fake
certificate for &lt;code&gt;example.com&lt;/code&gt;. If the operators of &lt;code&gt;example.com&lt;/code&gt;
see it, then they can publish it and report it to the browser
vendors, who can then investigate, as described &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#handling-misissuance&quot;&gt;above&lt;/a&gt;. The point here is that the system
doesn&#39;t need to work perfectly in order to detect attacks;
it just needs to work well enough that (1) any relying party
will be able to validate that a certificate has been published
in the database and (2) the attacker cannot reliably prevent parties
trying to verify database correctness from getting a copy of
misissued certificates.&lt;/p&gt;
&lt;p&gt;With the right data structure, it&#39;s also possible to make
partition attacks easier to detect. For instance, if each
CA publishes one database a day and signs the entire database,
then any element which receives two databases for a single
day can immediately detect that there has been cheating.&lt;/p&gt;
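&lt;p&gt;As a rough illustration of that check, here is a minimal sketch of an element that remembers one database digest per CA per day and flags a second, different one. The names &lt;code&gt;DailyDatabaseMonitor&lt;/code&gt; and &lt;code&gt;observe&lt;/code&gt; are made up for this example, and it assumes the CA signature on each database has already been verified elsewhere.&lt;/p&gt;

```python
import hashlib


class DailyDatabaseMonitor:
    """Remembers one database digest per (CA, day) and flags conflicts.

    Hypothetical helper for illustration; signature checking is assumed
    to have happened before observe() is called.
    """

    def __init__(self):
        self.seen = {}  # (ca_name, day) maps to the first digest seen

    def observe(self, ca_name, day, database_bytes):
        """Return True if this database conflicts with an earlier one."""
        digest = hashlib.sha256(database_bytes).hexdigest()
        # setdefault stores the digest on first sight, else returns the old one
        previous = self.seen.setdefault((ca_name, day), digest)
        return previous != digest


monitor = DailyDatabaseMonitor()
first = monitor.observe("ExampleCA", "day-1", b"certificate 1234")
repeat = monitor.observe("ExampleCA", "day-1", b"certificate 1234")
conflict = monitor.observe("ExampleCA", "day-1", b"certificate ABCD")
print(first, repeat, conflict)  # False False True
```

&lt;p&gt;Seeing the same database twice is harmless; only a different database for the same CA and day counts as evidence of cheating.&lt;/p&gt;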
&lt;p&gt;The problem, obviously, is that this kind of flood fill is incredibly
inefficient: Let&#39;s Encrypt alone has about &lt;a href=&quot;https://letsencrypt.org/stats/&quot;&gt;300 million valid
certificates&lt;/a&gt;; at 1K each, this would
be a database of 300GB, not something you want to be storing on your
phone, let alone having to send to everyone else you come into contact
with—ignoring for the moment the question of how you&#39;re going to
transmit the database around. Clearly, this simple system is not
practical.&lt;/p&gt;
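&lt;p&gt;The arithmetic behind that estimate is simple enough to write down. Both inputs are the rough figures quoted above rather than measurements, and the sketch uses decimal units (1K = 1000 bytes).&lt;/p&gt;

```python
# Back-of-the-envelope size of a full certificate database.
# Both inputs are the rough figures from the text, not measurements.
valid_certificates = 300_000_000   # about 300 million valid certificates
bytes_per_certificate = 1_000      # about 1K each, in decimal units

total_bytes = valid_certificates * bytes_per_certificate
total_gigabytes = total_bytes // 10**9
print(total_gigabytes)  # 300
```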
&lt;p&gt;Of course, you don&#39;t actually need to send a copy of the database
to everyone, you just need to verify that you have the same database
as everyone else, which you can do by exchanging hashes of the
database, but this doesn&#39;t get us very far because (1) you still need
to keep a copy of the database on your computer and (2) the database
isn&#39;t static, but instead new certificates are constantly being
issued (Let&#39;s Encrypt issues over 3 million certificates &lt;em&gt;a day&lt;/em&gt;).
Addressing this requires some new technology, specifically
something called a &amp;quot;Merkle Tree&amp;quot;.&lt;/p&gt;
&lt;h3 id=&quot;background%3A-merkle-trees&quot;&gt;Background: Merkle Trees &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#background%3A-merkle-trees&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The idea behind a Merkle Tree is to provide a way to efficiently commit
to a set of values without actually publishing any of the
values.&lt;/p&gt;
&lt;p&gt;As an intuition pump, suppose I run a streaming service which sends
movies over the Internet and I want people to be confident that they
are getting the right movie and not some content generated by an
attacker. In the real world, we just carry all the data over a TLS
connection, but let&#39;s assume I&#39;m too cheap for that. Instead, what I
could do is send the &lt;em&gt;hash&lt;/em&gt; of the content over the TLS connection and
then let the client retrieve the rest over HTTP (there used to be a time
when people really worried about the cost of encryption). The problem with
this is that the hash is computed over the entire movie, but we
obviously want people to be able to verify that there hasn&#39;t been any
tampering as they are watching it. The obvious solution here is to
break the movie up into chunks—you want to do this anyway so
that people can easily scroll forward or backward—and
then send a hash for each chunk over the TLS connection. Then, when
the client retrieves each chunk, they can verify the hash before
they play it.&lt;/p&gt;
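&lt;p&gt;A minimal sketch of this per-chunk scheme, assuming SHA-256 as the hash and using made-up helper names (&lt;code&gt;chunk_hashes&lt;/code&gt;, &lt;code&gt;verify_chunk&lt;/code&gt;). The list of hashes stands in for what would be sent over the TLS connection; the chunks themselves stand in for what arrives over plain HTTP.&lt;/p&gt;

```python
import hashlib


def chunk_hashes(chunks):
    """Hash every chunk; this list is what travels over the secure channel."""
    return [hashlib.sha256(chunk).digest() for chunk in chunks]


def verify_chunk(index, chunk, trusted_hashes):
    """Check a chunk fetched over the insecure channel before playing it."""
    return hashlib.sha256(chunk).digest() == trusted_hashes[index]


movie = [b"chunk-1", b"chunk-2", b"chunk-3", b"chunk-4"]
trusted = chunk_hashes(movie)

print(verify_chunk(1, b"chunk-2", trusted))        # True
print(verify_chunk(1, b"attacker data", trusted))  # False
```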
&lt;p&gt;This still involves sending a fair amount of data over the TLS
connection, though: suppose each chunk is 5s long, then a 2 hr movie
will be 1440 chunks and require sending something like 46KB over the
TLS connection. It turns out that there is a more efficient strategy,
using one of the computer scientist&#39;s favorite tools, the binary tree.
The basic idea is that we hash each chunk and then arrange the chunks
in a binary tree, like so:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/merkle-tree.png&quot; alt=&quot;Merkle Tree&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
A Merkle tree
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The leaves of the tree are the hashes of the individual chunks and
then each interior node is the hash of its two children.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
As a result,
the root of the tree includes the hashes of all of the leaves,
so if any leaf changes then it would also change the hash of the root.
This way, you can publish only the root hash over the TLS
connection and anyone can verify the leaves by just hashing them
up to the root.&lt;/p&gt;
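&lt;p&gt;Here is a small sketch of that root computation. It assumes SHA-256, a chunk count that is a power of two, and the simple node structure described above (without the fix mentioned in the footnote), so treat it as an illustration rather than a production design.&lt;/p&gt;

```python
import hashlib


def H(data):
    """The hash function; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Root of the tree; assumes len(chunks) is a power of two."""
    level = [H(chunk) for chunk in chunks]  # leaves are chunk hashes
    while len(level) != 1:
        # each interior node is the hash of its two children
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"C1", b"C2", b"C3", b"C4"])
tampered = merkle_root([b"C1", b"C2", b"C3", b"XX"])
print(root != tampered)  # True
```

&lt;p&gt;Changing any single chunk changes its leaf hash, then its parent, and so on up to the root, which is why publishing only the root is enough.&lt;/p&gt;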
&lt;p&gt;Well, sort of. What I just described requires having all the chunks,
but remember we want to be able to verify a chunk without other
chunks. Fortunately, there is an easy way to arrange this: when
you send a chunk, you also send enough nodes in the tree to let
the receiver reconstruct the tree. Specifically, you send the
nodes &lt;em&gt;next to&lt;/em&gt; the nodes on the path between your chunk and the root.
For example, suppose I just sent chunk 1. The receiver can compute
&lt;code&gt;H(C1)&lt;/code&gt; for themselves, but they can&#39;t compute the parent node without
knowing &lt;code&gt;H(C2)&lt;/code&gt;, so I have to send that. Similarly, they can&#39;t compute
the root without knowing &lt;code&gt;H ( H(C3) + H(C4) )&lt;/code&gt; so I have to send
that as well. I don&#39;t have to send &lt;code&gt;H(C3)&lt;/code&gt; or &lt;code&gt;H(C4)&lt;/code&gt; because they
don&#39;t need that to compute the root.&lt;/p&gt;
&lt;p&gt;The figure below illustrates what I&#39;m talking about:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/merkle-tree-copath.png&quot; alt=&quot;The co-path of the Merkle tree&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
The co-path of a Merkle tree
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The sender has to transmit everything in blue, specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The chunk &lt;code&gt;C1&lt;/code&gt; itself so that the receiver can compute &lt;code&gt;H(C1)&lt;/code&gt;,
though of course it was transmitting this anyway.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;H(C2)&lt;/code&gt; so that the receiver can compute the parent node
&lt;code&gt;H( H(C1) + H(C2) )&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;H( H(C3) + H(C4) )&lt;/code&gt; so that the receiver can compute the root&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The receiver computes everything in black for themselves and then
compares it with the root hash it received over the secure channel.
If everything checks out, then this proves that the tree was computed
over &lt;code&gt;C1&lt;/code&gt; (and that it was in that position in the tree) and therefore
that it&#39;s a legitimate chunk. The technical term here is
an &amp;quot;inclusion&amp;quot; proof, because it proves that the chunk was included
in the computation for the tree.&lt;/p&gt;
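&lt;p&gt;The inclusion proof for the four-chunk example can be sketched as follows. As before, SHA-256, a power-of-two chunk count, and the function names (&lt;code&gt;build_levels&lt;/code&gt;, &lt;code&gt;inclusion_proof&lt;/code&gt;, &lt;code&gt;verify_inclusion&lt;/code&gt;) are assumptions made for this illustration, and chunks are numbered from 0 in the code.&lt;/p&gt;

```python
import hashlib


def H(data):
    """The hash function; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """All levels of the tree, leaves first; assumes a power-of-two count."""
    levels = [[H(chunk) for chunk in chunks]]
    while len(levels[-1]) != 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes (the co-path) from the leaf at index up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the node next to the current one
        index //= 2                     # move up to the parent
    return proof


def verify_inclusion(chunk, index, proof, root):
    """Recompute the path to the root using only the chunk and the co-path."""
    node = H(chunk)
    for sibling in proof:
        if index % 2 == 0:
            node = H(node + sibling)   # current node is a left child
        else:
            node = H(sibling + node)   # current node is a right child
        index //= 2
    return node == root


chunks = [b"C1", b"C2", b"C3", b"C4"]
levels = build_levels(chunks)
root = levels[-1][0]
proof = inclusion_proof(levels, 0)   # proof for C1

print(len(proof))                                       # 2
print(verify_inclusion(b"C1", 0, proof, root))          # True
print(verify_inclusion(b"not C1", 0, proof, root))      # False
```

&lt;p&gt;For &lt;code&gt;C1&lt;/code&gt; the proof holds exactly the two values listed above: &lt;code&gt;H(C2)&lt;/code&gt; and &lt;code&gt;H( H(C3) + H(C4) )&lt;/code&gt;.&lt;/p&gt;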
&lt;p&gt;The key thing to realize is that the number of extra hashes that
the sender has to include in order to let the receiver verify a chunk
is less than the number of total chunks. Specifically, it&#39;s the depth
of the tree, which is to say the logarithm base 2 of the number of
chunks. In this case, that&#39;s 2 hashes, which is only half the
number of chunks, but if there were thousands of chunks then this
would be a huge difference.&lt;/p&gt;
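&lt;p&gt;To put numbers on this, the sketch below compares the two approaches for the 2 hr movie from earlier. It assumes 32-byte (SHA-256) hashes, and for a chunk count that is not a power of two it rounds the depth up, so the result is an upper bound on the co-path size.&lt;/p&gt;

```python
import math

HASH_BYTES = 32  # assumption: SHA-256 digests


def per_chunk_list_bytes(num_chunks):
    """Bytes needed to send one hash per chunk over the secure channel."""
    return num_chunks * HASH_BYTES


def copath_bytes(num_chunks):
    """Bytes of extra hashes sent alongside one chunk (depth of the tree)."""
    depth = math.ceil(math.log2(num_chunks))
    return depth * HASH_BYTES


movie_chunks = (2 * 60 * 60) // 5  # 2 hr movie cut into 5 s chunks

print(movie_chunks)                        # 1440
print(per_chunk_list_bytes(movie_chunks))  # 46080
print(copath_bytes(movie_chunks))          # 352
print(copath_bytes(4))                     # 64
```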
&lt;h3 id=&quot;a-transparency-system-with-merkle-trees&quot;&gt;A Transparency System with Merkle Trees &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#a-transparency-system-with-merkle-trees&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It should now be apparent what we are going to do next, which is to
put the certificates into a Merkle Tree. As a starting point,
let&#39;s say that each CA takes all the certificates and makes
them the leaves of the tree.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
With Let&#39;s Encrypt&#39;s 3 million
certificates a day, this tree will be of around depth 22 for
a day&#39;s certificates.
The figure below provides an overview of how this fits together:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ct-abstract.png&quot; alt=&quot;Certificate issuance with transparency&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Certificate issuance with Merkle trees
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;When &lt;code&gt;example.com&lt;/code&gt; wants to get a certificate, it contacts the CA
as usual. The CA does whatever procedure it wants to validate
the request and then waits for other certificate
requests to come in. After some period (in this case daily), the
CA generates all the certificates and then builds a Merkle tree
out of them. It publishes the whole Merkle tree on the Internet
and then sends each site its certificate, as well as the
inclusion proof that the certificate was included in today&#39;s
tree. The inclusion proof is comparatively small; using
Let&#39;s Encrypt as our reference point, it will be about 600-700
bytes.&lt;/p&gt;
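&lt;p&gt;A quick check of those figures, assuming 32-byte (SHA-256) hashes and counting only the hashes in the proof. The helper name &lt;code&gt;tree_depth&lt;/code&gt; is made up for this sketch.&lt;/p&gt;

```python
HASH_BYTES = 32  # assumption: SHA-256 digests


def tree_depth(num_leaves):
    """Smallest depth d such that 2**d is at least num_leaves."""
    return (num_leaves - 1).bit_length()


certificates_per_day = 3_000_000  # rough daily figure from the text
depth = tree_depth(certificates_per_day)
proof_bytes = depth * HASH_BYTES
print(depth, proof_bytes)  # 22 704
```

&lt;p&gt;This lands in the same ballpark as the estimate above; the exact size depends on the hash function and on how the proof is encoded.&lt;/p&gt;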
&lt;p&gt;When the client subsequently contacts the site, the site provides
both its certificate and the inclusion proof.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
The certificate can be verified in the usual fashion, but the
client &lt;em&gt;also&lt;/em&gt; needs to verify the inclusion proof in order to
ensure that the certificate was actually published. In order
to do this, it needs &lt;em&gt;both&lt;/em&gt; the inclusion proof itself and
the root of the Merkle tree that was published at the time
of certificate issuance. Instead of flood filling the tree
itself, we instead arrange to flood fill the signed root of the tree
(or, more likely for a browser, to distribute it in the update channel).
The client verifies the signature on the root to
ensure that it&#39;s valid and then checks the inclusion proof
in order to be sure that the certificate was really included
in the tree.&lt;/p&gt;
&lt;p&gt;This is a big improvement in the amount of information the client
needs to store and retrieve. The signed root itself is very small
(~100 bytes) and then on each connection it needs to retrieve
~600-700 bytes of inclusion proof for each certificate, which is
around the size of your typical certificate, so this perhaps doubles
the overhead of the TLS connection, which isn&#39;t that bad.&lt;/p&gt;
&lt;p&gt;Note that in order to verify that there are no unexpected certificates
for its domain names, the &lt;em&gt;site&lt;/em&gt; still needs to download the entire
certificate database, or more likely use some service which does it
for it. However, sites typically have significant resources, and the
database isn&#39;t &lt;em&gt;that&lt;/em&gt; big, so this is a much smaller burden than
requiring every browser to retrieve a copy. Moreover, a service which
does this kind of checking just needs to download the database once
for all of its clients, which lets it amortize the cost.&lt;/p&gt;
&lt;h2 id=&quot;security-properties&quot;&gt;Security Properties &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#security-properties&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This system does a reasonable job of providing the security guarantees
we asked for at the start.&lt;/p&gt;
&lt;p&gt;Because the client verifies the inclusion proof for a certificate,
it is able to ensure that it chains up to a signed root.
While the CA can technically make more than one tree with different contents,
that requires signing two tree roots, which then have to
be published somehow in order to be useful.
As there is supposed to be only one root per day, as soon as
any endpoint sees two different roots for the same period, it knows
that the CA is cheating and can prove it to any third party
just by publishing both signed roots.&lt;/p&gt;
&lt;p&gt;If we&#39;re doing simple peer-to-peer flood fill, not every client
will be able to see both roots, but it&#39;s likely that one will.
If clients are getting their copy of the signed root from their
vendor, then the situation is even simpler: every client from
the vendor will have the same root and as long as vendors
check that their roots match and sites/services that want
to check the database verify that their roots match the vendors&#39;
roots, there&#39;s no real way to publish two roots without being
immediately detected.&lt;/p&gt;
&lt;p&gt;The result is a system that is publicly verifiable in that everyone has
the same view of the certificates that have been published.
This isn&#39;t perfect in that you still have to actually detect misissuance,
which isn&#39;t always straightforward for the reasons I discussed
above, but at least it&#39;s not possible to have covert misissuance.
This means that misissuance for big sites will probably be detected,
and if any kind of misissuance is detected it&#39;s much easier to
investigate because you have a permanent record.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-real-world-certificate-transparency&quot;&gt;Next Up: Real World Certificate Transparency &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#next-up%3A-real-world-certificate-transparency&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the beginning, I said that I was going to try to build an idealized
version of Certificate Transparency, and that&#39;s what we have here. There are still a fair
number of moving pieces, but the result has strong and fairly straightforward
security properties. Unfortunately, CT as actually deployed involved
quite a few technical compromises and the result was something more
complicated and with quite different security properties. I&#39;ll be
talking about those compromises and their consequences in the next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, yes, I know it&#39;s technically a &amp;quot;certification authority&amp;quot;,
but at this point, can we just agree that it&#39;s &amp;quot;certificate authority&amp;quot;? &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s also not always possible to outsource this job to
a CDN or hosting provider, because you might have your
site hosted across more than one service, so no single
service can check that it recognizes every certificate.
For instance, suppose your site is both on Cloudflare and
Fastly; both services will have certificates for your
domain and if Cloudflare goes to check for certificates
that weren&#39;t issued to it, it will find the ones issued
to Fastly. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The processes used to validate domains for certificate
issuance are &lt;a href=&quot;https://www.princeton.edu/~pmittal/publications/bgp-tls-usenix18.pdf&quot;&gt;far from perfect&lt;/a&gt;,
so even a well-operated CA can still misissue. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that because certificates are signed by the CA, anyone can
verify that they really issued it, without the cooperation of the
CA after the fact. The certificate itself plus the claim by
the domain&#39;s operator is prima facie evidence that something
is wrong. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&amp;quot;Eventually&amp;quot; is doing a lot of work here, but this isn&#39;t
the system we&#39;re going to build, so I&#39;m just going to
handwave past it. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: you actually don&#39;t want to use exactly
this structure because it creates ambiguity between
an interior node with children H(A) and H(B) and a leaf
node with value H(A) + H(B), but that&#39;s easy to fix. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Actual CT uses one big tree that grows over time, but
this is conceptually easier to describe. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For deployment reasons, we&#39;d actually like the inclusion
proof to be included in the certificate, so we don&#39;t
need to modify the TLS stack. This is technically possible
but doesn&#39;t matter at the moment. &lt;a href=&quot;https://educatedguesswork.org/posts/transparency-part-1/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Adventure Run Report: Northern Yosemite 50</title>
		<link href="https://educatedguesswork.org/posts/northern-yosemite/"/>
		<updated>2023-10-29T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/northern-yosemite/</id>
		<content type="html">&lt;p&gt;After a kind of disappointing—but still the right call—decision to
DNF at Teanaway 100, I found myself with a big pile of fitness, nothing
planned for the rest of the year, but not really ready to just call it a season
and start thinking about 2024. There weren&#39;t any races left I wanted to do,
so instead I decided to try one of the adventure run loops that I had been
eyeing for the summer but had to put off because the record
snows in the 2022/2023 season kept the Sierras impassable late into the season,
specifically another  &lt;a href=&quot;https://pantilat.wordpress.com/&quot;&gt;Leor Pantilat&lt;/a&gt; route that he
called the &lt;a href=&quot;https://pantilat.wordpress.com/2011/08/22/northern-yosemite-50/&quot;&gt;Northern Yosemite 50&lt;/a&gt;.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-prep-small.jpg&quot; alt=&quot;Photo of my stuff&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
My stuff ready to go.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Fortunately, my former training partner &lt;a href=&quot;https://heapingbits.net/&quot;&gt;Chris Wood&lt;/a&gt;
(now sadly reduced to running loops around Central Park in NYC) was in town, so
we were able to do it together. Chris was actually already in Yosemite
Valley, so we drove out separately and stayed at the
&lt;a href=&quot;https://willowspringsresort.com/&quot;&gt;Willow Springs Resort&lt;/a&gt; (a bit rustic
but friendly and reasonably nice), which is about 30 minutes away from the
start of the route at &lt;a href=&quot;https://monovillage.com/&quot;&gt;Annett&#39;s Mono Village&lt;/a&gt;.
The &amp;quot;village&amp;quot; itself is basically a paid RV campground, but there is
a big parking lot on the lake which seemed to be free. There was a
&amp;quot;No overnight parking&amp;quot; sign, but we didn&#39;t expect to be there too
far into the next morning in the worst case, so just left a note saying we
were out trail running and hoped it would be OK.&lt;/p&gt;
&lt;p&gt;The route is a big lollipop of just around 50 miles and 10000 ft of climbing, starting at
Twin Lakes in the Hoover Wilderness at around 7000 ft, then
quickly climbing above 9000 and mostly staying above there until
the last 5 miles or so, with a high point of 10300 ft. Pantilat did it
in 15:28 and so I figured we were looking at at least 16 hrs and probably
more than this, which is a pretty long day this late in the season
(sunset is around 6:00), so we decided on a 5 AM start, which
meant that both the beginning and end would be on headlamp.&lt;/p&gt;
&lt;h2 id=&quot;start-to-the-loop-%5B7.15-mi%2C-%2B2246%2F-157-ft%2C-2%3A22%3A16%2C-19%3A56%2Fmi%2C-15%3A41%2Fmi-gap%5D&quot;&gt;Start to the Loop [7.15 mi, +2246/-157 ft, 2:22:16, 19:56/mi, 15:41/mi GAP] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#start-to-the-loop-%5B7.15-mi%2C-%2B2246%2F-157-ft%2C-2%3A22%3A16%2C-19%3A56%2Fmi%2C-15%3A41%2Fmi-gap%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first 7 miles or so are the &amp;quot;stem&amp;quot; of the lollipop, a steady climb
of about 2500 feet. It was really cold at the start (~45 F) so I was in
gloves and a jacket and Chris was in a warm shirt and gloves,
both of which were pretty much the uniform for the rest of the
day as it never really warmed up.
We got a little scare right as we were passing through the
campground at the start when we saw a black bear rummaging
through the garbage cans. The bear seemed more interested in trying
to find food than in bothering us but we gave it a wide berth
anyway.&lt;/p&gt;
&lt;p&gt;As usual for the start of a run, we were fresh, and the trail itself
is in pretty good shape for the Sierras and was still easy to find in
the dark, so it went pretty smoothly.  In any case, we got to the
junction quickly and were definitely thinking that this was going to
be a fast day.  As it turned out, we were shortly to be punched in the
face by reality.&lt;/p&gt;
&lt;h2 id=&quot;to-the-pct-%5B8.47-mi%2C-%2B846%2F-997-ft%2C-2%3A50%3A17%2C-20%3A06%2Fmi%2C-18%3A27-gap%5D&quot;&gt;To the PCT [8.47 mi, +846/-997 ft, 2:50:17, 20:06/mi, 18:27 GAP] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#to-the-pct-%5B8.47-mi%2C-%2B846%2F-997-ft%2C-2%3A50%3A17%2C-20%3A06%2Fmi%2C-18%3A27-gap%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next segment is comparatively flat and
pretty early on we passed Peeler Lake, which is spectacular:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/yosemite-peeler-1.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-peeler-1-small.jpg&quot; alt=&quot;The view of Peeler Lake&quot; /&gt;
&lt;/p&gt;&lt;/a&gt;&lt;p&gt;&lt;/p&gt;
&lt;figcaption&gt;
The view of Peeler Lake
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/yosemite-peeler-selfie.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-peeler-selfie-small.jpg&quot; alt=&quot;A selfie with Peeler Lake in the background&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;figcaption&gt;
40+ miles to go but at least it warmed up
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This section is a bit interstitial in that it&#39;s pretty flat but you
know you have some big climbs ahead of you. We would have expected to be
able to move pretty fast on this segment, but due to a combination of the terrain and
trying to take it conservatively we actually slowed down a fair
bit. There were two factors here. First, even though a lot of this
was smooth trail, it was also frequently really narrow single track cut
into a meadow which I found hard to run without hitting my legs
against the side. It was also somewhat gently rolling and we were
very deliberately walking anything that went uphill at all.&lt;/p&gt;
&lt;p&gt;It had finally started to warm up to mid 60s and I was starting to be
able to feel my hands again. It never got much warmer than this, and so
we managed to stay pretty well hydrated. We&#39;d started out with
plenty of fluid (2l for me and 2.5l for Chris), with a target of about
500ml/hr, which meant that we had to start filtering water in this segment.
Fortunately, there was water everywhere, whether in lakes or streams,
so from here on in we kept to about 1l each. We only had one filter
(&lt;a href=&quot;https://www.rei.com/product/219378/hydrapak-42-mm-filter-cap?CAWELAID=120217890015692981&amp;amp;cm_mmc=PLA_Google%7C21700000001700551_2193780001%7C92700075508428481%7CNB%7C71700000107444346&amp;amp;gclsrc=3p.ds&amp;amp;gclsrc=ds&amp;amp;gclsrc=ds&quot;&gt;Hydrapak 42mm&lt;/a&gt;), and were both using
sports drink—you have to filter into the bottle and then
add the powder, not the other way around—so the whole filtering/mixing
thing slowed us down.&lt;/p&gt;
&lt;h2 id=&quot;on-the-pct-%5B14.94-mi%2C-%2B3258%2F-3793-ft%2C-5%3A32%3A15%2C-22%3A14%2Fmi%2C-18%3A32%2Fmi-gap%5D&quot;&gt;On the PCT [14.94 mi, +3258/-3793 ft, 5:32:15, 22:14/mi, 18:32/mi GAP] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#on-the-pct-%5B14.94-mi%2C-%2B3258%2F-3793-ft%2C-5%3A32%3A15%2C-22%3A14%2Fmi%2C-18%3A32%2Fmi-gap%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally we hit the junction for the Pacific Crest Trail. From here
it&#39;s a sharp downhill to the low point of the loop (~7600ft) followed by
two steep climbs, first to around 9500 ft and then above 10000 ft.
The first of these is 1795 ft over 2.77 mi, for an average of 12.3%,
so we made pretty extensive use of our poles.&lt;/p&gt;
&lt;p&gt;I was really starting to feel the altitude and was definitely ready
for the half-way point, which is basically at the dip between the next
two climbs.
We hit the halfway at just under 9 hrs, so it seemed like we were
still on track for a &amp;lt;18 finish, especially as the last 7-10 miles
were downhill.
This is where I was planning to start with the
caffeine and so I sucked down a Maurten Gel CAF 100. These take about
30 min to kick in but after that I started to feel a lot better. From here on,
it was caffeine every 2 hrs to the finish.&lt;/p&gt;
&lt;p&gt;From the pass at about 26 miles, it&#39;s a long downhill to around
30, where we leave the PCT again. This was another one of those
sections where you would have hoped to be moving a lot faster,
but in practice it was all pretty rocky and/or rutted single track
so instead it was a lot of hike/jogging where you&#39;d run a bit and
then have to walk to avoid some rocks, so this turned out to
be a slog. At this point, Chris and I were both really hoping
for the next climb to start, both so we could get it over with
and because I actually find it more fun to go up in this kind
of terrain because you wouldn&#39;t be running anyway, so it&#39;s
not as frustrating that you can&#39;t.&lt;/p&gt;
&lt;p&gt;Chris was also feeling altitude and had a bad headache,
so when we stopped around here to filter water, he took some ibuprofen.
There&#39;s been a &lt;a href=&quot;https://www.irunfar.com/ibuprofen-and-its-effects-during-ultramarathons&quot;&gt;movement away from ibuprofen in ultra&lt;/a&gt;,
but the concern here is mostly about stress on the kidneys,
and with only 20 miles to go in the cold, this didn&#39;t
seem like that big a concern.&lt;/p&gt;
&lt;h2 id=&quot;last-two-climbs-%5B10.21-mi%2C-%2B2864%2F-1234-ft%2C-3%3A48%3A21%2C-22%3A22%2Fmi%2C-18%3A08%2Fmi-gap%5D.&quot;&gt;Last two Climbs [10.21 mi, +2864/-1234 ft, 3:48:21, 22:22/mi, 18:08/mi GAP]. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#last-two-climbs-%5B10.21-mi%2C-%2B2864%2F-1234-ft%2C-3%3A48%3A21%2C-22%3A22%2Fmi%2C-18%3A08%2Fmi-gap%5D.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We finally hit the bottom and then it was time for the last big push.
As you can see from the elevation profile, it&#39;s about 2100 feet over
6.4 miles, but it&#39;s really more like 850 ft over 4.4 miles (very
gentle) and 1300 ft over 2 miles (quite steep), so there was a lot of
hiking up shallow slopes and waiting for the real climbing to begin.&lt;/p&gt;
&lt;p&gt;Throughout the whole approach to the pass, we could see dark clouds
gathering ahead of us. The weather reports had been for some light
rain in the mid afternoon, but that was for Yosemite generally and
of course any weather forecast in the mountains has to be treated
with some skepticism, so we mostly just crossed our fingers and
pushed on. It never really rained on us, but by this time the
sun had started to go down again and we were starting to get cold again.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/yosemite-clouds-pass.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-clouds-pass-small.jpg&quot; alt=&quot;Dark clouds over the pass&quot; /&gt;
&lt;/p&gt;&lt;/a&gt;&lt;p&gt;&lt;/p&gt;
&lt;figcaption&gt;
Dark clouds over the pass
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/yosemite-clouds-selfie.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-clouds-selfie-small.jpg&quot; alt=&quot;Dark clouds selfie&quot; /&gt;
&lt;/p&gt;&lt;/a&gt;&lt;p&gt;&lt;/p&gt;
&lt;figcaption&gt;
These dudes do not look very happy.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;From the first pass, it&#39;s a steep descent and then
the final climb of 1000+ feet over 2 miles. We&#39;d expected to do some
of this climb on headlamp, but actually we needed light a bit earlier,
towards the end of the descent. I was carrying my ridiculously bright
&lt;a href=&quot;https://www.lupinenorthamerica.com/product-category/lampheads/&quot;&gt;Lupine Neo&lt;/a&gt; (~170 lumens
but still blinding on the second step out of 4), so for
a while it was just me leading the way with Chris not even needing
his own light.&lt;/p&gt;
&lt;p&gt;Climbing in the dark is nice and peaceful, even if you&#39;re also
getting cold and feeling a touch of altitude sickness, but we were still
happy to hit the top of the pass, telling ourselves it was just
an easy cruise in. Of course, it&#39;s really an easy cruise of around
11 miles in the dark, which isn&#39;t actually so easy.&lt;/p&gt;
&lt;h2 id=&quot;back-to-the-start-%5B10.99-mi%2C-%2B591%2F-3609-ft%2C-4%3A10%3A55%2C-22%3A50%2Fmi%2C-22%3A08%2Fmi-gap%5D&quot;&gt;Back to the Start [10.99 mi, +591/-3609 ft, 4:10:55, 22:50/mi, 22:08/mi GAP] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#back-to-the-start-%5B10.99-mi%2C-%2B591%2F-3609-ft%2C-4%3A10%3A55%2C-22%3A50%2Fmi%2C-22%3A08%2Fmi-gap%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Conceptually, this last segment comes in two pieces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Around 4 miles back to the junction of the loop&lt;/li&gt;
&lt;li&gt;The 7 or so miles to the start, which we&#39;d already been on&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With typical runner psychology, our thinking here was that we &amp;quot;just
need to get to the junction&amp;quot; because from there it&#39;s straightforward.
In practice, however, this turned out to be one of the trickiest sections
because (1) it was in the dark, (2) the trail was really rocky, (3) the trail was faint, and (4) in places
it was covered in snow. We got off-trail a number of times and had
to spend a long time trying to figure out where it was. The process
here should be familiar:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Watch shows you&#39;re off trail&lt;/li&gt;
&lt;li&gt;Go in some direction to see if you can find it.&lt;/li&gt;
&lt;li&gt;Watch shows you&#39;re going in the wrong direction&lt;/li&gt;
&lt;li&gt;Pull out the phone with the better map (optional)&lt;/li&gt;
&lt;li&gt;Finally spot a section of rock and dirt that looks more heavily used
and head towards it&lt;/li&gt;
&lt;li&gt;Obsessively look at your watch for the next two minutes to see if
you&#39;re really back on trail&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This obviously takes some time and caused us to really slow down in
this segment. This is also where I started to fall off my nutrition;
I had been pretty religiously doing 500ml of either Maurten or
Tailwind + a gel or a bar every hour, but as we got closer to the
finish I started thinking I didn&#39;t need to drink as much and didn&#39;t
want to stop to filter. We had been drinking so much earlier that I was still
well hydrated, but I got behind on my calories a bit. Fortunately with
only a few hours to go I still had some buffer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finally&lt;/strong&gt;, we hit the trail junction and the last descent. To be honest,
this seemed a lot easier coming up, and we had both remembered it as
being quite smooth and theoretically runnable, but in practice
it wasn&#39;t really that runnable, so there was a lot more hike/jogging than was really
ideal. The last 3 miles or so are genuinely runnable, even on
headlamp, and we did run those, especially the last 1.5 miles,
which are pretty much fire road.&lt;/p&gt;
&lt;p&gt;It wasn&#39;t all smooth sailing, though: about .5 miles out Chris
rolled his ankle on a rough piece of ground/rock/whatever. He walked
it off but then did it again in another 200m or so. At this point
our priority was avoiding injury (he&#39;s racing &lt;a href=&quot;https://www.jfk50mile.org/&quot;&gt;JFK 50&lt;/a&gt;
in less than a month) so we jogged it in nice and easy, at
least until we got back to the campground where—again!—we
saw a bear. Two, actually, a cub and what we assumed was its mother:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/yosemite-bears.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-bears-small.jpg&quot; alt=&quot;Bears&quot; /&gt;
&lt;/p&gt;&lt;/a&gt;&lt;p&gt;&lt;/p&gt;
&lt;figcaption&gt;
Fortunately, not &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cocaine_Bear&amp;oldid=1181044002&quot;&gt;cocaine bears&lt;/a&gt;.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We did the usual thing where we gave them space and made a lot of noise
and fortunately they didn&#39;t chase us. From here it&#39;s an easy jog to the
finish, the car, and a five hour drive home to Palo Alto.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/northern-yosemite/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-50-map.png&quot; alt=&quot;Yosemite 50 map&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Map of the course. From Gaia GPS
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-50-profile.png&quot; alt=&quot;Yosemite 50 profile&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Elevation profile of the course. From Runalyze
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This was rather harder than I expected. For comparison, when I did
&lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop2&quot;&gt;Tenaya&lt;/a&gt; last year I averaged 19:54/mi as opposed
to 21:43/mi here. I attribute that to several factors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We had to do a lot more of this in the dark. I did Tenaya in midsummer
and it was light almost the whole way. We were at least a minute faster
(20:53) for the first 35 miles and clearly slowed down a lot as it
got darker.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This was much rockier. Tenaya had a lot of smooth downhill sections
(e.g., the run down from Glacier Point) where you could really open
up, but there&#39;s basically nothing like that here.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With two of us, the filtering takes twice as long. If you have to
filter every 2 hrs and it takes 5 additional minutes to filter, then
that&#39;s almost an additional minute per mile right there.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The main thing I would really change is that I wish
I had brought warmer gloves, because the tips of my fingers were cold
for the last 5 hrs or so. It was never so cold that I was really worried
about damage, but it also wasn&#39;t pleasant. I have a lightweight
pair of waterproof mittens that I wore at &lt;a href=&quot;https://educatedguesswork.org/posts/desolation-wilderness&quot;&gt;Desolation Wilderness&lt;/a&gt;,
and I wished I&#39;d brought those.&lt;/p&gt;
&lt;p&gt;With that said, this was overall a pretty good day. This was really
long but we finished strong and uninjured. I managed my
nutrition well and managed to maintain about 300 cal/hr
except for the last couple hours and I never bonked or
felt thirsty. The altitude got to me a bit but I was able to manage
it OK, even above 10000 ft. Given how I felt going into this, especially
after Teanaway, I&#39;m going to call it a success.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 51.8 mi, 9790 ft, 18:44:03, 21:43/mi&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Maybe someday we&#39;ll actually be able to search the Web privately</title>
		<link href="https://educatedguesswork.org/posts/tiptoe/"/>
		<updated>2023-10-02T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/tiptoe/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
}
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/private-search-illustration.jpg&quot; alt=&quot;Cover illustration of someone searching&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The privacy of Web search is tragically bad. For those of you who
haven&#39;t thought about it, the way that search works is that your
query (i.e., whatever you typed in the URL bar) is sent to the
search engine, which responds with a &lt;em&gt;search results page (SERP)&lt;/em&gt;
containing the engine&#39;s results. The result
is that the search engine gets to learn everything you search
for. The privacy risks here should be obvious
because people routinely type sensitive queries into their search
engine (e.g., &amp;quot;what is this rash?&amp;quot;,
&amp;quot;&lt;a href=&quot;https://b985.fm/new-englands-most-embarrassing-google-searches/&quot;&gt;Why do I sweat so much&lt;/a&gt;&amp;quot;, or
even &amp;quot;&lt;a href=&quot;https://www.cnn.com/2023/01/18/us/brian-walshe-ana-walshe-google-searches/index.html&quot;&gt;Dismemberment and the best ways to dispose of a body&lt;/a&gt;&amp;quot;), and you&#39;re really
just trusting the search engine not to reveal your browsing history.&lt;/p&gt;
&lt;p&gt;In addition to the search engine learning your search query itself,
browsers and search engines offer a feature called &amp;quot;search suggestions&amp;quot;
in which the search engine tries to guess what you are looking
for from the beginning of your query. The way this works is that
as you start typing stuff into the search bar, the browser sends
the characters typed so far to the search engine, which responds
with things it thinks you might be interested in searching for.
For instance, if I type the letter &amp;quot;f&amp;quot; into Firefox, this is what
I get:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/search-suggest.png&quot; alt=&quot;Search suggestions in Firefox&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Everything in the red box is a search suggestion from Google.
The stuff below that is from a Firefox-specific mechanism
called &lt;a href=&quot;https://support.mozilla.org/en-US/kb/firefox-suggest-faq&quot;&gt;Firefox Suggest&lt;/a&gt;
which searches your history or—depending on your settings—might
ask Mozilla&#39;s servers for suggestions.
The important thing to realize here is that &lt;em&gt;anything&lt;/em&gt; you type into
the search bar might get sent to the server for autocompletion, which
means that even in situations where you are obviously just typing the
name of a site, as in &amp;quot;facebook&amp;quot;, what you type may still be sent to the search engine.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Your privacy in this setting basically consists of trusting the search
engine; even if the search engine has a relatively good &lt;a href=&quot;https://duckduckgo.com/privacy&quot;&gt;privacy
policy&lt;/a&gt;, this is still an
uncomfortable position.  Note that while Firefox and Safari—but
not Chrome!—have a lot of anti-tracking features, they don&#39;t do
much about this risk because they are oriented towards ad networks
tracking you across sites, but all of this interaction is with a single
site (e.g., Google.)  There are some mechanisms for protecting your
privacy in this situation—primarily &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/posts/traffic-relaying/&quot;&gt;concealing your IP
address&lt;/a&gt;—but they&#39;re clunky, generally
not available for free, and require trusting some third party to
conceal your identity.&lt;/p&gt;
&lt;p&gt;This situation is well-known to most people who work on browsers—and
to pretty much anyone who thinks about it for a minute—of course
you have to send your search queries to the search engine; if it doesn&#39;t
have your query, it can&#39;t fulfill your request. &lt;a href=&quot;https://tvtropes.org/pmwiki/pmwiki.php/Film/TheCore&quot;&gt;&lt;strong&gt;But what if it could?&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is the question raised by a really cool new &lt;a href=&quot;https://eprint.iacr.org/2023/1438&quot;&gt;paper&lt;/a&gt;
by Henzinger, Dauterman, Corrigan-Gibbs, and Zeldovich about an encrypted
search system called &amp;quot;Tiptoe&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Tiptoe promises fully private (in the sense
that the server learns nothing about what you are searching for) search
for the low low price of 56.9 MiB of communication and 145 core-seconds of
server compute time. Let&#39;s take a look.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-embeddings&quot;&gt;Background: Embeddings &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#background%3A-embeddings&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In order to understand how Tiptoe works, we need some background on
what&#39;s called an
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Word_embedding&amp;amp;oldid=1173409586&quot;&gt;embedding&lt;/a&gt;.
The basic idea behind an embedding is that you can take a piece of
content such as a document or image and convert it into a short(er)
vector of numbers that preserves most of the semantic (meaningful)
properties of the input. The key property here is that two similar
inputs will have similar embedding vectors.&lt;/p&gt;
&lt;p&gt;As an intuition pump, consider what would happen if we were to
simply count the number of times the &lt;a href=&quot;https://www.sketchengine.eu/wp-content/uploads/word-list/english/english-word-list-total.csv&quot;&gt;500 most common English language words&lt;/a&gt; appear in the text. For example, look at this sentence:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I went to the store with my mother&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This contains the following words from the top 500 list (the
numbers in parentheses are the positions on the list, with
0 being the most common):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the(0)&lt;/li&gt;
&lt;li&gt;to(2)&lt;/li&gt;
&lt;li&gt;with(12)&lt;/li&gt;
&lt;li&gt;my(41)&lt;/li&gt;
&lt;li&gt;went(327)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can turn this into a vector of numbers by just making a list
where each entry is the number of times the corresponding word
is present, so in this case it&#39;s a vector of 500 components
(dimension 500), as in:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&#39;s a lot of zeroes, so let&#39;s stick to the following form which
lists the words that are present:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[the(0) to(2) with(12) my(41) went(327)]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&#39;s consider a few more sentences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Number&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Sentence&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Embedding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I went to the store with my mother&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) my(41) went(327)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I went to the store with my sister&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) my(41) went(327)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I went to the store with your sister&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) your(23) went(327)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I am going to create the website&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) going(140) am(157) website(321) create(345)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As you can see, sentences 1 and 2 have exactly the same embedding,
whereas sentence 3 has a similar but not identical embedding, because
I went with &lt;em&gt;your&lt;/em&gt; sister rather than with &lt;em&gt;my&lt;/em&gt; (mother, sister).
This nicely illustrates several key
points about embeddings, namely that (1) similar inputs have
similar embeddings and (2) that embeddings necessarily destroy
some information (technical term: they are &lt;em&gt;lossy&lt;/em&gt;). In this
case, you&#39;ll notice that they have also destroyed the information
about where I went (the store) and whether I went with a mother or a sister.
By contrast, sentence 4 is a totally different sentence and
has a much smaller overlap, consisting of only the two common
words &amp;quot;the&amp;quot; and &amp;quot;to&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
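&lt;p&gt;To make the counting procedure concrete, here is a minimal Python sketch of this toy embedding. It is only an illustration: the &lt;code&gt;WORD_RANKS&lt;/code&gt; table holds just the handful of list positions quoted in this post rather than the full 500-word list, and the names &lt;code&gt;embed&lt;/code&gt; and &lt;code&gt;present_words&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
# Positions of a few words in the 500-most-common list, as quoted in this post.
# This is NOT the full list; any word missing from the table is ignored.
WORD_RANKS = {
    "the": 0, "to": 2, "with": 12, "have": 19, "your": 23, "my": 41,
    "been": 60, "going": 140, "am": 157, "website": 321, "went": 327,
    "create": 345,
}
DIMENSION = 500


def embed(sentence):
    """Return a 500-component vector counting each common word."""
    vector = [0] * DIMENSION
    for word in sentence.lower().split():
        rank = WORD_RANKS.get(word)
        if rank is not None:
            vector[rank] += 1
    return vector


def present_words(vector):
    """List the (word, rank) pairs whose component is non-zero."""
    by_rank = {rank: word for word, rank in WORD_RANKS.items()}
    return [(by_rank[i], i) for i, count in enumerate(vector) if count]


s1 = embed("I went to the store with my mother")
s2 = embed("I went to the store with my sister")
s3 = embed("I went to the store with your sister")

print(present_words(s1))  # [('the', 0), ('to', 2), ('with', 12), ('my', 41), ('went', 327)]
print(s1 == s2)           # True: the embedding cannot tell these two apart
print(s1 == s3)           # False: "your" replaces "my"
```

&lt;p&gt;Sentences 1 and 2 come out identical because neither &amp;quot;mother&amp;quot; nor &amp;quot;sister&amp;quot; has a slot in the vector, which is exactly the lossiness described above.&lt;/p&gt;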
&lt;p&gt;Once we have computed an embedding, we can easily use it to assess how
similar two sentences are. One conventional procedure
(and the one we&#39;ll be using for the rest of this post)
is to take what&#39;s called the &lt;a href=&quot;https://en.wikipedia.org/wiki/Dot_product&quot;&gt;inner product&lt;/a&gt;
of the two vectors, which means that you take the sum of the pairwise product of
the corresponding values in each vector (i.e., we multiply component 1 in vector 1 times component 1 in vector 2,
component 2 times component 2, and so on). I.e.,&lt;/p&gt;
&lt;p&gt;$$
P = &#92;sum_i V_1[i] * V_2[i]
$$&lt;/p&gt;
&lt;p&gt;The way this works is that we start by looking at the most common
word (&amp;quot;the&amp;quot;). Each sentence has one &amp;quot;the&amp;quot;, so that component is one
in each vector. We multiply them to get 1.
We then move on to the second most common English word (which happens to be &amp;quot;and&amp;quot;). Neither
sentence has &amp;quot;and&amp;quot;, so in both vectors this is a 0, and 0*0 = 0. Next
we look at the third-most common word (&amp;quot;to&amp;quot;), and so on. We can
draw this like so, for the inner product of S1 and S2.&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{matrix}
the &#92;&#92;
and &#92;&#92;
to  &#92;&#92;
... &#92;&#92;
with &#92;&#92;
... &#92;&#92;
my &#92;&#92;
.... &#92;&#92;
went &#92;&#92;
&#92;end{matrix}
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
&#92;cdot
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
= (1 + 1 + 1 + 1 + 1) = 5
$$&lt;/p&gt;
&lt;p&gt;By contrast, if we take S1 and S3 we get:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{matrix}
the &#92;&#92;
and &#92;&#92;
to  &#92;&#92;
... &#92;&#92;
with &#92;&#92;
... &#92;&#92;
your &#92;&#92;
... &#92;&#92;
my &#92;&#92;
.... &#92;&#92;
went &#92;&#92;
&#92;end{matrix}
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
0 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
&#92;cdot
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
0 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
= (1 + 1 + 1 + 0 + 0 + 1) = 4
$$&lt;/p&gt;
&lt;p&gt;This value is lower because one sentence has &amp;quot;your&amp;quot;  and the
other has &amp;quot;my&amp;quot; but neither has both &amp;quot;your&amp;quot; and &amp;quot;my&amp;quot;. Finally, if we take S1 and S4, we get:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{matrix}
the &#92;&#92;
and &#92;&#92;
to  &#92;&#92;
... &#92;&#92;
with &#92;&#92;
... &#92;&#92;
my &#92;&#92;
.... &#92;&#92;
went &#92;&#92;
&#92;end{matrix}
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
... &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
&#92;cdot
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
1  &#92;&#92;
... &#92;&#92;
0 &#92;&#92;
... &#92;&#92;
0 &#92;&#92;
... &#92;&#92;
0 &#92;&#92;
&#92;end{bmatrix}
= (1 + 1 + 0 + 0 + 0) = 2
$$&lt;/p&gt;
&lt;p&gt;What you should be noticing here is that the more similar the embedding
vectors are (the more words they have in common), the higher the inner product.
The conventional interpretation is that each embedding vector represents
a &lt;em&gt;d&lt;/em&gt;-dimensional vector where &lt;em&gt;d&lt;/em&gt; is the number of components and that
the smaller the angle between the two vectors (the more they point in the
same direction) the more similar they are. Conveniently, for vectors of length 1 the inner product
is equal to the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Sine_and_cosine&amp;amp;id=1175052829&amp;amp;wpFormIdentifier=titleform&quot;&gt;cosine&lt;/a&gt; of the angle, which is 1 when the angle is 0
and 0 when the angle is 90 degrees, and so can be used as a measure
of vector similarity. Personally, I don&#39;t think well in hundreds of dimensions
so I&#39;ve never found this interpretation as helpful as one might like,
but maybe you will find it more intuitive, and it&#39;s good to know anyway.&lt;/p&gt;
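&lt;p&gt;For readers who prefer code to matrices, the three inner products above can be checked with a short Python sketch. The helper names &lt;code&gt;vector_from_ranks&lt;/code&gt; and &lt;code&gt;inner_product&lt;/code&gt; are invented for this example, and the vectors are built directly from the word positions listed in the table of sentences.&lt;/p&gt;

```python
DIMENSION = 500


def vector_from_ranks(ranks):
    """Build a count vector with a 1 added at each listed word position."""
    vector = [0] * DIMENSION
    for rank in ranks:
        vector[rank] += 1
    return vector


def inner_product(v1, v2):
    """Sum of the pairwise products of corresponding components."""
    return sum(a * b for a, b in zip(v1, v2))


# Embeddings of sentences 1 to 4, written as lists of word positions.
s1 = vector_from_ranks([0, 2, 12, 41, 327])          # the to with my went
s2 = vector_from_ranks([0, 2, 12, 41, 327])          # the to with my went
s3 = vector_from_ranks([0, 2, 12, 23, 327])          # the to with your went
s4 = vector_from_ranks([0, 2, 140, 157, 321, 345])   # the to going am website create

print(inner_product(s1, s2))  # 5
print(inner_product(s1, s3))  # 4
print(inner_product(s1, s4))  # 2 (only "the" and "to" overlap)
```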
&lt;h3 id=&quot;normalization&quot;&gt;Normalization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#normalization&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve cheated a little bit in the way I constructed these sentences,
because using this definition sentences which have more of the
common English words (e.g., longer sentences) will tend to look more similar than those which do not. For instance,
if instead I had used the sentences:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;S5: I have been to the store with my sister&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;S6: I have been to the store with your sister&lt;/p&gt;
&lt;/blockquote&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Number&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Sentence&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Embedding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I went to the store with my sister&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) my(41) went(327)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I have been to the store with my sister&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) have(19) my(41) been(60)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;I have been to the store with your sister&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;[the(0) to(2) with(12) have(19) your(23) been(60)]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You&#39;ll notice that sentences 2 and 5 have four words in common (the, to, with, my),
whereas 5 and 6 have five words in common (the, to, with, have, been), even though
5 and 6 (at least arguably) have quite a different meaning (who I went to the store with)
whereas 2 and 5 just differ in grammatical tense (have been versus went).&lt;/p&gt;
&lt;p&gt;The standard way to fix this is to &lt;em&gt;normalize&lt;/em&gt; the vectors so that
the larger the values of components in aggregate, the less the value of
each individual component matters. For mathematical reasons, this is
done by setting the magnitude of the vector (the square root of the
sum of the squares of each component) to 1, which you can do by dividing
each component by the magnitude. When we do this, we
get the following result:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Sentence Pair&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Un-normalized Inner Product&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Normalized Inner Product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S2 and S5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S2 and S6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0.55&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This matches our intuition that sentences 2 and 5 are more similar than sentences
2 and 6.&lt;/p&gt;
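&lt;p&gt;The normalized values in the table can be reproduced with a small Python sketch. As before, the helper names are invented for this example and the vectors are built from the word positions listed in the embedding table for sentences 2, 5 and 6.&lt;/p&gt;

```python
import math

DIMENSION = 500


def vector_from_ranks(ranks):
    """Build a count vector with a 1 added at each listed word position."""
    vector = [0.0] * DIMENSION
    for rank in ranks:
        vector[rank] += 1.0
    return vector


def normalize(vector):
    """Divide each component by the magnitude so the result has length 1."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    return [x / magnitude for x in vector]


def inner_product(v1, v2):
    """Sum of the pairwise products of corresponding components."""
    return sum(a * b for a, b in zip(v1, v2))


s2 = normalize(vector_from_ranks([0, 2, 12, 41, 327]))      # the to with my went
s5 = normalize(vector_from_ranks([0, 2, 12, 19, 41, 60]))   # the to with have my been
s6 = normalize(vector_from_ranks([0, 2, 12, 19, 23, 60]))   # the to with have your been

print(round(inner_product(s2, s5), 2))  # 0.73
print(round(inner_product(s2, s6), 2))  # 0.55
```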
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ml-linear-algebra.png&quot; alt=&quot;ML always has been&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;real-world-embeddings&quot;&gt;Real-world Embeddings &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#real-world-embeddings&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Obviously, I&#39;m massively oversimplifying here and in the real world an
embedding would be a lot fancier than just counting common words.
Typically embeddings are computed
using some fancier algorithm like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Word2vec&amp;amp;id=1173932050&amp;amp;wpFormIdentifier=titleform&quot;&gt;Word2vec&lt;/a&gt;,
which itself might use a neural network. However, the cool thing here
is that however you compute the embedding, you can still compute
the similarity of two embeddings in the same way, which means
that you can just build a system that depends on having &lt;em&gt;some&lt;/em&gt; embedding
mechanism and then work out that embedding separately. This is very
convenient for a system like Tiptoe where we can just assume there is
an embedding and work out cryptography that will work generically for
any embedding.&lt;/p&gt;
&lt;h2 id=&quot;tiptoe&quot;&gt;Tiptoe &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#tiptoe&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With this background in mind, we are ready to take a look at Tiptoe.&lt;/p&gt;
&lt;h3 id=&quot;naive-embedding-based-search&quot;&gt;Naive Embedding Based Search &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#naive-embedding-based-search&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s start by looking at how you could use embeddings to build a search
engine. The basic intuition here is simple. You have a corpus of documents (e.g., Web pages)
$D_1, D_2 ... D_n$. For each document, you compute a corresponding
embedding for the document $Embed(D_1), Embed(D_2), ... Embed(D_n)$. When the user sends
in their search query $Q$ you compute $Embed(Q)$ and return the document(s)
that are closest to $Embed(Q)$, which is to say have the highest inner products.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Naively, you just compute the inner product of the embedded query against every
document embedding and then take the top values, though of course there
are more efficient algorithms.&lt;/p&gt;
&lt;p&gt;The figure below shows a trivial example. In this case, the client&#39;s
embedded query is most similar to $Embed(D_4)$, and so the server sends $D_4$
(or, in the case of search, its URL) in response.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/EmbeddingSearch.png&quot; alt=&quot;Example of search with embeddings&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is actually a very simplified version of how modern systems such
as &lt;a href=&quot;https://arxiv.org/pdf/2004.12832.pdf&quot;&gt;ColBERT&lt;/a&gt; work.&lt;/p&gt;
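&lt;p&gt;As a rough sketch of the naive approach (score every document, then take the top values), here is a toy Python version. The three-dimensional embeddings are made-up numbers chosen purely for illustration, the &lt;code&gt;search&lt;/code&gt; function is a hypothetical name, and a real system would use a learned embedding and a much more efficient nearest-neighbor algorithm.&lt;/p&gt;

```python
import math


def normalize(vector):
    """Divide each component by the magnitude so the result has length 1."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    return [x / magnitude for x in vector]


def inner_product(v1, v2):
    """Sum of the pairwise products of corresponding components."""
    return sum(a * b for a, b in zip(v1, v2))


def search(query_embedding, document_embeddings, top_k=1):
    """Return the indices of the top_k documents by inner product."""
    scores = [inner_product(query_embedding, d) for d in document_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Made-up embeddings for four documents (indices 0 to 3) and one query.
documents = [
    normalize([1.0, 0.0, 0.0]),
    normalize([0.0, 1.0, 0.0]),
    normalize([0.0, 0.0, 1.0]),
    normalize([0.6, 0.8, 0.0]),
]
query = normalize([0.5, 0.9, 0.1])

print(search(query, documents, top_k=2))  # [3, 1]
```

&lt;p&gt;Note that this sketch has exactly the privacy problem discussed next: whoever runs &lt;code&gt;search&lt;/code&gt; sees the query embedding in the clear.&lt;/p&gt;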
&lt;p&gt;Of course the problem with this system is the same as the problem we started
with, because you have to send your query to the server so it can compute
the embedding. There are two obvious ways to address this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compute the embedding on the client and send it to the server.&lt;/li&gt;
&lt;li&gt;Send the entire database to the client&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first of these doesn&#39;t work because the embedding contains lots of
information about the query (otherwise the search engine couldn&#39;t
do its job). The second doesn&#39;t work because the embedding database
is far too big to send to the client. What we need is a way to do this
same computation on the server without sending the client&#39;s cleartext
query or its embedding to the server.&lt;/p&gt;
&lt;h3 id=&quot;naive-tiptoe%3A-inner-products-with-homomorphic-encryption&quot;&gt;Naive Tiptoe: Inner Products with Homomorphic Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#naive-tiptoe%3A-inner-products-with-homomorphic-encryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Tiptoe addresses this problem by splitting it up into two pieces.
First, the client uses a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Homomorphic_encryption&amp;amp;oldid=1085790826&quot;&gt;homomorphic encryption&lt;/a&gt;
system to get the server to compute the inner product for
each document without allowing the server to see the query.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Tiptoe-diagram.png&quot; alt=&quot;Tiptoe private ranking&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The client then ranks each result by its inner product, which gives
it a list of the results that are most relevant (e.g., results &lt;code&gt;1, 3, 9&lt;/code&gt;).
The indices themselves aren&#39;t useful: the client needs the URL
for each result, so it uses a &lt;a href=&quot;https://educatedguesswork.org/posts/pir&quot;&gt;Private Information Retrieval (PIR)&lt;/a&gt; scheme to retrieve
the URLs associated with the top results from the server.&lt;/p&gt;
&lt;p&gt;The reason for this two-stage design is that the URLs themselves
are fairly large, and so having the server provide the URL
for each result is inefficient, as most of the results will
be ranked low and so the user will never see them.
The server could also include the kind of preview metainformation
that typically appears on the SERP (e.g., a text snippet),
but because PIR is
expensive, you want the results to be as small as possible.
Once the client has the URLs, it can just go directly to
whichever site the user selects.&lt;/p&gt;
&lt;p&gt;I already explained PIR in a previous
&lt;a href=&quot;https://educatedguesswork.org/posts/pir&quot;&gt;post&lt;/a&gt;, so this post will just focus on the ranking
system. This system uses some similar concepts to PIR, so you
may also want to go review that post.
You may recall from that &lt;a href=&quot;https://educatedguesswork.org/posts/pir&quot;&gt;post&lt;/a&gt; that a homomorphic
encryption scheme is one in which you can operate on encrypted data.
Specifically, if you have two plaintext messages $M_1$ and $M_2$ and
their corresponding ciphertexts $E(M_1)$ and $E(M_2)$ then the
encryption is homomorphic with respect to a function $F$ if&lt;/p&gt;
&lt;p&gt;$$
F(E(M_1), E(M_2)) = E(F(M_1, M_2))
$$&lt;/p&gt;
&lt;p&gt;So, for instance, if you were to have an encryption function which is
homomorphic with respect to addition, that would mean you could add
up the ciphertexts and the result would be the encryption of the
sum of the plaintexts. I.e.,&lt;/p&gt;
&lt;p&gt;$$
E(A) + E(B) = E(A + B)
$$&lt;/p&gt;
&lt;p&gt;Homomorphic encryption allows you to give
some encrypted values to another party, have it operate on them
and give you the result, and then you can decrypt it to get the
same result as if they had just operated on the plaintext values,
but without them learning anything about the values they are operating
on.&lt;/p&gt;
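&lt;p&gt;One consequence that matters for what follows: if the scheme is additively
homomorphic, then anyone holding a ciphertext can also multiply the underlying
plaintext by a known value $k$ just by adding the ciphertext to itself $k$ times.
This is only a sketch, assuming $k$ is a small non-negative integer and using the
same informal $+$ notation as above; real schemes do this more efficiently:&lt;/p&gt;
&lt;p&gt;$$
k * E(A) = &#92;underbrace{E(A) + E(A) + &#92;cdots + E(A)}_{k &#92;text{ times}} = E(k * A)
$$&lt;/p&gt;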
&lt;p&gt;We can apply homomorphic encryption to this problem as follows. First,
the client computes the embedding of the query,
giving it an embedding vector $V$ whose $i$th element is
$V_i$. The client then encrypts each element of $V$ with a homomorphic
encryption system. Call this $E(V)$ and each element $E(V_i)$.
The client sends $E(V)$ to the server.&lt;/p&gt;
&lt;p&gt;The server iterates over each URL $U_j$ and its corresponding embedding
value $D_j$ and computes the inner product of $D_j$ and $E(V)$. Specifically,
for each element $i$, it computes the pairwise product $I_{j, i}$:&lt;/p&gt;
&lt;p&gt;$$
E(I_{j,i}) = D_{j,i} * E(V_i)
$$&lt;/p&gt;
&lt;p&gt;It then sums up all these values, to get the encrypted inner product for URL $j$.&lt;/p&gt;
&lt;p&gt;$$
E(I_j) = &#92;sum_i E(I_{j,i}) = &#92;begin{matrix}E(V_1 * D_1) &#92;&#92;
+ &#92;&#92;
E(V_2 * D_2) &#92;&#92;
+ &#92;&#92;
E(V_3 * D_3) &#92;&#92;
+ &#92;&#92;
E(V_4 * D_4) &#92;&#92;
+ &#92;&#92;
E(V_5 * D_5)
&#92;end{matrix}
$$&lt;/p&gt;
&lt;p&gt;Written in pseudo-matrix notation, we get:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
E(V_1) &#92;&#92;
E(V_2) &#92;&#92;
E(V_3) &#92;&#92;
E(V_4) &#92;&#92;
E(V_5) &#92;&#92;
&#92;end{bmatrix}
&#92;cdot
&#92;begin{bmatrix}
D_1 &#92;&#92;
D_2 &#92;&#92;
D_3 &#92;&#92;
D_4 &#92;&#92;
D_5 &#92;&#92;
&#92;end{bmatrix}
&#92;rightarrow
&#92;begin{bmatrix}
E(V_1 * D_1) &#92;&#92;
E(V_2 * D_2) &#92;&#92;
E(V_3 * D_3) &#92;&#92;
E(V_4 * D_4) &#92;&#92;
E(V_5 * D_5) &#92;&#92;
&#92;end{bmatrix}
&#92;rightarrow
&#92;sum_i E(V_i * D_i)
$$&lt;/p&gt;
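&lt;p&gt;As a toy illustration with made-up numbers (three components rather than five,
and small integers rather than real embedding values), suppose the query embedding is
$V = (2, 0, 1)$ and the document embedding is $D = (3, 4, 5)$. The server only ever
sees $E(2)$, $E(0)$, and $E(1)$, but it can still compute:&lt;/p&gt;
&lt;p&gt;$$
3 * E(2) + 4 * E(0) + 5 * E(1) = E(6) + E(0) + E(5) = E(11)
$$&lt;/p&gt;
&lt;p&gt;When the client decrypts this, it gets $11$, which is exactly the plaintext inner
product $2 * 3 + 0 * 4 + 1 * 5$.&lt;/p&gt;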
&lt;p&gt;The server then sends back the encrypted inner product values to the
client (one per document in the corpus). The client decrypts them to
recover the inner product values (again, one per document). It can
then just pick the highest ones which are the best matches and
retrieve their URLs via PIR (effectively, &amp;quot;give me the URLs for
documents 1, 3, 9&amp;quot;, etc.). It then dereferences the URLs as normal.
Because this is all done under encryption, the server never learns
your search query, the matching documents, or the URLs you eventually
decide to dereference (though of course those servers see when
you visit them). Importantly, these guarantees are cryptographic, so you don&#39;t have to
trust the server or anyone else not to cheat. This is different
from proxying systems, where the proxy and the server can collude
to link up your searches and your identity.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;ciphertext-size-matters&quot;&gt;Ciphertext Size Matters &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#ciphertext-size-matters&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If each value in the embedding vector is a 32-bit floating point number
and the embedding vector has dimension 700ish, then the embedding
values for each document come to around 2800 bytes (22,400 bits).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If we naively use &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#detail%3A-homomorphic-encryption-using-elgamal&quot;&gt;ElGamal encryption&lt;/a&gt;,
then each ciphertext will be around 64 bytes (512 bits).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is an improvement of a factor of 40 or so, but at the cost
of doing $N$ encryption operations, which is quite a lot.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;clustering&quot;&gt;Clustering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#clustering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s take stock of where we are now. The client sends a relatively
short value, consisting of $d$ ciphertexts where $d$ is the number
of elements in the embedding vector. The server responds with $N$
ciphertexts, where $N$ is the number of URLs in its corpus, and has
to do $d*N$ multiplications. Depending on the homomorphic encryption algorithm, this might or
might not be an improvement on the total communication bandwidth,
but it&#39;s still linear in the number of documents, which is quite
bad.&lt;/p&gt;
&lt;p&gt;It&#39;s not really possible to reduce the number of operations on the
server below linear. The reason for this is that the server
needs to operate on the embedding for each document; otherwise
the server could determine which embeddings the client &lt;em&gt;isn&#39;t&lt;/em&gt;
interested in by which ones it doesn&#39;t have to look at
in order to satisfy the client&#39;s query. However, it &lt;em&gt;is&lt;/em&gt; possible
to significantly improve the amount of bandwidth consumed by the
server&#39;s response.&lt;/p&gt;
&lt;p&gt;The trick here is that the server breaks up the corpus of documents
into clusters of approximately $&#92;sqrt N$ size (hence there are approximately
$&#92;sqrt N$ clusters). These clusters are arranged so that they
have nearby embedding vectors, and hence the documents
are theoretically similar. The server publishes the embedding vector for
the center of each cluster, and this allows the client to request
&lt;em&gt;only&lt;/em&gt; the inner products for the closest cluster. This reduces
the amount of data that the server sends by a factor of $&#92;sqrt N$ to order
$&#92;sqrt N$. There&#39;s just one problem: if the client only queries one cluster, then
doesn&#39;t the server know which cluster the client is interested in?&lt;/p&gt;
&lt;p&gt;We fix this by having the client send a separate encrypted query for
each cluster, like so:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
E(0) &amp;amp; &#92;color{red}{E(V_1)} &amp;amp; E(0)  &#92;&#92;
E(0) &amp;amp; &#92;color{red}{E(V_2)} &amp;amp; E(0)  &#92;&#92;
E(0) &amp;amp; &#92;color{red}{E(V_3)} &amp;amp; E(0)  &#92;&#92;
E(0) &amp;amp; &#92;color{red}{E(V_4)} &amp;amp; E(0)  &#92;&#92;
E(0) &amp;amp; &#92;color{red}{E(V_5)} &amp;amp; E(0)  &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;In this diagram, each column represents one cluster (and hence there are
$&#92;sqrt N$ columns), and each row is
a different embedding component. The column corresponding to the cluster
(column $q$) of interest (in red) contains the encryption of the client&#39;s actual
query embedding vector, whereas the rest of the columns just contain
the encryption of 0 (the encryption is randomized so that they are
not readily identifiable).&lt;/p&gt;
&lt;p&gt;The server takes each column of the client&#39;s query and computes the
inner product for each document in the corresponding cluster, as before.
I.e., for document $j$ in cluster $c$, it computes $E(I_{c, j})$.
&lt;em&gt;Then&lt;/em&gt;, however, the server adds up the
inner product values across the clusters, reporting one value for
the sum of the inner products for the 1st URL in each cluster, one for
the sum of the inner products for the 2nd URL in each cluster,
and so on, so that the server still only returns the same
number of ciphertexts as before. I.e., it reports:&lt;/p&gt;
&lt;p&gt;$$
E(I_j) = &#92;sum_c E(I_{c, j})
$$&lt;/p&gt;
&lt;p&gt;Ordinarily the sum of these would be useless, but
the trick&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
here is that because the other columns (those corresponding
to the clusters that are not of interest) are the encryption of
0, their inner products are &lt;em&gt;also&lt;/em&gt; zero, which means that the
result sent back to the client only includes the inner products
for the column of interest (column $q$).&lt;/p&gt;
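&lt;p&gt;Spelled out a little more, and using the same informal notation as above: for every
cluster $c &#92;neq q$ the client sent $E(0)$ in each component, so each of those
per-cluster inner products is itself an encryption of zero, and the sum collapses
to the single term the client cares about:&lt;/p&gt;
&lt;p&gt;$$
E(I_j) = &#92;sum_c E(I_{c, j}) = E(I_{q, j}) + &#92;sum_{c &#92;neq q} E(0) = E(I_{q, j})
$$&lt;/p&gt;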
&lt;p&gt;The resulting scheme has much better communication overhead:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The server sends the list of the cluster centers ($&#92;sqrt N$).&lt;/li&gt;
&lt;li&gt;The client sends a list of $d$ encrypted components for each cluster
($d &#92;sqrt N$).&lt;/li&gt;
&lt;li&gt;The server sends a single encrypted inner product value for each
document in the cluster ($&#92;sqrt N$).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is dramatically better than the naive scheme in which the client
sends $d$ values and the server sends $N$, although at the cost of
pushing some of the transmission cost onto the client, for a total
transmission that scales as a factor of $(d+2)&#92;sqrt N$. Of course,
that&#39;s still pretty big and the constant factor is &lt;em&gt;also&lt;/em&gt; pretty big
(~512 bits per document for ElGamal). The Tiptoe paper uses some clever
tricks to bring the size down some, but the end result is still fairly large
(see &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#cost&quot;&gt;cost&lt;/a&gt; below).&lt;/p&gt;
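&lt;p&gt;To get a rough feel for the difference, here is the arithmetic for some purely
illustrative numbers (these are assumptions chosen to make the math easy, not figures from the paper):
a corpus of $N = 10^8$ documents and an embedding dimension of $d = 700$. Counting
transmitted values rather than bytes:&lt;/p&gt;
&lt;p&gt;$$
&#92;text{naive: } d + N = 700 + 10^8 &#92;approx 1.0 &#92;times 10^8
&#92;qquad
&#92;text{clustered: } (d + 2) &#92;sqrt N = 702 &#92;times 10^4 &#92;approx 7.0 &#92;times 10^6
$$&lt;/p&gt;
&lt;p&gt;That is about 14 times fewer values in total, with the gap widening as $N$ grows.&lt;/p&gt;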
&lt;h2 id=&quot;performance&quot;&gt;Performance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#performance&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be clear from the previous section it&#39;s &lt;em&gt;possible&lt;/em&gt; to build
privacy-preserving search, but how well does it actually do? This
actually comes down to two questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How good are the answers?&lt;/li&gt;
&lt;li&gt;How much does it cost?&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;accuracy&quot;&gt;Accuracy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#accuracy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First, let&#39;s take a look at accuracy. Obviously, a private search
mechanism will be no better than a non-private search mechanism,
because if it were you could just remove the privacy pieces and
get the same accuracy. However, realistically we should expect worse
accuracy, just on general principle (i.e., we are hiding information
from the server). In this specific case we
should expect worse accuracy because the server is just operating
on the (encrypted) embedding of the query, rather than the whole
query, and computing the embedding destroys some information.&lt;/p&gt;
&lt;p&gt;The metric the authors use for performance is something called &amp;quot;MRR@100&amp;quot;, which stands
for &amp;quot;mean reciprocal rank at 100&amp;quot;. The way this works is that for each
query you determine which result people would have ranked at number
1 and then ask what position the search algorithm returned it in.
You then compute a score that is the inverse of that position,
so, for instance, if the document were found in position 5, then the
score would be $1/5$. The &amp;quot;mean&amp;quot; part is that you average the
scores over all the queries. The &amp;quot;at 100&amp;quot; part is that if the
search algorithm doesn&#39;t return the result in the top 100 values,
you get a score of zero. In other words:&lt;/p&gt;
&lt;p&gt;$$
MRR =
&#92;frac{&#92;sum_{i=1}^N
&#92;begin{cases}
&#92;frac{1}{Rank_i} &amp;amp; &#92;text{if } Rank_i &#92;leq 100 &#92;&#92;
0 &amp;amp;&#92;text{otherwise}
&#92;end{cases}
}
{N}
$$&lt;/p&gt;
&lt;p&gt;Note that this score really rewards getting the top result, because even
getting it in second place only gets you a per-query score of $1/2$.&lt;/p&gt;
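&lt;p&gt;As a small worked example with made-up numbers: suppose there are three queries
and the search algorithm returns the best result at positions 1, 5, and 150
respectively. The third one falls outside the top 100 and so scores zero, giving:&lt;/p&gt;
&lt;p&gt;$$
MRR = &#92;frac{&#92;frac{1}{1} + &#92;frac{1}{5} + 0}{3} = &#92;frac{1.2}{3} = 0.4
$$&lt;/p&gt;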
&lt;p&gt;The results look like this:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/mrr-scores-tiptoe.png&quot; alt=&quot;Tiptoe MRR Results&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Source: Tiptoe paper.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The graph on the left provides MRR@100 comparisons to a number of
algorithms, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A modern search algorithm (&lt;a href=&quot;https://arxiv.org/pdf/2004.12832.pdf&quot;&gt;ColBERT&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Two somewhat older systems (BM25 and tf-idf)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As you can see, ColBERT performs the best and Tiptoe gets pretty close
to tf-idf but is still significantly worse than BM25. The &amp;quot;embeddings&amp;quot; entry
is the result if you don&#39;t use the clustering trick described &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#clustering&quot;&gt;above&lt;/a&gt;.
Notice here that &amp;quot;embeddings&amp;quot; does very well, and in fact is better than
BM25, so the clustering really does have quite a significant impact on
the quality of the results.&lt;/p&gt;
&lt;p&gt;The graph on the right shows the cumulative probability that the
best result will be found at an index less than $i$ (i.e., that
it is found in the top $i$ results). The dotted line shows the
chance that the best result is in the cluster Tiptoe receives at
all, which reflects the best result Tiptoe could deliver even if
it always picked the best result out of the cluster (about 1/3 of the
time).&lt;/p&gt;
&lt;p&gt;On the one hand, this is a fairly large regression from the state of the
art, but on the other hand, it means that there is a lot of room for
improvement just by improving the clustering algorithm on the server.
Obviously, there&#39;s also room for improvement in terms of ranking within
the cluster. With the current design the client just gets the
inner product so all it can do is rank them, but there might be some
things you could do, such as proactively retrieving the first 10 documents
or so (there is a very steep improvement curve within the first 10)
and running some local ranking algorithm on their content.&lt;/p&gt;
&lt;h3 id=&quot;cost&quot;&gt;Cost &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#cost&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So how much will all this cost? The answer is &amp;quot;quite a bit
but not as much as you would think&amp;quot;. Here&#39;s Figure 8, which
shows the estimated cost of Tiptoe for various corpus sizes:&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tiptoe-cost.png&quot; alt=&quot;The cost of Tiptoe&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
Source: Tiptoe paper.
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The server CPU cost is linear in the number of documents in the corpus
and would require around 1500 core seconds for something like Google.&lt;/p&gt;
&lt;p&gt;The communication cost is sublinear in the number of
documents but has a very high fixed cost of around 55
MiB for a query on a corpus the size of
the Common Crawl data set (~360 million documents)
and around 125 MiB for a Google sized system (~8 billion documents).
Tiptoe uses a number of tricks to frontload this
cost; most of the communication isn&#39;t dependent on the
query, so that the client and server can exchange it
in advance without it being in the critical path.
The server also has to send the client the
embedding algorithm, which can be quite large (e.g.,
200+ MiB) but that is reused for multiple queries and
so can be amortized out.&lt;/p&gt;
&lt;p&gt;Using Amazon&#39;s list price costs, the overall cost is around 0.10 USD/query
for a system the size of Google. Google doesn&#39;t publish their numbers
but 9to5Google estimates it at &lt;a href=&quot;https://9to5google.com/2023/02/23/google-bard-ai-cost-report/#:~:text=An%20estimate%20by%20Morgan%20Stanley%20pins%20down%20a,but%20the%20number%20would%20skyrocket%20when%20using%20AI.&quot;&gt;.002 USD/query&lt;/a&gt;.
This is 50 times less, which is a big difference, but actually that
probably overstates the difference because Google isn&#39;t paying list
price for their compute costs, so the difference is probably
quite a bit less. In either case, this is actually a smaller difference
than you would expect given the enormous improvement in privacy.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The lesson you should be taking home here is &lt;em&gt;not&lt;/em&gt; that Tiptoe is
going to replace Google search tomorrow. Not only are private search
techniques like this more expensive than non-private techniques, they
are inherently less flexible. Google&#39;s SERP is a lot more than just
a list of results. For instance here&#39;s the top of the search page for
&amp;quot;tiptoe&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tiptoe-serp.png&quot; alt=&quot;Tiptoe SERP&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the first entry is actually a dictionary definition, accompanied by the info-box
on the right and the alternate questions. The first website result
is below all that. Obviously, one could imagine
enhancing a system like Tiptoe to provide at least some of these
features, though at yet more cost.&lt;/p&gt;
&lt;p&gt;There are two stories here that are true at the same time. The
first is about technical capabilities: in most cases, private systems are inherently less flexible and
powerful than their non-private counterparts. It&#39;s always easier
to just tell the server everything and let it sort out what to do,
both because the server can just unilaterally add new features
without any help from the client and because it&#39;s often difficult
to figure out how to provide a feature privately (just look at all
the fancy cryptography that&#39;s required to provide a simple list
of prioritized URLs). This will almost always be true, with the
only real exception being cases where the data is so sensitive
that it&#39;s simply unacceptable to send it to the server at all, and
so private mechanisms are the only way to go. However, I think the lesson
of the past 20 years is that people are actually quite willing to
tell their deepest secrets to some computer, so those cases are quite
rare.&lt;/p&gt;
&lt;p&gt;The other story is about path dependence. Google search didn&#39;t get
this fancy all at once; the original search page was much simpler
(basically a list of URLs with a snippet from the page) and features
were added over time. If we imagine a world in which privacy had been
prioritized right from the start, then we would have a much richer
private search ecosystem—though most likely not as powerful as
the one we have now. The entry barrier to increased data collection
for slightly better features would most likely be a lot higher than it
is today. But because we started out with a design that wasn&#39;t private,
it led us naturally to where we are today, where every keystroke you
type in the URL/search bar just gets fed to the search provider.&lt;/p&gt;
&lt;p&gt;I&#39;m not under any illusions that it will be easy to reverse course here:
even in the much simpler situation of protecting your Web traffic
in transit, it&#39;s taken decades to get out from under the weight of
the early decisions to do almost everything in the clear and we&#39;re
still not completely done. Moreover, that was a situation where we had the technology
to do it for a long time, and it was just a matter of deployment and
cost. However, the first step to actually changing things is knowing
how to do it, and so it&#39;s really exciting to see people taking up
the challenge.&lt;/p&gt;
&lt;h2 id=&quot;acknowledgement&quot;&gt;Acknowledgement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tiptoe/#acknowledgement&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Thanks to Henry &lt;a href=&quot;https://people.csail.mit.edu/henrycg/&quot;&gt;Corrigan-Gibbs&lt;/a&gt; for assistance with this post. All mistakes are of course mine.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Firefox, at least, does make some
attempt to omit &lt;em&gt;pure&lt;/em&gt; navigational queries, so if you type &amp;quot;http://&amp;quot;
in the Firefox search box, this gets sent to the server, but
&amp;quot;&lt;a href=&quot;http://f/&quot;&gt;http://f&lt;/a&gt;&amp;quot; does not. &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Disclosure: this work was partially funded by a grant from
Mozilla, in a program operated by my department. &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In a real-world example, one might well prune out these
common not-very-meaningful words. &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you might use a different algorithm to compute the embeddings
on the documents as on the queries, for instance if you are doing
text search over images. For the purposes of this post, however,
this is not important. &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that this is basically
the same trick that PIR schemes use. &lt;a href=&quot;https://educatedguesswork.org/posts/tiptoe/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Desolation Wilderness Seven^H^H^H^H^HTwo Summits</title>
		<link href="https://educatedguesswork.org/posts/desolation-wilderness/"/>
		<updated>2023-09-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/desolation-wilderness/</id>
		<content type="html">&lt;a href=&quot;https://educatedguesswork.org/img/desolation-cover.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-cover-small.jpg&quot; alt=&quot;View from Pyramid Peak&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;p&gt;My two races this season were to be &lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow&quot;&gt;Broken Arrow
Skyrace&lt;/a&gt; and then a hundred to be named
later. I&#39;d originally planned to do Whistler Alpine Meadows 100 but
then it was cancelled in February and I spent a long time
procrastinating but finally settled on &lt;a href=&quot;https://teanawaycountry100.com/&quot;&gt;Teanaway Country
100&lt;/a&gt;.  Teanaway is about the opposite
of &lt;a href=&quot;https://educatedguesswork.org/posts/utmb&quot;&gt;UTMB&lt;/a&gt;: a tiny low-key race (59 entrants so far), but
with pretty similar topline stats, with 32000 feet over 100 miles.&lt;/p&gt;
&lt;p&gt;I&#39;ve had several solid training blocks this year, but I wanted to try
to get in one more adventure run this summer. Unfortunately, due to
last winter&#39;s ridiculous snow season, most of the routes I was
interested in doing in the Sierra were snowed in in midsummer, so I
didn&#39;t start looking seriously till a few weeks ago, eventually
deciding to take a crack at the &lt;a href=&quot;https://pantilat.wordpress.com/2013/08/15/desolation-seven-summits/&quot;&gt;Desolation Wilderness Seven Summits
Loop&lt;/a&gt;,
which I first saw on Leor Pantilat&#39;s fantastic &lt;a href=&quot;https://pantilat.wordpress.com/&quot;&gt;site&lt;/a&gt;.
As the name suggests, this route covers the seven named summits in the Desolation
Wilderness. Technically speaking, the &lt;a href=&quot;https://fastestknowntime.com/route/desolation-7-summits-ca&quot;&gt;fastest known time&lt;/a&gt; for this
is just to hit the peaks however you like, but there&#39;s a common loop linked
above. The
loop is 29 miles long with 10000+ ft of climbing including a fair
amount of off-trail terrain, so I figured it would be a nice scaled
down warmup for Teanaway. I&#39;d actually intended to do a slightly longer variant of
about 40 miles/15kft the week of August 13, but then I got sick
and so had to defer to last weekend, and with only two weeks to
Teanaway, decided to stick to the normal version.&lt;/p&gt;
&lt;h2 id=&quot;logistics&quot;&gt;Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This loop starts at a parking lot off US 50 en route to Tahoe a bit
East of Kyburz. I stayed at the &lt;a href=&quot;https://www.tripadvisor.com/Hotel_Review-g32571-d1511799-Reviews-Sierra_Inn_On_the_River-Kyburz_California.html&quot;&gt;Sierra Inn On the
River&lt;/a&gt;,
which is conveniently situated about 15 minutes away. I was planning
to start at about 5:30-6 AM (sunrise is at about 6:40), so I was able
to sleep in till 4:30 and then drive over.&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-prep.jpg&quot;&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/desolation-prep-small.jpg&quot; alt=&quot;My stuff for the event&quot; /&gt;
&lt;/a&gt;
&lt;figcaption&gt;
&lt;p&gt;My stuff laid out for the next day. I ended up not bringing the remote control.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Desolation Wilderness requires permits which are self-issued at the trailhead—overnight
stays require a separate permit—but even though the parking lot is at an official trailhead,
I was unpleasantly surprised to see that there wasn&#39;t any kind of kiosk either at this
trailhead or on the trailhead on the other side of the highway. This actually
isn&#39;t the trailhead you enter the Wilderness from; instead you run down the
highway for a few miles, so I figured I&#39;d just head out and hope there
was a kiosk at the other trailhead.&lt;/p&gt;
&lt;h2 id=&quot;start-to-trailhead-%5B3.3-mi%2C-%2B249ft%2F-774ft%5D&quot;&gt;Start to Trailhead [3.3 mi, +249ft/-774ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#start-to-trailhead-%5B3.3-mi%2C-%2B249ft%2F-774ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first two miles or so is downhill on 50, and even though it was starting to get
lighter, I did this on headlamp (Petzl Actik Core), both to make sure of my
own footing and for visibility. I took this pretty easy at 8:15/mile so I
could warm up.&lt;/p&gt;
&lt;p&gt;There was no bathroom at the start, and predictably I&#39;d only run a mile or so before
I really needed to go. Fortunately, the Pyramid Creek trailhead is right along the
highway and has flush toilets. They also have a pay parking lot but still no
place to issue your own permit. I walked to the start of the trailhead and found
a sign saying that there was permit issuance at the Wilderness boundary about a quarter
mile up, so I went down the trail a bit hoping to find it, but despite going
past the sign for the boundary and up to the top of a little ridge, I never found
it and just gave up and headed back down the road. I did manage to lose my sunglasses,
though, not, as it turned out, that I needed them.&lt;/p&gt;
&lt;h2 id=&quot;pyramid-peak-trail-10.33-%5B7.03-mi%2C-%2B4262ft%2F-4196ft%5D&quot;&gt;Pyramid Peak Trail 10.33 [7.03 mi, +4262ft/-4196ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#pyramid-peak-trail-10.33-%5B7.03-mi%2C-%2B4262ft%2F-4196ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This route involves a climb to the top of Pyramid Peak followed by a bunch
of traversing of the high country, tagging the rest of the peaks, and then
a descent to the bottom.
The Pyramid Peak Trail doesn&#39;t actually have a real official trailhead, so much
as a small parking area across the highway from a cut-out in the embankment.
There were two cars there already and apparently it gets full later, but as I was
on foot, it wasn&#39;t a problem for me.&lt;/p&gt;
&lt;p&gt;The first summit is a monster climb right from the start, ascending almost 4000
feet in 3.3 miles. I didn&#39;t even bother to try to run any of it, but just
pulled out my poles and started hiking. This is kind of an unofficial trail and
isn&#39;t really marked but is in OK shape and so I was mostly just able to follow
the tread pattern, occasionally checking the GPS to make sure I was on the right
route.&lt;/p&gt;
&lt;p&gt;The footing is pretty reasonable but it&#39;s still slow going because it&#39;s so
steep. It also was starting to get windy so I decided to throw on my
rain jacket. I have the &lt;a href=&quot;https://www.inov-8.com/ca/raceshell-half-zip-featherlight-waterproof-running-jacket&quot;&gt;Inov-8 Raceshell half-zip&lt;/a&gt;
and I bought a size up with the idea that I could put it on over my pack
so that I could get it on and off quickly, but this works a lot better in
theory than practice, as it&#39;s a pullover and gets caught on the bulge
of the pack, so I fought with it for a few minutes and then finally
just took my pack off. The jacket is comfortable and breathes well,
though.&lt;/p&gt;
&lt;p&gt;Eventually, the trail just kind of ends and you get to the final 500ft
or so of climb, which are just one giant talus pyramid. I forgot to take
a photo here, but this &lt;a href=&quot;https://images.alltrails.com/eyJidWNrZXQiOiJhc3NldHMuYWxsdHJhaWxzLmNvbSIsImtleSI6InVwbG9hZHMvcGhvdG8vaW1hZ2UvNjQ2MTM4MDUvNWFmMmM0MDZiOTBmNmYxMjlhNzllYzM2MDNmOWZjNTUuanBnIiwiZWRpdHMiOnsidG9Gb3JtYXQiOiJqcGVnIiwicmVzaXplIjp7IndpZHRoIjoyMDQ4LCJoZWlnaHQiOjIwNDgsImZpdCI6Imluc2lkZSJ9LCJyb3RhdGUiOm51bGwsImpwZWciOnsidHJlbGxpc1F1YW50aXNhdGlvbiI6dHJ1ZSwib3ZlcnNob290RGVyaW5naW5nIjp0cnVlLCJvcHRpbWlzZVNjYW5zIjp0cnVlLCJxdWFudGlzYXRpb25UYWJsZSI6M319fQ==&quot;&gt;shot&lt;/a&gt; gives the
idea:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img src=&quot;https://images.alltrails.com/eyJidWNrZXQiOiJhc3NldHMuYWxsdHJhaWxzLmNvbSIsImtleSI6InVwbG9hZHMvcGhvdG8vaW1hZ2UvNjQ2MTM4MDUvNWFmMmM0MDZiOTBmNmYxMjlhNzllYzM2MDNmOWZjNTUuanBnIiwiZWRpdHMiOnsidG9Gb3JtYXQiOiJqcGVnIiwicmVzaXplIjp7IndpZHRoIjoyMDQ4LCJoZWlnaHQiOjIwNDgsImZpdCI6Imluc2lkZSJ9LCJyb3RhdGUiOm51bGwsImpwZWciOnsidHJlbGxpc1F1YW50aXNhdGlvbiI6dHJ1ZSwib3ZlcnNob290RGVyaW5naW5nIjp0cnVlLCJvcHRpbWlzZVNjYW5zIjp0cnVlLCJxdWFudGlzYXRpb25UYWJsZSI6M319fQ==&quot; alt=&quot;Alltrails photo of pyramid peak&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.alltrails.com/explore/recording/afternoon-hike-at-pyramid-peak-trail-88e1ce8&quot;&gt;Charles Jenkins&lt;/a&gt;]&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There didn&#39;t seem to be any obvious trail up to the top, so I just started
to scramble up. As I was doing so, I saw what looked like a runner at the
top starting to come down and then I ran into two hikers. They told me that
it was really windy at the top (it was already quite windy where I was) and
that it was safer to stay towards the right (the way they had come down).
I followed their advice and sure enough it started to get quite bad to
the point where I wasn&#39;t that comfortable just standing up and had
to use my hands more than usual. This last 500 feet of climbing and maybe
a half mile probably took me like 30+ minutes and I almost turned back
once because it was so sketchy.&lt;/p&gt;
&lt;p&gt;I finally made it to the top and found somewhere that was a little sheltered
and managed to take some photos.  I didn&#39;t
really want to stand too much on the rock ledges surrounding the hollows
people had opened up at the top (presumably for shelter), and it wasn&#39;t
really that clear, but there are still some great views.&lt;/p&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-pyramid1.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-pyramid1-small.jpg&quot; alt=&quot;Pyramid Peak view&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-pyramid2.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-pyramid2-small.jpg&quot; alt=&quot;Pyramid Peak view&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-pyramid3.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-pyramid3-small.jpg&quot; alt=&quot;Pyramid Peak view&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-pyramid4.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-pyramid4-small.jpg&quot; alt=&quot;Pyramid Peak view&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;p&gt;This last one really lets you see the rock slope you have to descend. Sketchy!&lt;/p&gt;
&lt;p&gt;At this point, you&#39;re supposed to head down the back side of Pyramid Peak
and head offtrail to Agassiz Peak, but when I looked down it was pretty
unclear where the trail was and I really wasn&#39;t thrilled about the idea of
being exposed to that much wind for the next 10 or so miles, so I made
the—in retrospect correct—decision to turn back.&lt;/p&gt;
&lt;p&gt;As is commonly the case, coming down that rockpile was actually worse
than going up: you&#39;ve got gravity trying to pull you down and because
you&#39;re facing forward, you can&#39;t really use your hands, so I slipped and fell
on my ass a bunch of times. Because I was trying to stay out of the wind
I veered way off course and ended up kind of skirting the edge of the peak
and then had to bushwhack my way back to the trail. From there
it was a pretty straightforward descent to the bottom and I was able
to run a fair bit of it.&lt;/p&gt;
&lt;h2 id=&quot;back-to-the-car-12.24-%5B1.91-mi%2C-%2B443ft%2F-39ft%5D&quot;&gt;Back to the Car 12.24 [1.91 mi, +443ft/-39ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#back-to-the-car-12.24-%5B1.91-mi%2C-%2B443ft%2F-39ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From the bottom, I needed to climb another 500 feet or so on
50 to get back to the car, which gave me some time to regroup. At this
point I was about 11 miles (though 4000+ ft) and 5 hrs in, so I had
plenty of time and even though the whole route was out of the question
it seemed silly to drive all the way here for what was basically a medium
long run. I decided the right thing to do was to head up the trail
in the opposite direction to Ralston Peak.
By this point I had gone through most of my fluid, so I stopped
off at the Pyramid Creek parking lot to use the bathroom and refill
my bottles (I didn&#39;t have extra water in my car). From there, it&#39;s an easy run back to the car.&lt;/p&gt;
&lt;p&gt;One nice thing about doing the route this way is that your car is
a sort of impromptu aid station, so I decided to change my shoes.
I do most of my running in &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/sense-ride-5-li3121.html#color=77359&amp;amp;size=25792&quot;&gt;Salomon Sense Ride 5s&lt;/a&gt;, but I started the day in
a pair of &lt;a href=&quot;https://www.salomon.com/en-ca/shop/product/s-lab-ultra-3-li4598.html#color=77191&quot;&gt;Salomon S/LAB Ultra 3s&lt;/a&gt; (what I used for UTMB). I like Ultra 3s but when I put them
on for the first time in months on Friday morning I didn&#39;t feel
like they were giving me quite as much support as I wanted and I was
kind of disappointed in the traction I was getting on the loose rock,
so I decided to swap them for the Sense Rides, in part so I could
compare them back to back on similar terrain.&lt;/p&gt;
&lt;p&gt;I was also starting to get a bit of a hot spot on my right heel,
and sure enough when I took my sock off, I had a blister that
had formed and popped. There&#39;s only one thing you can really do at
that point, which is to tape it up, and fortunately I had some
strips of kinesio tape, so I slapped one on, carefully pulled my
sock back over it so it didn&#39;t peel off, and put the Sense Rides on.&lt;/p&gt;
&lt;p&gt;By this time it had really started
to rain so I swapped out my wind pants (warmish but not waterproof)
for a pair of Raidlight rain pants (the old version of &lt;a href=&quot;https://raidlight.com/en/products/pantalon-de-trail-impermeable-mixte-ultralight-mp-20k-20k&quot;&gt;these&lt;/a&gt;). I also grabbed my
&lt;a href=&quot;https://www.salomon.com/en-us/shop/product/bonatti-wp-mitten-u-19.html#color=70393&amp;amp;size=35332&quot;&gt;waterproof mittens&lt;/a&gt; which go on nicely over my regular gloves. With that, I was ready to
head up to Ralston Peak.&lt;/p&gt;
&lt;h2 id=&quot;ralston-i-%5B6.86-mi-%2B3159ft%2F-2943ft%5D&quot;&gt;Ralston I [6.86 mi +3159ft/-2943ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#ralston-i-%5B6.86-mi-%2B3159ft%2F-2943ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Ralston climb is pretty straightforward: 2700ish feet up over a bit
more than three miles. It starts out as fire road but you quickly come to a single track
trail marking the wilderness boundary, where I also found a kiosk
for you to register for a permit (finally). I took a moment to do that
and headed up.&lt;/p&gt;
&lt;p&gt;The climb to Ralston is a lot easier than Pyramid. The footing is
about the same, except for the top, but it&#39;s only about 900fpm rather
than 1300, and that makes a big difference. Of course, that&#39;s in
equivalent conditions and by now it was really starting to rain and I
was getting pretty cold. Starting from the bottom when I was in a rain
jacket alone, I gradually ended up in glove liners, rain gloves, and
rain pants, and I would have put on my arm warmers too but I wasn&#39;t
able to get them on under my rain jacket (because of the cuffs) and
wasn&#39;t willing to take the jacket off in order to put them on.&lt;/p&gt;
&lt;figure&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-partway-up.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-partway-up-small.jpg&quot; alt=&quot;Partway up&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;figcaption&gt;
&lt;p&gt;Partway up Ralston right after I put my pants on. Not quite above the treeline&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The trail situation is a little
confusing as there is a spur trail to the top but also a trail that
bypasses the peak, and it appears that when Leor Pantilat did this he
actually went cross-country. I opted for the spur trail, which is
still pretty passable, with only a bit of climbing over rocks at the
very end.&lt;/p&gt;
&lt;p&gt;Even with all this stuff on, and working hard, I was starting to get cold as I got near the
top and it got windier. A lot windier, though not as windy as Pyramid. I don&#39;t
have any pictures from the summit however, or rather, I have this:&lt;/p&gt;
&lt;figure&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/desolation-ralston.jpg&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-ralston-small.jpg&quot; alt=&quot;Ralston summit&quot; /&gt;&lt;/p&gt;
&lt;/a&gt;
&lt;figcaption&gt;
&lt;p&gt;Me on the summit of Ralston Peak. You can get a sense of the wind in this &lt;a href=&quot;https://educatedguesswork.org/img/desolation-ralston-movie.mp4&quot;&gt;clip&lt;/a&gt;.&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;These aren&#39;t really whiteout conditions, in that you can see around you just
fine, at least well enough to see the trail in front of you; it&#39;s just that I&#39;m at the top of a mountain, so everything you
would otherwise be able to see is miles away and visibility is a lot less
than that.&lt;/p&gt;
&lt;p&gt;The run down is pretty easy: it&#39;s steep but good footing and as soon
as I got off the peak there was more wind cover and I started to
warm up again.  By the time I was close to the bottom I was closing in
on 19 miles and 7500 ft and runner brain took over and I started to
think &amp;quot;maybe I should do just a bit more&amp;quot;, so I decided to turn around
at the wilderness boundary and go up &amp;quot;some of the way&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;ralston-ii-%5B3.59-mi%2C-%2B1207ft%2F-1348ft%5D&quot;&gt;Ralston II [3.59 mi, +1207ft/-1348ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#ralston-ii-%5B3.59-mi%2C-%2B1207ft%2F-1348ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My original plan was just to go up about .5 miles to make it a round
20 miles, but as I started to get closer to the turnaround I was like
&amp;quot;maybe 21&amp;quot;, then &amp;quot;maybe 22&amp;quot;, and finally &amp;quot;maybe 9000 ft total&amp;quot;. All this
seemed fine and then my GPS started to act up and was getting stuck
at a given elevation before jumping 50-100 feet. 9000 feet did come
eventually at about 1.8 miles, and so I turned around and headed down,
somewhat regretfully, as I was feeling quite good, but two factors
pushed me to play it safe: (1) I had to race a hundred in two weeks
and I really didn&#39;t want to dig myself too deep a hole, and (2) it was still going to be cold and
rainy at the top and I didn&#39;t want to take a chance on getting hypothermic.&lt;/p&gt;
&lt;p&gt;I made it down to the car with no issues. As before, this isn&#39;t super
fast terrain and I didn&#39;t want to fall, so I just took it easy and focused on
my footing. It was still raining pretty hard, so then I got the fun of having
to get out of my wet clothes while trying to stay modestly dry. As usual,
by the time I had my clothes on I was super cold and had to run
the heater on full for the next hour or so of the drive back, but otherwise
I felt fine.&lt;/p&gt;
&lt;h2 id=&quot;nutrition&quot;&gt;Nutrition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#nutrition&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I did this all on Maurten, which is what I plan to mostly use for
Teanaway, as my stomach can be a bit finicky and I&#39;ve found Maurten
works pretty well. This was a low intensity effort, which is easier
on your stomach, but I never really felt any stomach distress.&lt;/p&gt;
&lt;p&gt;The table below shows what I brought and what I used.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Brought&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Consumed&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Calories&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten 160 drink&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;960&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten Solid&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1.5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;338&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten Gel 100&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten Gel CAF 100&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Maurten 320 drink&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Spring Speednut gel&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Total&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1698&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As usual, I overpacked quite a bit, carrying more calories out than
I consumed. Some of this is attributable to not being out on
the trail as long as I expected, but it&#39;s also less calories/hr
than I did at Tenaya last year. In part this is because I got
distracted in the first 90 minutes and didn&#39;t eat or drink much
of anything and then also kind of lost focus on my nutrition at
the top of Pyramid. Generally, I did OK but not great once
I got to Ralston.
With that said, I also clearly brought too
much stuff; it&#39;s good to have some for emergencies, but you don&#39;t
need to have enough of &lt;em&gt;everything&lt;/em&gt; for emergencies. In retrospect
I should have probably dropped the Spring gel and one of the Maurten
320s, which would have given me a reasonable buffer even if I had
been out longer and eaten according to plan.&lt;/p&gt;
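&lt;p&gt;As a rough sanity check on the table above, the consumed calories can be summed and divided by the elapsed time. This is only a sketch: the calorie figures are copied from the Calories column, the elapsed time is the overall 9:48:52 for the day (which includes stops at the car), and the variable names are just for illustration.&lt;/p&gt;

```python
# Calories consumed, copied from the "Calories" column of the table above.
consumed_calories = {
    "Maurten 160 drink": 960,
    "Maurten Solid": 338,
    "Maurten Gel 100": 200,
    "Maurten Gel CAF 100": 200,
    "Maurten 320 drink": 0,
    "Spring Speednut gel": 0,
}

# Overall elapsed time for the day (9:48:52), converted to hours.
elapsed_hours = 9 + 48 / 60 + 52 / 3600

total_calories = sum(consumed_calories.values())
calories_per_hour = total_calories / elapsed_hours

print(f"total: {total_calories} cal, rate: {calories_per_hour:.0f} cal/hr")
```

&lt;p&gt;Because the elapsed time includes the gear change and the bathroom stop, the rate while actually moving would be somewhat higher than this figure.&lt;/p&gt;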
&lt;p&gt;This is the first time I had tried using Maurten Gel CAF (100 mg caffeine)
on something extended like this and I think that went well. It&#39;s
easier than having to juggle caffeine pills and you can just
take one every 2-3 hrs. I brought salt tablets (you can see them
in some of the pictures above) but you don&#39;t need them in these
cool temperatures.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/desolation-map-runalyze.png&quot; alt=&quot;Desolation Route Map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/desolation-profile-runalyze.png&quot; alt=&quot;Desolation Profile&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Map and profile via &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Obviously this didn&#39;t go as intended, which I attribute about 20% to
not being prepared and 80% to weather. If I had taken more time
to really recon the course and realized that the approach to Pyramid
was iffy, I would have been more ready for it and felt better when I
hit the top.  On the other hand, if the weather hadn&#39;t been as bad, I
would have been a lot more comfortable at the top and more willing to
try to find my way down the back half of Pyramid. As it is, I think I
made the right decision not to go it alone, especially in light of how
rainy it got later. I have good gear and experience in the mountains
so I think I would have been fine, but being out that far alone&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
in bad weather is no fun. Moreover, while I did want to do an
adventure run, this was primarily a training exercise and a strategy
checkout, and from that perspective, it didn&#39;t matter that much
which sections of the trail I ran.&lt;/p&gt;
&lt;p&gt;Other than course recon, I was prepared pretty well. I had the
right gear—though if I had kept going around the loop I
might have been pretty sad about not having my rain pants and rain
gloves—and everything worked well. I did get to try
out some options and I&#39;ve now concluded that
the &amp;quot;pull the jacket over the pack&amp;quot; thing isn&#39;t going to work so I&#39;ll
be going back to a normal sized zip-up jacket. Based on this
experience I&#39;m not planning to race in the Ultra 3s: the
traction on the Sense Ride 5s is better and I like having the more
modern bouncy foam instead of the more solid Ultra 3 foam; Salomon
seems to have really dialed in the ride now on the newer foam
so it feels stable and yet bouncy.&lt;/p&gt;
&lt;p&gt;Fitness wise, this actually went quite well. This is an absurd amount
of vert over 22 miles, over 25% more per mile than Teanaway and UTMB. Obviously it&#39;s not as long as either, but feeling like I&#39;m not even really that tired at 22 miles
and 10 hrs is about what I would want. Usually after something this
long I would be like &amp;quot;when will I be done&amp;quot; but this time I had to
really restrain myself from going all the way to the summit on the
second lap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 22.7 mi, 9308 ft, 9:48:52&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And I do mean alone. I only saw three people on the trail the
whole day, at the top of Pyramid. &lt;a href=&quot;https://educatedguesswork.org/posts/desolation-wilderness/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Private Access Tokens, also not great</title>
		<link href="https://educatedguesswork.org/posts/private-access-tokens/"/>
		<updated>2023-08-29T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/private-access-tokens/</id>
		<content type="html">&lt;img class=&quot;img-float&quot; src=&quot;https://educatedguesswork.org/img/not-a-pipe-captcha.png&quot; alt=&quot;Not a pipe CAPTCHA&quot; /&gt;
&lt;p&gt;In my &lt;a href=&quot;https://educatedguesswork.org/posts/wei&quot;&gt;post&lt;/a&gt; on Chrome&#39;s &lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md&quot;&gt;Web Environment Integrity (WEI)
proposal&lt;/a&gt;
I briefly mentioned Apple&#39;s &lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2022/10077/&quot;&gt;Private Access
Tokens (PAT)&lt;/a&gt;
mechanism, which, as Tim Perry observes, is &lt;a href=&quot;https://httptoolkit.com/blog/apple-private-access-tokens-attestation/&quot;&gt;already
deployed&lt;/a&gt;.
The stated use case for Private Access Tokens is to reduce the need for
CAPTCHAs (the little puzzles you get asked to solve to prove that
you are a human).&lt;/p&gt;
&lt;p&gt;This is a good objective because (1) CAPTCHAs suck (I can never
decide whether the post holding up the stoplight is part of the
stoplight!) and (2) they increasingly don&#39;t work because
captcha solving bots have gotten very good and humans &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Flynn_effect&amp;amp;oldid=1169973308#Possible_end_of_progression&quot;&gt;aren&#39;t getting any smarter.&lt;/a&gt;&lt;/p&gt;
&lt;figure class=&quot;img-center&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/captcha-solving.png&quot; alt=&quot;Humans versus bots for CAPTCHA solving&quot; /&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;Source: Searles, Nakatsuka, Ozturk, Paverd, Tsudik and Enkoji
&lt;a href=&quot;https://arxiv.org/pdf/2307.12108.pdf&quot;&gt;&amp;quot;An Empirical Study and Evaluation of Modern CAPTCHAs&amp;quot;&lt;/a&gt;&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is of particular relevance for Apple which is also leaning
in hard to privacy technologies like &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;iCloud Private Relay&lt;/a&gt;, which &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/&quot;&gt;conceals your IP address&lt;/a&gt;. The problem
here is that a lot of anti-abuse mechanisms &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-irtf-pearg-ip-address-privacy-considerations&quot;&gt;rely heavily on IP address reputation&lt;/a&gt;.
When your IP address is concealed, it&#39;s hard for those mechanisms to build up a reputation (either
positive or negative) for
your IP address. This is
especially true if you are &lt;em&gt;also&lt;/em&gt; browsing with settings that
reduce the effectiveness of cookies, for instance if you are
using Tor Browser or any regular browser in &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/&quot;&gt;Private Browsing Mode/Incognito&lt;/a&gt;
mode because it also prevents the site from building up reputation
via the cookie. (See
Matthew Prince&#39;s &lt;a href=&quot;https://blog.cloudflare.com/the-trouble-with-tor/&quot;&gt;post&lt;/a&gt;
on this for more background.)&lt;/p&gt;
&lt;p&gt;One response by sites is just to show CAPTCHAs whenever they
see a &amp;quot;new&amp;quot; user who doesn&#39;t have a cookie or with an IP
address that doesn&#39;t have a reputation—or has a bad reputation—
or is used by an anonymity service. This is obviously annoying to
users and not really what sites want either, because they
want people to visit their site, not bounce off the CAPTCHA.
What you really want is some way to attach a positive reputation
to someone without tracking them.&lt;/p&gt;
&lt;h2 id=&quot;privacy-pass&quot;&gt;Privacy Pass &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#privacy-pass&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;When you look at the problem this way, the broad shape of a solution
presents itself, at least if you&#39;re a cryptographer: you need
anonymous tokens. The basic idea here is that you solve a CAPTCHA and
in return get an anonymous token which lets you prove that you solved
it so you can skip the CAPTCHA next time. This is what is specified in
the IETF&#39;s &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-privacypass-architecture-06.html&quot;&gt;Privacy
Pass&lt;/a&gt;
protocol.
In Privacy Pass, tokens are issued by working with a pair of entities
called the &amp;quot;Attester&amp;quot; and the &amp;quot;Issuer&amp;quot;, and are consumed by the &amp;quot;Origin&amp;quot;
(the Web server) as shown below:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/privacy-pass-issuance.png&quot; alt=&quot;Privacy Pass Overview&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: Privacy Pass Draft]&lt;/p&gt;
&lt;p&gt;In this scenario, the Attester is responsible for ensuring you solved
the CAPTCHA—or enforcing whatever other properties one might be
interested in, as we&#39;ll see shortly—and then conveys some
kind of attestation to the issuer that it has done
so.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
The issuer then issues an anonymous token (see
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#digression%3A-anonymous-credentials&quot;&gt;here&lt;/a&gt;
for an overview of how this works) to the client. The client can then
use the token to prove to the Origin (the actual site) that it is
approved. It can also be used for other forms of anonymous authentication,
for instance &lt;a href=&quot;https://www.apple.com/icloud/docs/iCloud_Private_Relay_Overview_Dec2021.pdf&quot;&gt;iCloud Private Relay&lt;/a&gt;
uses a similar technique to allow users to anonymously prove that they
are customers.&lt;/p&gt;
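&lt;p&gt;To make the anonymous token idea a bit more concrete, here is a toy sketch using textbook RSA blind signatures, which is one classic way to build this kind of token. It is an illustration under stated assumptions, not the Privacy Pass wire protocol: the key is far too small to be secure, there is no padding, the attestation step is omitted, and function names like &lt;code&gt;issuer_sign&lt;/code&gt; are made up for the example.&lt;/p&gt;

```python
import hashlib
import math
import secrets

# Toy Issuer key built from two small Mersenne primes. Illustration only:
# real deployments use full-size keys and a properly specified scheme.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the Issuer


def hash_to_int(nonce):
    """Map the client's random nonce to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N


def client_blind(nonce):
    """Client: hide the nonce behind a random blinding factor r."""
    r = secrets.randbelow(N - 2) + 2
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 2) + 2
    blinded = (hash_to_int(nonce) * pow(r, E, N)) % N
    return blinded, r


def issuer_sign(blinded):
    """Issuer: sign the blinded value without learning the nonce."""
    return pow(blinded, D, N)


def client_unblind(blind_sig, r):
    """Client: strip the blinding factor to get a signature on the nonce."""
    return (blind_sig * pow(r, -1, N)) % N


def origin_verify(nonce, sig):
    """Origin: check the token using only the Issuer's public key."""
    return pow(sig, E, N) == hash_to_int(nonce)


nonce = secrets.token_bytes(32)
blinded, r = client_blind(nonce)   # this is all the Issuer gets to see
token = client_unblind(issuer_sign(blinded), r)
print(origin_verify(nonce, token))
```

&lt;p&gt;The point to notice is that the Issuer only ever sees the blinded value, so it cannot later match the token presented to the Origin against the issuance request, even though the signature verifies under its public key.&lt;/p&gt;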
&lt;p&gt;Obviously, I&#39;m oversimplifying here and a huge amount of work has gone
into trying to make Privacy Pass have the right security and privacy
properties. There are also still some pieces which need work,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
but for the purpose of this post we can ignore the details and
assume that it functions as advertised.&lt;/p&gt;
&lt;h2 id=&quot;private-access-tokens&quot;&gt;Private Access Tokens &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#private-access-tokens&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The important thing to realize here is that Privacy Pass is a
&lt;em&gt;generic&lt;/em&gt; technology which just transports the fact that you satisfied
the attester.  The important operational question, however, is what
you had to do to satisfy the attester. The original design of Privacy
Pass was built around the idea that what you did was solve a CAPTCHA,
but Privacy Pass is agnostic on this point, and in principle the
attester can demand anything. This brings us to &lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2022/10077/&quot;&gt;Private Access
Tokens&lt;/a&gt;,
Apple&#39;s implementation of Privacy Pass using Apple as the attester, as
shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/icloud-privacy-pass.jpeg&quot; alt=&quot;Private Access token diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: Apple]&lt;/p&gt;
&lt;p&gt;Based on the description in the video, Apple is checking for the following properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;This is a valid piece of [Apple] hardware&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The user&#39;s iCloud account is in good standing (i.e., you have to be
signed in with an Apple ID).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[Optional] performs rate limiting to limit the use in bot farms&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these checks pass then you will be able to get a token from
the issuer.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;ios-browser-engines&quot;&gt;iOS Browser Engines &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#ios-browser-engines&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;One thing that a lot of people don&#39;t know is that Chrome and Firefox on
iOS are quite different from Chrome and Firefox on desktop. The reason
for this is that Apple requires everyone to use their &lt;a href=&quot;https://webkit.org/&quot;&gt;WebKit browser engine&lt;/a&gt;
(the thing that actually renders the Web page) on iOS; in fact you have
to use the copy of WebKit built into iOS. Chrome and Firefox each
have their own engines (&lt;a href=&quot;https://www.chromium.org/blink/&quot;&gt;Blink&lt;/a&gt; and &lt;a href=&quot;https://firefox-source-docs.mozilla.org/mobile/android/geckoview/contributor/geckoview-architecture.html&quot;&gt;Gecko&lt;/a&gt; respectively),
but they aren&#39;t allowed to use these on iOS. As a result, both Chrome and
Firefox on iOS behave a lot more like Safari (at least from the
perspective of how they interact with the Web) than they do like
their desktop counterparts. This is not true for Android, where these
browsers use the same engine as on desktop.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As I understand the situation, this will just work if you are on
Safari but doesn&#39;t work on other browsers such as Chrome and
Firefox, at least on desktop. This is partly because Apple doesn&#39;t seem to provide generic
APIs that allow you to use Private Access Tokens but instead only
makes them available via their own networking APIs (WebKit and
&lt;a href=&quot;https://developer.apple.com/documentation/foundation/urlsession&quot;&gt;URLSession&lt;/a&gt;).
This means it works in every browser on iOS, because Apple requires you to use
their browser engine on iOS.
However,
on desktop Firefox and Chrome use their own networking stacks, so this doesn&#39;t
work for them, really,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
though I suppose Apple could provide APIs that those browsers could use.
Of course, those browsers could also negotiate their own deal with attesters.&lt;/p&gt;
&lt;h2 id=&quot;policy%3A-browsers%2C-issuers%2C-attesters%2C-and-origins&quot;&gt;Policy: Browsers, Issuers, Attesters, and Origins &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#policy%3A-browsers%2C-issuers%2C-attesters%2C-and-origins&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is a complicated system with four separate players and that
makes it hard to sort out the various policies in play:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The Origin server (i.e., the Web site) gets to decide which
Issuers they accept.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Issuer gets to decide what Attesters it trusts and which
policies it expects them to enforce.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Attester gets to decide what policies they actually
enforce.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Browser gets to determine which Attesters and Issuers
they are actually willing to work with.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result is that what policies you are actually subject to is
determined by the interaction of the preferences of all of these
parties, with the Browser and the Origin being the most important,
because the Origins know what they are demanding and the Browser
knows which Issuers and Attesters they will work with. The Origins
can always find new Issuers/Attesters, and the Browsers can
always blocklist them.&lt;/p&gt;
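&lt;p&gt;One way to picture how these preferences interact is as a filter over (Issuer, Attester) pairs. The sketch below is a simplified model, not anything from the Privacy Pass specifications: the function, its parameters, and the party names are all hypothetical, and it ignores the question of which policy the Attester actually enforces.&lt;/p&gt;

```python
# Hypothetical model: every name here is invented for illustration.

def usable_pairs(origin_accepted_issuers, issuer_trusted_attesters,
                 browser_allowed_issuers, browser_allowed_attesters):
    """Return the (issuer, attester) pairs that every party will accept."""
    pairs = set()
    for issuer in origin_accepted_issuers:
        # The Browser must be willing to talk to this Issuer at all.
        if issuer not in browser_allowed_issuers:
            continue
        # The Issuer decides which Attesters it trusts.
        for attester in issuer_trusted_attesters.get(issuer, set()):
            # The Browser can also refuse to work with an Attester.
            if attester in browser_allowed_attesters:
                pairs.add((issuer, attester))
    return pairs


pairs = usable_pairs(
    origin_accepted_issuers={"issuer-a", "issuer-b"},
    issuer_trusted_attesters={
        "issuer-a": {"device-vendor"},
        "issuer-b": {"captcha-service"},
    },
    browser_allowed_issuers={"issuer-a", "issuer-b"},
    browser_allowed_attesters={"device-vendor"},
)
print(sorted(pairs))
```

&lt;p&gt;In this made-up configuration the Origin would accept tokens from either Issuer, but the Browser only works with one Attester, so only one combination is actually usable.&lt;/p&gt;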
&lt;p&gt;In the actual existing Apple system, the Attester and Browser
are both Apple, and Apple&#39;s policy is that
you need to have an Apple device and an iCloud account.
The current issuers they have announced
are &lt;a href=&quot;https://blog.cloudflare.com/eliminating-captchas-on-iphones-and-macs-using-new-standard/&quot;&gt;Cloudflare&lt;/a&gt;
and &lt;a href=&quot;https://www.fastly.com/blog/private-access-tokens-stepping-into-the-privacy-respecting-captcha-less&quot;&gt;Fastly&lt;/a&gt;.
Moreover, Cloudflare and Fastly can also act as the origin servers (web sites) in this case,
which means that if you use them to serve your Web site they can automatically
consume PAT. Because of the way the crypto is designed, it&#39;s fine
to have the Issuer and the Origin be the same, as they cannot
link the client&#39;s behavior; in fact the Issuer, Attester, and Origin
can all be the same.&lt;/p&gt;
&lt;h2 id=&quot;the-general-equilibrium&quot;&gt;The General Equilibrium &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#the-general-equilibrium&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From a technical perspective, this is all pretty reasonable stuff, but
the thing to understand is that this is a generic system which is
compatible with any policy the Attesters and Issuers want to enforce.
As we saw with &lt;a href=&quot;https://educatedguesswork.org/posts/wei&quot;&gt;WEI&lt;/a&gt;, the question is then what policies
they will choose to enforce. The policy enforced by the combination
of Apple&#39;s attesters and the issuers they have chosen is that you
paid Apple for a device and have an iCloud account. This is
very different from &amp;quot;the person solved a CAPTCHA&amp;quot; because that
policy works just as well for people who don&#39;t have Apple devices.&lt;/p&gt;
&lt;p&gt;This is actually a pretty reasonable proxy for &amp;quot;is a person and not a
bot&amp;quot;, but the bigger picture consequences aren&#39;t great, as I don&#39;t
really want to live in a world where everyone who hasn&#39;t bought an
Apple device has to solve CAPTCHAs all the time. Of course, most
people don&#39;t use Apple devices and many of those still use Chrome or
Firefox, so that limits how aggressive sites can be about requiring
repeated CAPTCHA solving for people who don&#39;t have those devices. But
what happens if similar functionality gets added to Android&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
and Windows and now suddenly the vast majority of devices have some kind
of PAT-like functionality? In that case, sites will be able to be much
more aggressive about requiring CAPTCHAs or just refuse to serve other
users at all, as they will only be annoying a fairly small fraction of
their users.&lt;/p&gt;
&lt;p&gt;Of course, the situation will become even worse as AI gets better at
solving CAPTCHAs.  The basic problem here is that we don&#39;t really have
a good, cheap, signal for &amp;quot;is a human&amp;quot; that doesn&#39;t require somehow
buying into some bigco ecosystem, whether it&#39;s buying a device from a
given manufacturer, having an account with some big service, or
both. But the consequence of that is risking making using the Internet
a lot harder for people who don&#39;t want to do one of those things.&lt;/p&gt;
&lt;p&gt;Stepping back, I worry about the equilibrium steady state: the more
that people are able to authenticate with these
technologies the more attractive it is for sites to basically require them,
to increase the level of scrutiny (as in WEI),
and provide a massively inferior experience to those who can&#39;t.
Ironically, this is actually a direct consequence of Privacy Pass
being well-designed so that it&#39;s seamless and provides a good level
of privacy, because that makes it seem less objectionable to require,
as opposed to (say) making everyone log in with a Google account.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
At the end of the day, though, the risk is further entrenching the
existing big players.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This split architecture is intended to be flexible but is a bit confusing
pedagogically. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As I understand the situation, despite this somewhat confusing
diagram, the browser talks to the issuer through the attester. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In particular there are concerns about metadata
smuggling by using different keys to sign different people&#39;s
tokens, and there are &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-privacypass-key-consistency-01.html&quot;&gt;efforts to address that&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a feature of a lot of Apple&#39;s networking technologies, which
they like to bake into the operating system. This is very convenient
for small shops but less so for big implementors like browsers
who would prefer to control networking themselves. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Chrome does have a similar technology called &lt;a href=&quot;https://developer.chrome.com/docs/privacy-sandbox/private-state-tokens/&quot;&gt;Private State Tokens&lt;/a&gt; but as far as I can tell it&#39;s not
tied into a Google-operated attestation system the way that
PAT is. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I owe this observation to Kate Hudson. &lt;a href=&quot;https://educatedguesswork.org/posts/private-access-tokens/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>The endpoint of Web Environment Integrity is a closed Web</title>
		<link href="https://educatedguesswork.org/posts/wei/"/>
		<updated>2023-08-18T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/wei/</id>
		<content type="html">&lt;p&gt;Chrome&#39;s &lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md&quot;&gt;Web Environment Integrity (WEI) proposal&lt;/a&gt; for remote Web browsing attestation is being justly criticized from a broad variety of perspectives (&lt;a href=&quot;https://github.com/mozilla/standards-positions/issues/852&quot;&gt;Mozilla Standards Position&lt;/a&gt;,
&lt;a href=&quot;https://www.ghacks.net/2023/07/31/brave-browser-wont-support-googles-web-environment-integrity-api/&quot;&gt;Brave&lt;/a&gt;,
&lt;a href=&quot;https://www.eff.org/deeplinks/2023/08/your-computer-should-say-what-you-tell-it-say-1&quot;&gt;EFF&lt;/a&gt;).
I certainly agree that WEI is bad news, and I&#39;ll get to that part
eventually, but first I&#39;d like to situate it in
the broader context, both of the Web and the Internet,
starting with some history.&lt;/p&gt;
&lt;h2 id=&quot;the-bell-system&quot;&gt;The Bell System &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#the-bell-system&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first communications network available to regular people
was the telephone. Of course, the telegraph already existed,
but regular people didn&#39;t have telegraphs: you went down to
the telegraph office to send messages. By contrast, you could
have a telephone in your home and use it to call other people
who had phones in their homes. Miraculous!&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;bring-your-own-phone&quot;&gt;Bring your own phone &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#bring-your-own-phone&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;I didn&#39;t know until I started writing this post that
it was sort-of possible to buy your own phone and install
it but you had to first transfer the phone to AT&amp;amp;T and
then &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=History_of_AT%26T&amp;amp;oldid=1164585879#Monopoly&quot;&gt;rent it back from them&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;From the early 1900s until 1983, telephone service in the United
States was essentially a monopoly (the &lt;a href=&quot;https://en.wikipedia.org/wiki/Bell_System&quot;&gt;Bell
System&lt;/a&gt;) operated by
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=History_of_AT%26T&amp;amp;oldid=1164585879#Monopoly&quot;&gt;AT&amp;amp;T&lt;/a&gt;.
The telephone network included not only the wires and switches that
the phone company operates today but also the wire in your house and
the phone in your hand, all the way up to your ear. Customers
rented phones from a subsidiary of AT&amp;amp;T called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Western_Electric&amp;amp;oldid=1165130659&quot;&gt;Western
Electric&lt;/a&gt;,
and they generally looked something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Western_Electric_phone.jpg&quot; alt=&quot;Western Electric Phone&quot; /&gt;
[Source: &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Western_Electric_10_Button_WE_1500_-_Telephone_Museum_-_Waltham,_Massachusetts_-_DSC08111.jpg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;If you wanted to connect something else not made by Western
Electric to the phone network, you were mostly out of luck.
This doesn&#39;t just mean no cooler looking phones, but also no cordless phones,
answering machines, or modems; basically anything other than
a Western Electric brick. Unsurprisingly, there was not a huge amount of innovation in this market,
though Western Electric &lt;em&gt;would&lt;/em&gt; sell you a somewhat cooler
looking &amp;quot;Princess Phone&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Princess_Phone.jpg&quot; alt=&quot;Princess Phone&quot; /&gt;
[Source: &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Western_Electric_Company_Princess_phones.jpg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;It&#39;s important to understand that there wasn&#39;t any real technical
obstacle to connecting your own phone to the AT&amp;amp;T network. Regular
telephones (what people used to call &lt;em&gt;POTS&lt;/em&gt; for &amp;quot;plain old telephone service&amp;quot;)
are actually quite simple devices to build, mostly consisting of
analog signals over two copper wires; you just weren&#39;t allowed
to, by which I don&#39;t just mean that AT&amp;amp;T would be mad at you but that
it was actually prohibited by the FCC:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;No equipment, apparatus, circuit or device not furnished by the
telephone company shall be attached to or connected with the
facilities furnished by the telephone company, whether physically,
by induction or otherwise except as provided in 2.6.2 through 2.6.12
following. In case any such unauthorized attachment or connection is
made, the telephone company shall have the right to remove or
disconnect the same; or to suspend the service during the
continuance of said attachment or connection; or to terminate the
service.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That changed in 1968 with the &lt;a href=&quot;https://web.archive.org/web/20150120021035/http://www.uiowa.edu/~cyberlaw/FCCOps/1968/13F2-420.html&quot;&gt;Carterfone decision&lt;/a&gt; in which the FCC struck this provision and
allowed consumers to connect their own equipment&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
to the
network as long as it did not cause harm to the network itself.
This opened the door for customers to attach their own equipment
to the phone network and more importantly for innovation that
didn&#39;t come out of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bell_Labs&amp;amp;oldid=1166971139&quot;&gt;New Jersey&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Naturally, the first things people wanted to install were local
improvements to their experience that worked with standard voice
phones on the other end (cordless phones, answering machines, etc.), but the
Carterfone decision
also implicitly allowed the use of the phone network for &lt;em&gt;data&lt;/em&gt; transmission—effectively
encoded in sound,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
because that&#39;s all the phone network could carry—which
meant fax machines and eventually modems (originally for primitive computer
networking like BBSes and eventually for the Internet).
Of course, you were still tied to the phone network, which—at
least until 1984—was entirely owned by AT&amp;amp;T, but as long
as you were calling someone with a compatible system and could cram your
data into an 8 kHz channel, you could do anything you wanted without
getting permission from the phone company.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
If you were really fancy, you could even get the phone company
to sell you a leased line that would carry data, but that&#39;s
not something regular people did.&lt;/p&gt;
&lt;h3 id=&quot;in-which-the-phone-company-was-sort-of-right&quot;&gt;In which the phone company was sort of right &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#in-which-the-phone-company-was-sort-of-right&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ironically, while the phone company was wrong about consumer devices
like Carterfone presenting a threat to the telephone network, they were sort
of right about the threat of letting anybody interconnect. The
basic problem is that the telephone network was designed under the assumption
that all the constituent parts were operated by the same people
and that those people were trustworthy. When this is not true
the security of the system breaks down.&lt;/p&gt;
&lt;p&gt;Probably the best publicized example of this is the widespread
exploitation of the phone network by &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Phreaking&amp;amp;oldid=1160695271&quot;&gt;phreaks&lt;/a&gt;
for free phone calls—especially long distance—and general
exploration of the phone system. The details of this kind of
exploitation are out of scope of this post, but the general
problem was that the system wasn&#39;t designed to be robust to compromised
endpoints, or even, famously, to someone who could inject the
right tones &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=2600_hertz&amp;amp;oldid=1141061593&quot;&gt;into the network&lt;/a&gt;.
Less famously, the network is &lt;em&gt;still&lt;/em&gt; vulnerable to impersonation
attacks in which the caller generates a fake number and the callee&#39;s
network just trusts its representation. These attacks are finally
being fixed by a set of technologies known
as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=STIR/SHAKEN&amp;amp;oldid=1165301700&quot;&gt;STIR/SHAKEN&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From the perspective of someone who works on Internet protocols,
all of these issues just look like design flaws in the system:
we just assume that other components of the system are malicious
unless proven otherwise. But from the perspective of the original
designers, these were closed systems consisting of trusted elements,
and when one of the elements misbehaved then you had problems.&lt;/p&gt;
&lt;h2 id=&quot;the-internet&quot;&gt;The Internet &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#the-internet&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At around the same time all this was happening, the first primitive
computer networks were being constructed (the first ARPANET nodes went
online in 1969). From nearly the beginning, the ARPANET and then
the Internet was conceived of as an &lt;em&gt;open&lt;/em&gt; system, a &amp;quot;network of
networks&amp;quot; in which each network was independent.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
All that was required to be part of the Internet was to (1) speak the
right protocols and (2) find someone willing to connect with you
and route your traffic.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
And the protocols were of course public, being published in the
earliest  &lt;em&gt;&lt;a href=&quot;https://www.rfc-editor.org/&quot;&gt;Requests For Comments (RFCs)&lt;/a&gt;&lt;/em&gt;.
This applied not just to the basic protocols like IP itself, but also
to the application protocols on top like e-mail (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Simple_Mail_Transfer_Protocol&amp;amp;oldid=1165028325&quot;&gt;SMTP&lt;/a&gt;, &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc822&quot;&gt;RFC 822&lt;/a&gt;) and
remote access (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Telnet&amp;amp;oldid=1166536444&quot;&gt;Telnet&lt;/a&gt;).
From very early on there were multiple implementations of these
systems that would talk to each other; as long as your implementation
could send and receive the right messages, everything would work
right.&lt;/p&gt;
&lt;h3 id=&quot;electronic-mail%3A-the-original-killer-app-for-the-internet&quot;&gt;Electronic Mail: The Original Killer App for the Internet &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#electronic-mail%3A-the-original-killer-app-for-the-internet&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As an example, let&#39;s look at the original Internet communications app:
electronic mail.&lt;/p&gt;
&lt;p&gt;When the Internet was first developed, personal computers were
uncommon and instead what people mostly had was access to bigger
computers (e.g., owned by their company or university) in what&#39;s
called a &amp;quot;time sharing&amp;quot; system, which just meant that multiple people
could use the same computer at once, with everyone having their own
account and workspace.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/email-timesharing.png&quot; alt=&quot;Old style email&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The diagram above shows how mail works in this environment.
Each computer has a single system process called a
&lt;em&gt;mail transfer agent (MTA)&lt;/em&gt;, which is responsible for sending
and receiving e-mail with other computers. The historical program
was called &lt;a href=&quot;https://www.proofpoint.com/us/products/email-protection/open-source-email-solution&quot;&gt;Sendmail&lt;/a&gt;.
In order to use the system, the user logs into the system
(more on this below) and then uses a program called a &lt;em&gt;mail user agent (MUA)&lt;/em&gt;
(traditionally just a program called &amp;quot;mail&amp;quot;).&lt;/p&gt;
&lt;p&gt;Alice can send mail to Carol using the MUA, which contacts the
MTA&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
and asks it to send it to Carol. The MTA then contacts the
MTA—using a protocol called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Simple_Mail_Transfer_Protocol&amp;amp;oldid=1165028325&quot;&gt;SMTP&lt;/a&gt;—on Carol&#39;s computer and asks it to deliver it. Carol&#39;s MTA then
stores it on the disk in Carol&#39;s mail file (this is just a single
big file with all the messages in it). Carol can then use
her MUA to read her messages.&lt;/p&gt;
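&lt;p&gt;The last two steps (delivery into a single mail file, then reading it
back) can be sketched with Python&#39;s standard &lt;code&gt;mailbox&lt;/code&gt; and
&lt;code&gt;email&lt;/code&gt; modules. This is only an illustration: it skips the SMTP
exchange between the two MTAs entirely, the file lives in a temporary
directory, and Carol&#39;s address is an assumed placeholder.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import mailbox
import os
import tempfile
from email.message import EmailMessage

# Alice's MUA composes the message (Carol's address is a placeholder).
msg = EmailMessage()
msg["From"] = "alice@atlanta.org"
msg["To"] = "carol@example.org"
msg["Subject"] = "Hello"
msg.set_content("Hi Carol!")

# Carol's MTA appends it to her mail file, a single mbox file on disk.
path = os.path.join(tempfile.mkdtemp(), "carol")
box = mailbox.mbox(path)
box.add(msg)
box.flush()
box.close()

# Later, Carol's MUA opens the same file and lists what is in it.
subjects = [m["Subject"] for m in mailbox.mbox(path)]
print(subjects)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the file format is well-defined, the program that writes the
mail file and the program that reads it do not need to know anything
else about each other.&lt;/p&gt;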
&lt;p&gt;Importantly, both the MTA and MUA are readily replaceable:
the system administrator can replace the MTA (other popular
MTAs include &lt;a href=&quot;http://www.postfix.org/&quot;&gt;postfix&lt;/a&gt; and
&lt;a href=&quot;https://cr.yp.to/qmail.html&quot;&gt;qmail&lt;/a&gt;) and users can choose
their own MUAs (writing new MUAs was a very popular pastime
in the early days of the Internet). In fact, two users
on the same computer can run different MUAs without interfering
with each other. What makes this work is that both the
protocol that the MTAs use to talk to each other and the
interface between the MUA and MTA are stable and well-defined.
The end result is that people are able to customize their
own e-mail experience, including the look and feel, filtering,
etc.&lt;/p&gt;
&lt;h4 id=&quot;remote-mail&quot;&gt;Remote Mail &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#remote-mail&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Back in the really old days, you would log directly into the
server, either by using a terminal directly connected to it
or over a modem. In either case, you&#39;re running the MUA
directly on the server, which, recall, you are sharing
with others. That computer is just displaying stuff
on your screen. This typically looked something like
this (if you were lucky):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Mailbox is &#39;/usr/mail/mymail&#39; with 15 messages  [Elm 2.4PL22]
        -&amp;gt;   N     1   Apr 24   Larry Fenske   (49)    Hello there
             N     2   Apr 24   jad@hpcnoe     (84)    Chico?  Why go there?
             E     3   Apr 23   Carl Smith     (53)    Dinner tonight?
             NU    4   Apr 18   Don Knuth      (354)   Your version of TeX...
             N     5   Apr 18   games          (26)    Bug in cribbage game
              A    6   Apr 15   kevin          (27)    More software requests
                   7   Apr 13   John Jacobs    (194)   How can you hate RUSH?
              U    8   Apr 8    decvax!mouse   (68)    Re: your Usenet article
                   9   Apr 6    root           (7)
             O    10   Apr 5    root           (13)

       You can use any of the following commands by pressing the first character;
       d)elete or u)ndelete mail, m)ail a message, r)eply or f)orward mail, q)uit
       To read a message, press &amp;lt;return&amp;gt;.  j = move down, k = move up, ? = help
        Command : @
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;[Source: &lt;a href=&quot;http://www.instinct.org/elm/doc/Users.txt&quot;&gt;ELM user&#39;s guide&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;This is from a relatively modern UNIX mailer called &lt;a href=&quot;http://www.instinct.org/elm/&quot;&gt;ELM&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This was fine back in the day, but as people started to get more
powerful personal computers, it became increasingly unsatisfactory,
for a number of reasons, but principally because it was slow and
ugly. Slow because every time you wanted to do anything it required
a round trip to the server. This included when you were composing an
email and every character you typed had to go up to the server before
it was echoed on your screen. Ugly because it was only this kind of
text-based display and people (1) wanted a GUI and (2) wanted to
be able to display rich content such as emails containing images.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;pop-versus-imap&quot;&gt;POP versus IMAP &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#pop-versus-imap&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The major conceptual difference between POP and IMAP is that
POP is designed for a scenario where the user downloads all
of their new messages and then deletes them from the server.
This works fine if you only have one mail client but if you
have multiple devices (say a laptop and a phone) then once
one device has downloaded the messages, they won&#39;t be available
for the other device, which is obviously bad.
By contrast, IMAP is designed to leave all of the messages
on the server, which means that multiple devices can
be used to access the same mail account. IMAP also has
support for storing a lot of state (e.g., folders, read versus unread,
etc.) on the server, thus providing a more seamless experience
for the user.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The obvious fix is to run the MUA on the user&#39;s machine and instead
have it retrieve the mail from the server and display it locally.
In principle, the MUA could just log in as Alice, download all the
messages, and process them locally, but that would be inconvenient and
slow; what you want is some network protocol that allows you to retrieve
messages one at a time. The first popular such protocol
was called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Post_Office_Protocol&amp;amp;oldid=1166046941&quot;&gt;Post Office Protocol (POP)&lt;/a&gt;
but POP has been to some extent superseded by &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Message_Access_Protocol&amp;amp;oldid=1165977946&quot;&gt;Internet Message Access Protocol (IMAP)&lt;/a&gt;. In either case, there is some program
running on the mail server machine which runs POP or IMAP. The
MUA on the user&#39;s machine contacts that server and uses the
relevant protocol to retrieve the user&#39;s messages, as shown
in the figure below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/email-remote.png&quot; alt=&quot;Email with one remote user&quot; /&gt;&lt;/p&gt;
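&lt;p&gt;The POP versus IMAP difference described in the callout above can be
shown with a toy model. This is not real POP or IMAP (the
&lt;code&gt;Server&lt;/code&gt; class and its methods are invented for illustration);
it only models the download-and-delete usage versus leaving messages on
the server, with a laptop and a phone fetching in turn.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy model of one mailbox accessed from two devices; not real POP or IMAP.

class Server:
    def __init__(self, messages):
        self.messages = list(messages)

    def pop_style_fetch(self):
        """Hand over every message and delete it from the server."""
        fetched, self.messages = self.messages, []
        return fetched

    def imap_style_fetch(self):
        """Hand over a copy; the messages stay on the server."""
        return list(self.messages)


pop_server = Server(["msg1", "msg2"])
laptop_pop = pop_server.pop_style_fetch()
phone_pop = pop_server.pop_style_fetch()

imap_server = Server(["msg1", "msg2"])
laptop_imap = imap_server.imap_style_fetch()
phone_imap = imap_server.imap_style_fetch()

print(laptop_pop, phone_pop)
print(laptop_imap, phone_imap)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the POP-style case the phone finds an empty mailbox because the
laptop got there first; in the IMAP-style case both devices see the same
messages.&lt;/p&gt;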
&lt;p&gt;Importantly, nothing had to change on Carol&#39;s side
in order to allow Alice to read her mail remotely like this.
&lt;a href=&quot;http://atlanta.org/&quot;&gt;atlanta.org&lt;/a&gt; just had to install an IMAP server&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
and then Alice could download an appropriate MUA and
use it to talk to the server. Moreover, it&#39;s possible
for some people on &lt;a href=&quot;http://atlanta.org/&quot;&gt;atlanta.org&lt;/a&gt; to use remote mail
and some to read their mail by logging in as before,
as we see Bob doing in the picture above.
Of course, the mail provider can choose to offer remote-only
service without
offering the ability to run programs on their servers at all. This is
an important operational and security advantage and is how most big mail
providers (e.g., Gmail) operate now. However, all of this is invisible to the other side.&lt;/p&gt;
&lt;p&gt;Moreover, once &lt;a href=&quot;http://atlanta.org/&quot;&gt;atlanta.org&lt;/a&gt; has installed an IMAP (or
POP) server Alice is free to use &lt;em&gt;any&lt;/em&gt; MUA she wants
as long as it speaks IMAP (or POP). Because the protocols
are published anyone can just write their own MUA
that conforms to the protocols.
Again, this is critically
important because it allows for new mail software
to innovate and for Alice to choose the interface and
features she likes the best (or even to write her own mail
software!).
You want all the images suppressed or rendered in black and white?
No problem.
You want to read your email
in a different font? Sounds good.
You want it read out loud to you in the voice
of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Malcolm_Tucker&amp;amp;oldid=1169848196&quot;&gt;Malcolm Tucker&lt;/a&gt;? Simple
matter of programming.
The client is in total control of how things are rendered because it&#39;s
an open, interoperable system.&lt;/p&gt;
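&lt;p&gt;To make the &quot;simple matter of programming&quot; point concrete, here is a
small sketch of a client-side rendering choice using Python&#39;s standard
&lt;code&gt;email&lt;/code&gt; package. The message, the fake attachment bytes, and
the &lt;code&gt;render_text_only&lt;/code&gt; function are all invented for this
example; a real MUA would do far more, but the decision about what to
show is made entirely on the client.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from email.message import EmailMessage

# Build a message with a text body and a (fake) image attachment.
msg = EmailMessage()
msg["Subject"] = "Holiday photos"
msg.set_content("Here are the pictures.")
msg.add_attachment(b"not-a-real-image", maintype="image", subtype="png", filename="beach.png")


def render_text_only(message):
    """Return the subject and text parts, replacing images with a placeholder."""
    lines = ["Subject: " + str(message["Subject"])]
    for part in message.walk():
        if part.get_content_maintype() == "text":
            lines.append(part.get_content().strip())
        elif part.get_content_maintype() == "image":
            lines.append("[image suppressed: " + str(part.get_filename()) + "]")
    return "\n".join(lines)


rendered = render_text_only(msg)
print(rendered)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing on the sending side or on the server has to change for the
reader to get this behavior, which is the advantage of the rendering
logic living in a replaceable client.&lt;/p&gt;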
&lt;p&gt;In principle, of course, it was always possible to build a totally closed
mail system (Microsoft Exchange was like this to some extent), but once
an interoperable ecosystem had been developed it had a tremendous advantage
because it was easy to &lt;em&gt;unilaterally&lt;/em&gt; roll out a new mail client or
server without changing every other part of the system. Even mail systems
which had proprietary elements were still forced to speak standard protocols
to some extent, especially for the mail format and delivery parts of the
system.&lt;/p&gt;
&lt;h3 id=&quot;other-applications&quot;&gt;Other Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#other-applications&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, e-mail isn&#39;t the only application that can run on the Internet.
The way the Internet protocols were designed is inherently flexible,
providing &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro&quot;&gt;transport
protocols&lt;/a&gt; that can carry any kind
of traffic, so if you want to build a new application and it can run
over IP (these days, &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#non-tcp%2Fudp-protocols&quot;&gt;TCP and
UDP&lt;/a&gt;), you can carry it
over the Internet, with no need to stuff it into an 8 kHz voice
channel. Moreover, you don&#39;t need any cooperation from the network
itself; you just need to upgrade the endpoints to support your
new application, which is a huge advantage for deployment.
The result of these design choices was an explosion of innovation, starting in around
1992 with the Web and that is still happening today.&lt;/p&gt;
&lt;h2 id=&quot;the-web&quot;&gt;The Web &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#the-web&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the topic of the Web, which is probably still the
most important single application on the Internet. For all that,
it&#39;s technically just another networked application.&lt;/p&gt;
&lt;p&gt;When the Web was designed, it was built on similar
principles to the Internet as a whole, with published—though initially without
really clear specifications—interoperable protocols that anyone could
implement.  More or less independent implementations of Web clients
and servers started to appear quite soon after Tim Berners-Lee&#39;s
initial announcement of the Web and everyone just expected
that they would talk to each other. In fact, that&#39;s what
it &lt;em&gt;meant&lt;/em&gt; to be part of the Web. Here&#39;s how we described
this in Mozilla&#39;s &lt;a href=&quot;https://www.mozilla.org/en-US/about/webvision/full&quot;&gt;Web Vision&lt;/a&gt;
(Emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A key strength of the Web is that there are minimal barriers to
entry for both users and publishers. This differs from many other
systems such as the telephone or television networks which limit
full participation to large entities, inevitably resulting in a
system that serves their interests rather than the needs of
everyone. (Note: in this document &amp;quot;publishers&amp;quot; refers to entities
who publish directly to users, as opposed to those who publish
through a mediated platform.)&lt;/p&gt;
&lt;p&gt;One key property that enables this is interoperability based on
common standards; &lt;strong&gt;any endpoint which conforms to these standards is
automatically part of the Web&lt;/strong&gt;, and the standards themselves aim to
avoid assumptions about the underlying hardware or software that
might restrict where they can be deployed. This means that no single
party decides which form-factors, devices, operating systems, and
browsers may access the Web. It gives people more choices, and thus
more avenues to overcome personal obstacles to access. Choices in
assistive technology, localization, form-factor, and price, combined
with thoughtful design of the standards themselves, all permit a
wildly diverse group of people to reach the same Web.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As of the mid 2000s, the Web was the dominant paradigm for application
delivery: if you wanted to build some kind of networked application—and
often a non-networked one—you stood up a Web site. This paradigm
was so powerful that it even started to absorb standalone
applications like e-mail. A full account of this phenomenon would
be too long to include in this post, but it seems clear that a huge
part of it is due to how easy it is to deploy Web applications to
users; there&#39;s nothing for them to download or install, they just go
to your Web site and the application runs right in the browser. Better
yet, when you release a new version you don&#39;t need to push an update to the
user; they just get the new version whenever they go to your site
again.&lt;/p&gt;
&lt;p&gt;As with other interoperable applications, the design of the Web
allows the client to control how content is rendered and how the
user interacts with it. Some important examples of this kind
of user control include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accessibility features such as screen readers&lt;/li&gt;
&lt;li&gt;Automatic password and credit-card form-fill&lt;/li&gt;
&lt;li&gt;Ad blocking&lt;/li&gt;
&lt;li&gt;Translating Web pages into a different language&lt;/li&gt;
&lt;li&gt;&amp;quot;Reader&amp;quot; modes&lt;/li&gt;
&lt;li&gt;Downloading pieces of the page (e.g., images) or the whole
page&lt;/li&gt;
&lt;li&gt;Developer tools which allow the user to inspect the Web page contents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Web differs from e-mail in one very important respect, which is
that the Web allows the server to &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#client-side-applications&quot;&gt;run programs on the user&#39;s
computer&lt;/a&gt;
and those applications can talk back to the server. The vast majority
of Web pages have some dynamic content in the form of JavaScript. By
contrast, e-mail content is largely static. This makes the Web a much
more powerful deployment platform but also limits the ability of the
client to strictly control every aspect of the user&#39;s experience.&lt;/p&gt;
&lt;p&gt;A good example of this phenomenon is Web-based mail systems like
Gmail. The diagram below shows the high level architecture of this
kind of system.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Webmail.png&quot; alt=&quot;Webmail architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Conceptually, this is exactly the same architecture we had before,
with a MUA talking to a server, except that instead of being a standalone
app, the MUA is a JavaScript program running in the browser. However,
there&#39;s one big difference: because the Webmail service controls
both the Webmail server and the JavaScript-based MUA, they
don&#39;t have to use a standardized protocol like IMAP; they can just build
a proprietary protocol.
And because deploying new JS code on the Web is so close to frictionless,
they can change it whenever they want. So even though it&#39;s all
running on a standardized substrate of HTTP and HTML/JS/CSS,
systems like this are actually fairly closed because all the important
stuff is happening in the downloaded JS code rather than in the standardized
pieces.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Even so, the browser itself still maintains a fair amount of control
over how the application behaves. Aside from the examples above,
there are features such as &lt;a href=&quot;https://support.mozilla.org/en-US/kb/about-picture-picture-firefox&quot;&gt;Firefox Picture-in-Picture&lt;/a&gt; and add-ons
such as &lt;a href=&quot;https://addons.mozilla.org/en-US/firefox/addon/enhancer-for-youtube/?utm_source=addons.mozilla.org&amp;amp;utm_medium=referral&amp;amp;utm_content=search&quot;&gt;YouTube Enhancer&lt;/a&gt;, which modify the behavior of popular sites such as YouTube even though
they are to a great degree JS applications.&lt;/p&gt;
&lt;h2 id=&quot;mobile-apps-and-app-stores&quot;&gt;Mobile Apps and App Stores &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#mobile-apps-and-app-stores&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the early 2000s it looked like the Web model had totally won
and native apps were toast but that changed in 2008 with the opening of the iOS app store.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
The app store standardized the process of downloading, installing, and
updating mobile applications—at least on iOS—resulting
in a system almost as frictionless as the Web and with a number
of important &lt;a href=&quot;https://www.mozilla.org/en-US/about/webvision/full/#mobile&quot;&gt;technical advantages&lt;/a&gt;.
The result was a rapid takeoff of the use of mobile apps
to the point where they are the dominant &lt;a href=&quot;https://jmango360.com/mobile-app-vs-mobile-website-statistics/&quot;&gt;mode of mobile usage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/AppleAppStoreStatistics.png&quot; alt=&quot;App store usage&quot; /&gt;
[Source &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:AppleAppStoreStatistics.png&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;Because of the app store, mobile apps have many of the deployment advantages of the Web
but are far less open. Just like a Web app, the vendor controls
both the client and the server, but unlike on the Web, there is
no browser intermediating the app&#39;s interaction with the user,
and so there&#39;s no opportunity to modify the behavior of the app,
e.g., for ad blocking or translation. Of course, the operating
system &lt;em&gt;could&lt;/em&gt; in principle decide to do this kind of stuff—and
the mobile OSes do do some technical enforcement of their policies—but
the platform just isn&#39;t engineered for this kind of user agent
the way the Web is.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
As a practical matter, then, if you want to use some network-based
service that hasn&#39;t gone out of its way to open its interfaces
you&#39;re mostly going to be using their app without any real opportunity
to control your own experience except in ways designed into the app.
This is why, for instance, you have to have &lt;a href=&quot;https://educatedguesswork.org/posts/streaming-apps/&quot;&gt;five different apps on your Roku, one for each streaming service&lt;/a&gt;
(including separate ones for Disney and Hulu, even though they are owned by the same
company!), rather than a single
app which will work with any streaming service.&lt;/p&gt;
&lt;h2 id=&quot;closed-versus-open&quot;&gt;Closed versus Open &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#closed-versus-open&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are a number of reasons why application vendors might prefer
closed systems to open ones:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Flexibility.&lt;/dt&gt;
&lt;dd&gt;If you control both ends of the system, then you can evolve
it much more quickly because you don&#39;t need to wait for anyone
else to change. This is the argument made by &lt;a href=&quot;https://www.ietf.org/proceedings/80/slides/plenaryt-5.pdf&quot;&gt;Jonathan Rosenberg&lt;/a&gt; and also in this
&lt;a href=&quot;https://whispersystems.org/blog/the-ecosystem-is-moving/&quot;&gt;post&lt;/a&gt; by
Moxie Marlinspike on why Signal isn&#39;t federated.&lt;/dd&gt;
&lt;dt&gt;Barriers to entry.&lt;/dt&gt;
&lt;dd&gt;In an open system a potential competitor can enter the market
by standing up a new endpoint (e.g., a new client) without having
to displace the entire ecosystem. As a concrete example, when
Google launched Chrome they didn&#39;t have to displace every
Web server in the world because Chrome automatically worked
with them.&lt;/dd&gt;
&lt;dt&gt;Control.&lt;/dt&gt;
&lt;dd&gt;If you control the clients then you know that they behave
the way you want them to. To some extent this is just a matter
of system stability and not having to deal with potential problems
from broken clients, but it&#39;s also a way to enforce your
preferences when they might differ from those of the users.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The important point for the purposes of this post is &amp;quot;control&amp;quot;.
There are a number of situations in which the user&#39;s preferences
and those of the site aren&#39;t in alignment, such as:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Ad blocking.&lt;/dt&gt;
&lt;dd&gt;Sites and apps make money by showing ads, but users don&#39;t like to see
ads, which is why they often run ad blockers. Obviously, the providers
would prefer that users actually saw the ads.&lt;/dd&gt;
&lt;dt&gt;Access to content (digital rights management).&lt;/dt&gt;
&lt;dd&gt;Web pages can of course play audio and video, but historically
the providers of that content have been very concerned about unauthorized
downloading and reproduction. In an open system, however, nothing stops
the client from storing the raw media.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h3 id=&quot;encrypted-media-extensions&quot;&gt;Encrypted Media Extensions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#encrypted-media-extensions&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This last issue was responsible for the one major case in which
the Web has deviated from the principle of openness, namely
HTML &lt;a href=&quot;https://www.w3.org/TR/encrypted-media/&quot;&gt;Encrypted Media Extensions (EME)&lt;/a&gt;.
In the early days of the Web, media was largely played through
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Adobe_Flash&amp;amp;oldid=1168143302&quot;&gt;Adobe Flash&lt;/a&gt;, which had &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Digital_rights_management&amp;amp;oldid=1169720773&quot;&gt;Digital Rights Management (DRM)&lt;/a&gt; mechanisms designed to prevent exporting content. These mechanisms took in encrypted
media and decrypted and displayed it, but were designed to
resist user tampering to exfiltrate the media.&lt;/p&gt;
&lt;p&gt;Starting in the early 2010s browsers gradually
started to deprecate Flash, both in response to concerns
about security and as more and more of its capabilities
started to be added to the Web platform.
One of those capabilities was the ability to play video,
but the large video streaming services (especially Netflix)
were concerned about people using the browser to save
media and so were unwilling to use the HTML5 &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; tag
as-is. Instead they proposed a new technology called
&lt;a href=&quot;https://www.w3.org/TR/encrypted-media/&quot;&gt;Encrypted Media Extensions (EME)&lt;/a&gt;,
in which a closed DRM &lt;em&gt;Content Decryption Module (CDM)&lt;/em&gt; was embedded in the browser to
decrypt and display the media.&lt;/p&gt;
&lt;p&gt;EME was highly controversial but eventually every major browser
included it. I can&#39;t speak for other browsers, but
I was at Mozilla when they decided to implement
EME in Firefox and the &lt;a href=&quot;https://hacks.mozilla.org/2014/05/reconciling-mozillas-mission-and-w3c-eme/&quot;&gt;conclusion&lt;/a&gt; was that given that other
browsers were going to implement EME it was better to
have people able to watch videos—which we knew they wanted
to do—in Firefox than that they switch to another browser.
The implementation of EME in Firefox was designed
to limit the capabilities of the CDM, so that it had limited
access to the user&#39;s computer and couldn&#39;t be used to track users.&lt;/p&gt;
&lt;h2 id=&quot;back-to-web-environment-integrity&quot;&gt;Back to Web Environment Integrity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#back-to-web-environment-integrity&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This all brings us back to
&lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity&quot;&gt;WEI&lt;/a&gt;,
which is a proposal for attestation for the Web. For more background
on attestation see &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#trusted-computing&quot;&gt;here&lt;/a&gt;,
but briefly the idea with attestation is that you have some &amp;quot;trusted&amp;quot;
piece of hardware on the user&#39;s device (in this case &amp;quot;trusted&amp;quot; means
&amp;quot;not controlled by the user but rather by the manufacturer&amp;quot;, so it&#39;s
trusted by the web site, not by the user) which
is able to vouch for the software that runs on the user&#39;s computer.
Most modern mobile devices and many if not most laptop devices now
have such a piece of hardware.&lt;/p&gt;
&lt;p&gt;The motivation for the proposal is described as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Users like visiting websites that are expensive to create and maintain, but they often want or need to do it without paying directly. These websites fund themselves with ads, but the advertisers can only afford to pay for humans to see the ads, rather than robots. This creates a need for human users to prove to websites that they&#39;re human, sometimes through tasks like challenges or logins.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Users want to know they are interacting with real people on social websites but bad actors often want to promote posts with fake engagement (for example, to promote products, or make a news story seem more important). Websites can only show users what content is popular with real people if websites are able to know the difference between a trusted and untrusted environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Users playing a game on a website want to know whether other players are using software that enforces the game&#39;s rules.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Users sometimes get tricked into installing malicious software that imitates software like their banking apps, to steal from those users. The bank&#39;s internet interface could protect those users if it could establish that the requests it&#39;s getting actually come from the bank&#39;s or other trustworthy software.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The high level idea
is that there would be a JS API that the site could call which would
cause the browser to ask the OS—and presumably transitively
the aforementioned trusted hardware—to attest to some
properties of the browser.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
The
&lt;a href=&quot;https://rupertbenwiser.github.io/Web-Environment-Integrity/&quot;&gt;spec&lt;/a&gt; is
silent on what is being attested to and the
&lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md&quot;&gt;Explainer&lt;/a&gt;
is &lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md#what-information-is-in-the-signed-attestation&quot;&gt;pretty
fuzzy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The proposal calls for at least the following information in the signed attestation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The attester&#39;s identity, for example, &amp;quot;Google Play&amp;quot;.&lt;/li&gt;
&lt;li&gt;A verdict saying whether the attester considers the device trustworthy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;These two pieces of information basically serve to guarantee that the code
is running on some device made by a manufacturer that the Web site
trusts. This already means that we don&#39;t have a completely open system,
because it&#39;s not possible to build a new piece of hardware yourself
that will be able to provide the correct attestation: you instead
need to have some closed third party module. You probably also need
a trusted and locked-down operating system, because otherwise
the OS can tamper with the behavior of the browser, so good luck if you want
to run Linux!&lt;/p&gt;
&lt;p&gt;Moreover, this attestation isn&#39;t very useful in and of itself: the first three
use cases are ones in which the browser connecting to the server
is controlled &lt;em&gt;by the attacker&lt;/em&gt;, and so all they demonstrate
is that the attacker was able to afford a single device made by
such a manufacturer. However, they could be running any
software they want on it. They don&#39;t even need to be &lt;em&gt;using&lt;/em&gt; the
device to run their browser. They can use a single trusted device
to generate an arbitrary number of attestations up to the performance
of the device—and modern hardware is very very fast—so
the effectiveness of this limited attestation seems fairly low.
In order to effectively address these use cases, you need the
attester to provide more information.&lt;/p&gt;
&lt;p&gt;The explainer goes on to propose two other types of information:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The platform identity of the application that requested the
attestation, like com.chrome.beta, org.mozilla.firefox, or
com.apple.mobilesafari.&lt;/li&gt;
&lt;li&gt;Some indicator enabling rate limiting against a physical device&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The basic intuition behind rate limiting is that it prevents the kind
of large-scale attacks I mentioned above in which the attacker has a
lot of browsers connected to a single trusted device. This might be
useful in terms of preventing ad fraud attempts where the attacker
pretends to have a large number of devices representing a large number
of legitimate users, though it could be tricky to set the rate limits
correctly: some people do a lot of browsing and you don&#39;t want them to
suddenly run up against a rate limit. So at best this multiplies the
attacker&#39;s costs by making them buy more trusted devices.&lt;/p&gt;
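&lt;p&gt;To make the cost argument concrete, here is a minimal sketch of what a rate limit on the relier side might look like. This is not taken from the WEI explainer, which doesn&#39;t say what the rate-limiting indicator is: the &lt;code&gt;device_id&lt;/code&gt; key, the fixed-window scheme, and every number are assumptions made purely for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import math
from collections import defaultdict


class DeviceRateLimiter:
    """Fixed-window limiter keyed on a hypothetical per-device indicator."""

    def __init__(self, max_per_window, window_seconds):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # Maps (device_id, window index) to attestations seen in that window.
        self.counts = defaultdict(int)

    def allow(self, device_id, now):
        """Return True if this device may get one more attestation at time now."""
        key = (device_id, int(now // self.window_seconds))
        if self.counts[key] == self.max_per_window:
            return False
        self.counts[key] += 1
        return True


def devices_needed(fake_users, attestations_per_user, max_per_window):
    """Trusted devices an attacker needs to fake this much traffic in one window."""
    return math.ceil(fake_users * attestations_per_user / max_per_window)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these made-up numbers, faking 10,000 users who each need 50 attestations against a limit of 1,000 per window takes 500 trusted devices rather than one. The limit has to sit above what a heavy but legitimate user generates, so the multiplier on the attacker&#39;s costs is only as big as that headroom allows.&lt;/p&gt;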
&lt;p&gt;Rate limits do not, however, address the game anti-cheating use case
because the problem isn&#39;t that the user is doing an unreasonable number
of attestations but rather that they are running cheating software on
a legitimate device.  The only way to address this is to have the
attestation cover the software itself, in this case the Web
browser. This is where the proposal to indicate the identity of the
application (e.g., &lt;code&gt;com.chrome.beta&lt;/code&gt;) comes in. Presumably the relier
would have a list of browser software that it trusts behaves correctly
and would reject any requests from other pieces of software, or at
least flag them for special handling (and inconvenience). This means
that if you want to run something other than a major browser or
even build your own, you&#39;re totally out of luck.&lt;/p&gt;
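&lt;p&gt;Here is a rough sketch of what that check might look like on the relier side. The attester and application names come from the explainer text quoted above; the &lt;code&gt;Attestation&lt;/code&gt; structure, the &lt;code&gt;classify&lt;/code&gt; function, and the three outcomes are hypothetical, and I&#39;m assuming the attestation&#39;s signature has already been verified.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
from dataclasses import dataclass

# Allowlists chosen by the relier; the contents are illustrative only.
TRUSTED_ATTESTERS = {"Google Play"}
TRUSTED_APPS = {"com.chrome.beta", "org.mozilla.firefox", "com.apple.mobilesafari"}


@dataclass(frozen=True)
class Attestation:
    """Hypothetical decoded attestation; signature checking is out of scope."""
    attester: str
    device_trustworthy: bool
    app_id: str


def classify(att):
    """Decide how the relier treats a request carrying this attestation."""
    if att.attester not in TRUSTED_ATTESTERS or not att.device_trustworthy:
        return "reject"
    if att.app_id not in TRUSTED_APPS:
        # Unknown or home-built browser: flagged for extra friction.
        return "special-handling"
    return "accept"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A browser you built yourself, say &lt;code&gt;org.example.homebuilt&lt;/code&gt;, gets &lt;code&gt;special-handling&lt;/code&gt; at best even on a trusted device, because it isn&#39;t on the relier&#39;s list.&lt;/p&gt;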
&lt;p&gt;Moreover, in order for this to work, the software—and probably
the operating system—needs to be unmodified &lt;em&gt;and&lt;/em&gt; not to
have affordances that allow the user to adjust its behavior in
an undesired fashion. This is an incredibly strong condition
because a browser is a very complex and configurable piece of
software. For instance Firefox has hundreds of configuration parameters
that users can set, some supported and some unsupported; it&#39;s
very likely that some of them would let users modify behavior in
ways the site wouldn&#39;t want. Beyond configuration,
most browsers allow you to install
extensions/add-ons which substantially change the behavior of the
browser, so any add-ons need to be part of the trusted list.
The WEI proposal says that this should be fine because:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Web Environment Integrity attests the legitimacy of the underlying
hardware and software stack, it does not restrict the indicated
application’s functionality: E.g. if the browser allows extensions,
the user may use extensions; if a browser is modified, the modified
browser can still request Web Environment Integrity attestation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&#39;t see how this can be the case, though. I suppose it&#39;s possible
that as a &lt;em&gt;technical&lt;/em&gt; matter, you could get an attestation
(e.g., &amp;quot;This is a version of Firefox with unknown modifications&amp;quot;
or &amp;quot;This is a version of Firefox with the &#39;I am cheating at this game&#39;&amp;quot;
add-on), but the site clearly can&#39;t treat this attestation as
meaningful without defeating the security guarantees of the system.&lt;/p&gt;
&lt;p&gt;Of course, you might decide to abandon the anti-cheating use
case—and any others that don&#39;t involve pretending to be a lot of
different devices—but that would be a much more limited system than
this, more similar to Apple&#39;s &lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2022/10077/&quot;&gt;Private Access Tokens&lt;/a&gt;,
which are supposed to just attest to the device itself (this is also bad, but
not as bad as WEI). However, if you want to ensure that individual
users&#39; machines behave in some specific way, you need
the attestation to cover the software on the user&#39;s machine, not
just to attest that they had some limited amount of control of
a trusted device.&lt;/p&gt;
&lt;p&gt;I know a lot of people care about cheating in games, but it&#39;s a bit
of a niche use case. However,
the elephant in the room here is advertising: a lot of people use ad
blockers and many sites try to detect this case and refuse service to
them.  One potential application of WEI is forcing users to prove that
they&#39;re not running an ad blocker.  The explainer doesn&#39;t list this as
a use case, but also doesn&#39;t really disclaim it and once remote attestation
exists there is going to be a huge financial incentive to deploy it
for this purpose.
Obviously, preventing ad blocking in the
browser would require attesting to the whole browser stack, not just that the
browser is running on a trusted device: if the user controls their
browser they can just disable ad display,
since ad blocking is typically a modification, or sometimes a feature, of the browser.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The bigger picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/wei/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic property of an open system like the Internet and the Web is
that you can only be assured of the properties of the elements you
directly control. The elements that belong to other people work for them
and not you. In a closed system, by contrast, the software on the
end user device works for the provider, not for them, whether it
is officially owned by the user (as in mobile apps) or it actually belongs to
the provider (as with the old Bell System monopoly).&lt;/p&gt;
&lt;p&gt;WEI and similar attestation technologies represent an attempt to
impose an alien model, that of a closed system, onto the open system
of the Web. As with any closed system, the net impact will be
that users don&#39;t control their own experience of the Web but
rather have only the experiences that sites are willing
to let them have. That seems bad.&lt;/p&gt;
&lt;!-- Cover image
     Browsers are extensible
     --&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ironically, the Carterfone didn&#39;t actually plug into the
wall socket. Instead, it used an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Acoustic_coupler&amp;amp;oldid=1091999910&quot;&gt;acoustic coupler&lt;/a&gt;
that tied into the phone handset. However, the decision was broad enough
to allow for electrical interconnection. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I&#39;m simplifying here, because the phone network just carries
analog signals in a given frequency and amplitude range. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Obviously, the phone company could tell that this wasn&#39;t
voice traffic, they just had to pass it through anyway. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The jargon in routing is &amp;quot;autonomous system&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m simplifying a bit because for some time there were actually
restrictions on commercial use, but these were gone by the early
1990s. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Actually, back in the day, it just executed &lt;code&gt;sendmail&lt;/code&gt; directly. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And yes, I do know about X, but remote X is not the answer.
 &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In principle Alice could have installed one just for
herself, but that&#39;s not how it&#39;s typically done. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See this &lt;a href=&quot;https://www.ietf.org/proceedings/80/slides/plenaryt-5.pdf&quot;&gt;2011 presentation&lt;/a&gt; by VoIP pioneer &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Jonathan_Rosenberg_(SIP_author)&amp;amp;oldid=1145532767&quot;&gt;Jonathan Rosenberg (JDR)&lt;/a&gt; and this
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-tschofenig-post-standardization-02&quot;&gt;Internet Draft&lt;/a&gt; by Tschofenig, Aboba, Peterson, and McPherson for an argument
that this phenomenon meant the end of application-layer standards. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ironically, Steve Jobs initially didn&#39;t want an app store and instead
had in mind something more like what you&#39;d now call a
&lt;a href=&quot;https://web.dev/progressive-web-apps/&quot;&gt;Progressive Web App&lt;/a&gt;
but demand for real apps was overwhelming and here we are. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In addition, because of the way that the Web evolved, many
JS applications operate by changing elements on the Web page
(e.g., &amp;quot;now render this new piece of HTML&amp;quot;) which means that
the browser can generally figure out what the page is doing;
a property called &amp;quot;semantic transparency&amp;quot;. In principle,
those applications could just write pixels onto an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API&quot;&gt;HTML canvas&lt;/a&gt; but that&#39;s more difficult
and not the
standard approach. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This might also involve calling out to
some server, but everything here is rooted in the trusted hardware
on the device. &lt;a href=&quot;https://educatedguesswork.org/posts/wei/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How NATs Work, Part IV: TURN Relaying</title>
		<link href="https://educatedguesswork.org/posts/nat-part-4/"/>
		<updated>2023-07-17T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/nat-part-4/</id>
		<content type="html">&lt;p&gt;The Internet is a mess, and one of the biggest parts of that mess is
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1147533294&quot;&gt;Network Address Translation
(NAT)&lt;/a&gt;,
a technique which allows multiple devices to share the same network
address. This is part IV in a series on how NATs work and how to work
with them. You may want to go back and review &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1&quot;&gt;part
I&lt;/a&gt; (how NATs work), &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2&quot;&gt;part II&lt;/a&gt;
(basic concepts of NAT traversal) and &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3&quot;&gt;part III&lt;/a&gt;
(ICE).&lt;/p&gt;
&lt;p&gt;As discussed &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aapf-%E2%86%94-apm%3Aapf&quot;&gt;earlier&lt;/a&gt;
there are some configurations where it is not possible
to establish a direct
connection between two endpoints. For instance, if Alice
has a NAT with address-dependent mapping and Bob has
a NAT with address-dependent filtering, then the packets from
Alice will never match any filter on Bob&#39;s NAT and will just
be dropped. Similarly, the packets from Bob will not match
any mapping on Alice&#39;s NAT and will be dropped. The only way
to send data between these two endpoints is with the assistance
of a server, as shown in the blue path in the diagram below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ICE-paths.png&quot; alt=&quot;A relay server&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There are any number of possible protocols one might use to
send data through a server. For instance, you could connect
through a VPN or even send each individual packet as an
HTTP request to the server. However, the IETF has standardized
a specific protocol which is designed to be used with ICE,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Traversal_Using_Relays_around_NAT&amp;amp;oldid=1115742687&quot;&gt;Traversal Using Relays Around NAT (TURN)&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;turn&quot;&gt;TURN &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#turn&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Conceptually, TURN is an application layer relay protocol:
the TURN client (i.e., the user&#39;s device) sends packets
to the TURN server addressed to the other side and the
server forwards them, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TURN-server.png&quot; alt=&quot;TURN server&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this example, Alice is communicating with Bob through
&lt;strong&gt;her&lt;/strong&gt; TURN server (generally each client will have
an associated TURN server, as described &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#turn-server-deployment-scenarios&quot;&gt;below&lt;/a&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When she wants to send a packet to Bob, she sends it
to the server&#39;s address (198.51.100.1) but with
a label telling the server to forward it to Bob.
The server removes the label and sends the packet to
Bob.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When Bob wants to send a packet to Alice, he sends it
to the TURN server, which forwards it to Alice.
The packet will arrive at Alice&#39;s machine with
the TURN server&#39;s IP address, so the TURN server
has to add a label telling Alice that it originally
came from Bob. Otherwise Alice wouldn&#39;t be able
to distinguish between packets from Bob and Charlie
when they come through the TURN server.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s important to see that there is an asymmetry here: Alice has a
relationship with the TURN server and is explicitly communicating with
it. From Bob&#39;s perspective, however, it&#39;s just as if the packets came
from the TURN server, and unless he has some external knowledge, he
has no way of seeing that he&#39;s actually communicating with Alice
through the TURN server, rather than the server itself (because from
an IP layer perspective that&#39;s actually what&#39;s happening).&lt;/p&gt;
&lt;p&gt;The opacity of the TURN server from Bob&#39;s perspective has an important
consequence, which is that the server has to keep state in order
to distinguish multiple endpoints that Alice is talking to. Consider what
happens if the server has two clients, Alice and Charlie. The packets
from Alice and Charlie are labeled with where to send them, but the
packets from Bob are not, so do they go to Alice or Charlie? The only
way for the TURN server to know is to keep some state. For instance,
it can assign outgoing packets from Alice one port and packets from
Charlie a different port, so that when Bob replies it can look up the
incoming port and know where to send it. If this sounds familiar, it&#39;s
because this is exactly what a NAT does and for the same reason: it
has more than one client sharing the same external IP address, in this
case the address of the TURN server. All application relays have to
do something like this, because otherwise they wouldn&#39;t be able
to talk to unmodified peers, which is a hard requirement for incremental
deployment.&lt;/p&gt;
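&lt;p&gt;As a rough illustration of that bookkeeping, here is a minimal Python sketch of a relay that gives each client its own outgoing port and uses the incoming port to find the client again. The class name, the starting port number, and the client labels are all made up for this example; a real TURN server keeps considerably more state.&lt;/p&gt;

```python
class ToyRelay:
    """Toy model of per-client relay state; not the real TURN data structures."""

    def __init__(self, first_port=50000):
        # first_port is an arbitrary starting point chosen for the example.
        self.next_port = first_port
        self.port_by_client = {}
        self.client_by_port = {}

    def outgoing(self, client):
        """Return the relay port used for this client's outgoing packets."""
        if client not in self.port_by_client:
            port = self.next_port
            self.next_port += 1
            self.port_by_client[client] = port
            self.client_by_port[port] = client
        return self.port_by_client[client]

    def incoming(self, relay_port):
        """Map a packet arriving on relay_port back to its client, or None."""
        return self.client_by_port.get(relay_port)


relay = ToyRelay()
alice_port = relay.outgoing("alice")
charlie_port = relay.outgoing("charlie")
# A reply from Bob arrives on one of the two ports, which identifies the client.
print(alice_port, relay.incoming(alice_port))
print(charlie_port, relay.incoming(charlie_port))
```

&lt;p&gt;The two dictionaries play the same role as a NAT&#39;s mapping table: one direction picks the external port, the other reverses the lookup for replies.&lt;/p&gt;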
&lt;h3 id=&quot;allocations-and-permissions&quot;&gt;Allocations and Permissions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#allocations-and-permissions&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order for Alice to send data to and receive data from Bob, TURN
requires that she explicitly create state on the relay
(unlike a NAT where the state is implicitly created
by sending packets). This is done using two transactions,
&lt;em&gt;allocating an address&lt;/em&gt; and &lt;em&gt;creating a permission&lt;/em&gt;,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/turn-allocation.png&quot; alt=&quot;TURN allocation and permissions&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The first thing Alice does is to allocate an address (really a port,
because the server probably only has one address, or maybe one each for
IPv4 and IPv6) on the TURN server that she will be using to send and
receive packets. The TURN server replies with the address and
port that has been allocated. Alice can immediately send this address
to peers so they know where to reach her.&lt;/p&gt;
&lt;p&gt;Alice can use this address to send to multiple peers, as described
above, but it&#39;s not yet associated with any individual peer. In order
to actually send packets, Alice needs to next create a permission
entry for a specific peer.  Until Alice has created a permission for a
given peer, packets to or from that address will just be dropped by the
TURN server. With &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3&quot;&gt;ICE&lt;/a&gt; Alice learns peer addresses
because those peers send their candidates and then Alice would create
a permission for each candidate address before sending packets to it.&lt;/p&gt;
&lt;p&gt;Note that this is effectively an &lt;em&gt;endpoint-independent mapping&lt;/em&gt; with
an &lt;em&gt;address-dependent filtering&lt;/em&gt; policy: Alice uses the same
address and port to talk to everyone but the TURN server blocks
incoming packets from anyone that Alice hasn&#39;t explicitly identified.
This analogy isn&#39;t perfect because the permission is explicitly
created and Alice can&#39;t even &lt;em&gt;send&lt;/em&gt; packets to
those endpoints before sending a permission request, but
it&#39;s close enough as a mental model. However, this isn&#39;t
&lt;em&gt;port-dependent filtering&lt;/em&gt;; the TURN server will accept packets
from any port once a permission has been created for a given
address. This produces better results with endpoints which
have address-dependent mappings.&lt;/p&gt;
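&lt;p&gt;Here is a small Python sketch of that mental model: one relayed port per allocation, with permissions keyed on the peer&#39;s IP address only. The class and method names are invented for illustration, the addresses are documentation addresses, and permission lifetimes and refreshes are left out entirely.&lt;/p&gt;

```python
class ToyAllocation:
    """Toy model of a TURN allocation with per-address permissions."""

    def __init__(self, relayed_port):
        # The same relayed port is used for every peer.
        self.relayed_port = relayed_port
        self.permitted_ips = set()

    def create_permission(self, peer_ip):
        """Record that the client is willing to talk to this peer address."""
        self.permitted_ips.add(peer_ip)

    def allows(self, peer_ip, peer_port):
        """Check a packet to or from a peer; the port is deliberately ignored."""
        return peer_ip in self.permitted_ips


alloc = ToyAllocation(relayed_port=50000)
print(alloc.allows("203.0.113.7", 4000))   # no permission yet
alloc.create_permission("203.0.113.7")
print(alloc.allows("203.0.113.7", 4000))   # permitted address
print(alloc.allows("203.0.113.7", 61234))  # same address, different port
print(alloc.allows("203.0.113.9", 4000))   # some other address
```

&lt;p&gt;Because the check ignores the port, a peer whose NAT picks a fresh port for each destination still gets through once its address has a permission.&lt;/p&gt;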
&lt;p&gt;To put this all together, here is what TURN looks like as
part of an ICE transaction, showing a complete connectivity
check.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/turn-ice.png&quot; alt=&quot;TURN with ICE&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The initial part of this example is the same as the previous one:
Alice contacts the TURN server, gets an allocation, and sends it
to the signaling server. That signaling server forwards it to Bob,
who sends back his own candidate. At the same time, Bob also
tries to do a connectivity check to Alice&#39;s candidate,
just as he would any other candidate. However, this fails because
Alice hasn&#39;t created a permission for Bob. Once Alice creates
that permission, then she sends her own check to Bob, which
succeeds, as does Bob&#39;s in the other direction. Note that there
is a race condition here: it&#39;s possible for Alice&#39;s permission
request to complete before Bob&#39;s connectivity check arrives,
in which case that packet would get delivered, even though
Alice hadn&#39;t sent a connectivity check to Bob. Either way,
ICE will eventually succeed.&lt;/p&gt;
&lt;p&gt;You should notice that Bob doesn&#39;t need to be aware of the
fact that Alice&#39;s candidate is actually from a TURN server;
he just sends to it as if it were any other candidate.
In ICE, candidates are actually labeled by type, but
this isn&#39;t necessary for ICE to work.&lt;/p&gt;
&lt;h3 id=&quot;i-can&#39;t-believe-it&#39;s-stun&quot;&gt;I can&#39;t believe it&#39;s STUN &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#i-can&#39;t-believe-it&#39;s-stun&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Believe it or not, TURN is actually an extension for STUN:
TURN data is encapsulated in STUN packets. For instance,
you do allocation by sending a STUN message of type &amp;quot;Allocate&amp;quot;
and you send packets by sending a message of type &amp;quot;Send&amp;quot;.
This is actually not &lt;em&gt;quite&lt;/em&gt; as strange a design decision
as it might initially appear, for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You really really want to run TURN over UDP rather than
TCP (see &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#why-not-tcp&quot;&gt;below&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because UDP is unreliable you need some transaction
mechanism to allow the client to make requests from
the server, retransmitting those requests when lost.
STUN already has this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ICE implementations already have STUN stacks. As one
nice side effect, the TURN server will actually
tell you your server reflexive address, so you don&#39;t
need to do a separate request to a STUN server
to learn it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you were designing this protocol today, you would probably
base it instead on some protocol that added reliability to
UDP (e.g., QUIC), but TURN was originally designed in 2010,
so things were different back then.&lt;/p&gt;
&lt;h3 id=&quot;channels&quot;&gt;Channels &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#channels&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One real drawback of using STUN is bloat. Sending a single
packet with a Send (outgoing) or Data (incoming) indication
adds 36 bytes of overhead. Here&#39;s an example packet diagram,
based partly on the one from the STUN RFC:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
|0 0|     STUN Message Type     |         Message Length        |&#92;
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|                         Magic Cookie                          | | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Header
|                                                               | |
|                     Transaction ID (96 bits)                  | |
|                                                               | /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Type=XOR-PEER-ADDRESS |            Length=8           | &#92;
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Peer
|0 0 0 0 0 0 0 0|    Family     |         X-Port                | | Address
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                X-Address (32 bits for IPv4)                   |/
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Type=Data             |            Length             |&#92;
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Data
|                       Variable data ....                      |/
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Most of this is overhead. First, every packet has a fixed 20 byte
header, which mostly acts to identify it as STUN and tell you
what message type it is (e.g., Send indication). Then you have the
peer address and the data encoded in an inefficient tag-length-value
format.
None of this overhead really mattered for STUN&#39;s original
application, where you just sent a few messages, but when you
have to absorb it for every packet you&#39;re sending (at a rate
of maybe 20-50 per second) it adds up quickly.
The remote address and port is also sort of redundant
because there are only a few addresses in use, so you
could compress them by just sending a short address ID.&lt;/p&gt;
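&lt;p&gt;To make the arithmetic concrete, here is a hedged Python sketch that builds a Send indication for an IPv4 peer and measures the framing. The numeric constants are the ones defined in the STUN and TURN RFCs; the function name, peer address, and 160-byte payload are assumptions for the example, and authentication and any other attributes are omitted.&lt;/p&gt;

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value in every STUN header
SEND_INDICATION = 0x0016       # STUN message type for a TURN Send indication
XOR_PEER_ADDRESS = 0x0012      # attribute type
DATA = 0x0013                  # attribute type
FAMILY_IPV4 = 0x01


def build_send_indication(peer_ip, peer_port, payload):
    """Build a minimal Send indication carrying payload to an IPv4 peer."""
    # The port is XORed with the top 16 bits of the cookie, the address with all 32.
    x_port = peer_port ^ (MAGIC_COOKIE // 0x10000)
    x_addr = int(ipaddress.IPv4Address(peer_ip)) ^ MAGIC_COOKIE
    peer_attr = struct.pack(
        "!HHBBHI", XOR_PEER_ADDRESS, 8, 0, FAMILY_IPV4, x_port, x_addr
    )
    # Attribute values are padded to a multiple of four bytes.
    padding = bytes(-len(payload) % 4)
    data_attr = struct.pack("!HH", DATA, len(payload)) + payload + padding
    body = peer_attr + data_attr
    header = struct.pack(
        "!HHI12s", SEND_INDICATION, len(body), MAGIC_COOKIE, os.urandom(12)
    )
    return header + body


payload = bytes(160)  # stand-in for one small audio packet
message = build_send_indication("192.0.2.1", 5000, payload)
overhead = len(message) - len(payload)
print("overhead bytes per packet:", overhead)
print("overhead bits per second at 50 packets/s:", overhead * 50 * 8)
```

&lt;p&gt;For a payload whose length is a multiple of four, the overhead is the 20-byte header plus 12 bytes of peer address plus 4 bytes of Data attribute header, which is the 36 bytes mentioned above.&lt;/p&gt;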
&lt;p&gt;TURN includes a mechanism called &amp;quot;channels&amp;quot; which does exactly
this. The client can send a request to the TURN server
to allocate a two-byte channel ID to a given remote address
and port (the same information as would be needed for a permission).
Once the channel is allocated, packets can then be sent or
received by just prefixing them with the channel ID and length,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Channel Number        |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                       Application Data                        /
/                                                               /
|                                                               |
|                               +-------------------------------+
|                               |
+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;
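&lt;p&gt;In code, the channel framing is almost trivial. The following Python sketch assumes UDP (so no padding is added) and uses invented function names; the 0x4000 to 0x4fff range check reflects the channel number range TURN requires.&lt;/p&gt;

```python
import struct


def build_channel_data(channel, payload):
    """Prefix payload with a two-byte channel number and a two-byte length."""
    if channel not in range(0x4000, 0x5000):
        raise ValueError("channel number outside the allowed range")
    return struct.pack("!HH", channel, len(payload)) + payload


def parse_channel_data(message):
    """Split a ChannelData message into (channel, payload)."""
    channel, length = struct.unpack("!HH", message[:4])
    return channel, message[4:4 + length]


payload = bytes(160)  # stand-in for one small audio packet
framed = build_channel_data(0x4000, payload)
print("overhead bytes per packet:", len(framed) - len(payload))
print(parse_channel_data(framed) == (0x4000, payload))
```

&lt;p&gt;That is four bytes of framing per packet, versus 36 for a Send or Data indication.&lt;/p&gt;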
&lt;p&gt;If you&#39;re a real protocol engineering nerd, you might ask how
you distinguish a message containing channel data from a STUN
message, as they are carried on the same host/port quartet. The
answer is that STUN message types always have the first two
bits as zero and channel IDs are required to be between
0x4000 and 0x4fff.&lt;/p&gt;
&lt;p&gt;You might also wonder at this point why STUN conveniently has a range of
message types which can&#39;t be allocated: the reason is that when STUN
was designed people wanted to make sure that it could be easily
demultiplexed (i.e., distinguished) from RTP and RTCP, which always have the first bit of
the first byte set to 1.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
There has actually been quite a bit
of hackery around easily demultiplexing various types of messages
in real-time multimedia. Some of this was due to intentional
design and some was just fortuitous design choices that people—by
which I partly mean me—took advantage of. For instance, DTLS has
record types as the first byte, but these are always low numbers
and so easy to distinguish from RTP and RTCP.
At this point there are actually
five separate types of protocol message
which can be carried over the same host/port quartet:
(1) STUN (2) ZRTP (3) DTLS (4) TURN channels and (5) RTP/RTCP.
Someone had to write a whole &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7983&quot;&gt;RFC&lt;/a&gt;
to systematize how to do it.&lt;/p&gt;
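&lt;p&gt;The resulting demultiplexing rule boils down to looking at the first byte of each packet. Here is a minimal Python sketch using the first-byte ranges from RFC 7983; the function name and the &quot;unknown&quot; fallback are my own choices for the example.&lt;/p&gt;

```python
# First-byte ranges from RFC 7983 (upper bounds are exclusive in range()).
RANGES = [
    (range(0, 4), "STUN"),
    (range(16, 20), "ZRTP"),
    (range(20, 64), "DTLS"),
    (range(64, 80), "TURN channel"),
    (range(128, 192), "RTP/RTCP"),
]


def classify(packet):
    """Guess which protocol a packet belongs to from its first byte."""
    if not packet:
        return "unknown"
    for byte_range, name in RANGES:
        if packet[0] in byte_range:
            return name
    return "unknown"


print(classify(bytes([0x00, 0x16])))  # STUN message type starts with two zero bits
print(classify(bytes([0x40, 0x00])))  # channel number 0x4000
print(classify(bytes([0x80, 0x00])))  # RTP version 2
```

&lt;p&gt;Channel numbers between 0x4000 and 0x4fff have a first byte between 64 and 79, which is why they slot in neatly between DTLS and RTP.&lt;/p&gt;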
&lt;h2 id=&quot;no-incoming-connections%3F&quot;&gt;No incoming connections? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#no-incoming-connections%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One side effect of the requirement to create a permission for a specific
peer address is that it is not possible to use TURN to run a generic
server behind a NAT or firewall. A typical server, such as for Web
or mail, has a fixed address and port which anyone can use to connect
to it, but because TURN requires that the TURN client create a specific
permission for each peer, arbitrary clients on the Internet cannot
just connect.&lt;/p&gt;
&lt;p&gt;This limitation is not an oversight but rather a deliberate design
choice. Recall that it&#39;s common for firewalls to enforce an
&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#maintaining-nat-binding&quot;&gt;&amp;quot;outgoing connections only&amp;quot;&lt;/a&gt;
security policy. Without this limitation it would be straightforward
for clients to bypass this policy by just connecting to a
TURN server on the Internet. The TURN designers were concerned
that if TURN enabled this kind of policy bypass, enterprise
administrators would respond by blocking TURN entirely (recall
from the previous section that TURN is trivial to identify).
The idea was that if TURN could only be used for outgoing connections,
then administrators would be more likely to allow it through the
firewall.&lt;/p&gt;
&lt;h2 id=&quot;what-about-when-stun-or-udp-is-blocked%3F&quot;&gt;What about when STUN or UDP is blocked? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#what-about-when-stun-or-udp-is-blocked%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Despite the &amp;quot;no-incoming&amp;quot; compromise embodied in the permissions design,
it is still sometimes the case that STUN over UDP is blocked. The reasons for
this vary, but include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Firewalls that block all UDP traffic.&lt;/li&gt;
&lt;li&gt;Firewalls that do so-called &amp;quot;deep packet inspection&amp;quot; and block any
packets from protocols they don&#39;t recognize.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://storage.googleapis.com/pub-tools-public-publication-data/pdf/8b935debf13bd176a08326738f5f88ad115a071e.pdf&quot;&gt;Data&lt;/a&gt;
from the initial deployments of QUIC suggest that somewhere around 5%
of clients can&#39;t use an arbitrary new UDP-based protocol, though it&#39;s
unclear how often this is due to UDP blocking or just to blocking
unrecognized protocols.
In order to get around this kind of blocking,
it is also possible to run TURN over TCP as well as over TLS.
If you have a firewall which just blocks UDP, then running
TURN over TCP will often work. If you have a firewall which blocks unknown
protocols then running TURN over TLS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
might work.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
The idea here is that there are other protocols that firewall
administrators want to support (e.g., HTTP or HTTPS) that run
over TCP and/or TLS and if they haven&#39;t configured their firewall
rules too strictly, then TURN may also work.&lt;/p&gt;
&lt;p&gt;It&#39;s important to understand that it&#39;s still quite easy to recognize
TURN in these situations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;By default STUN uses a different port number than HTTP.&lt;/li&gt;
&lt;li&gt;If TLS isn&#39;t used you can just look at the TCP packets
to see if something is STUN.&lt;/li&gt;
&lt;li&gt;When TLS is used, the TLS &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7301&quot;&gt;ALPN extension&lt;/a&gt;
indicates that TURN is in use.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Again, this is by design and reflects an attempt to take a compromise
approach to blocking of TURN in which network operators
can block TURN if they want to but in cases where they just
configured their rules in a way that incidentally blocks
TURN (in some cases before TURN was even designed), then
TURN should work. The history of new protocol development is
full of this sort of uneasy compromise: on the one hand we
want to deploy new stuff and there are lots of network elements
which are very hostile to that, often unintentionally. On the
other hand, a situation in which the applications are just
at constant war with the administrators is a recipe for breakage.&lt;/p&gt;
&lt;p&gt;With that said, in the past few years attitudes towards network-based
blocking have changed a fair bit, including technologies like
DNS over HTTPS, QUIC, and TLS Encrypted Client Hello which are intended
to &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/&quot;&gt;make it harder to selectively block traffic&lt;/a&gt; unless you have
control of one of the endpoints. If TURN were being designed today,
I&#39;m not sure the same choices would be made.&lt;/p&gt;
&lt;h2 id=&quot;why-not-tcp&quot;&gt;Why not TCP &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#why-not-tcp&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;While it&#39;s possible to run TURN over TCP, you really don&#39;t want to
if you can avoid it because performance will generally be bad.
Covering this topic fully is out of scope for this post
(though stay tuned for my long-delayed posts about transport
protocol performance), but here is a brief sketch to help you
build some intuition.&lt;/p&gt;
&lt;h3 id=&quot;head-of-line-blocking&quot;&gt;Head-of-line Blocking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#head-of-line-blocking&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first problem derives from the fact that TCP delivers data
to applications in order. This means that if a packet
is dropped, then every packet received after that is held by
the receiving TCP implementation until that packet is received,
as shown in the following diagram:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/holb.png&quot; alt=&quot;Head of line blocking&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this case, the sender sends packet 1, which arrives at the receiver
and is delivered to the app immediately. However, packet 2 is dropped
and so packets 3 and 4 are just buffered until packet 2 is retransmitted,
at which point all three are delivered. This phenomenon is called
&lt;em&gt;head-of-line blocking (HOLB)&lt;/em&gt;. For more on this topic see
my &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/&quot;&gt;introductory post&lt;/a&gt; about transport
protocols.&lt;/p&gt;
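&lt;p&gt;The same scenario can be sketched in a few lines of Python. This is a toy model of in-order delivery, not how a real TCP stack is written; the function name and the use of small integers as sequence numbers are assumptions for the example.&lt;/p&gt;

```python
def deliver_in_order(arrivals):
    """Return, for each arriving packet, the packets handed to the app."""
    buffered = set()
    next_expected = 1
    log = []
    for seq in arrivals:
        buffered.add(seq)
        delivered = []
        # Hand over packets only while there is no gap in the sequence.
        while next_expected in buffered:
            buffered.remove(next_expected)
            delivered.append(next_expected)
            next_expected += 1
        log.append((seq, delivered))
    return log


# Packet 2 is lost and only arrives after 3 and 4, as a retransmission.
for seq, delivered in deliver_in_order([1, 3, 4, 2]):
    print("arrived", seq, "delivered", delivered)
# arrived 1 delivered [1]
# arrived 3 delivered []
# arrived 4 delivered []
# arrived 2 delivered [2, 3, 4]
```

&lt;p&gt;Packets 3 and 4 were sitting at the receiver the whole time; the application just couldn&#39;t see them until packet 2 showed up.&lt;/p&gt;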
&lt;p&gt;HOLB is fine for applications where everything happens in order
but less good for audio and video (A/V). A/V consists of a series
of independent pieces of media, short sound snippets of 20-50ms
in the case of audio, and frames in the case of video. In order
to have a good experience, these need to be played out at regular
intervals or the media will look and/or sound choppy. Of course,
the network doesn&#39;t deliver them at exactly the right time, so
the receiving implementation delays them a little bit in
what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Jitter&amp;amp;oldid=1148283466#Jitter_buffers&quot;&gt;jitter buffer&lt;/a&gt; before playing them out.&lt;/p&gt;
&lt;p&gt;The key phrase here is &amp;quot;a little bit&amp;quot;: media latency of more
than 200 ms or so is intensely undesirable. However, it&#39;s
not uncommon for TCP implementations to wait far longer
than this for retransmission, during which all the media
would be delayed. In these cases, it&#39;s better to just
drop the missing frame and play the next frames at the
appropriate times. Fancier implementations use
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Packet_loss_concealment&amp;amp;oldid=1165497866&quot;&gt;packet loss concealment&lt;/a&gt; techniques to fill in the missing data, but
even if you just play the next frames it&#39;s better than waiting.
With UDP, packets are delivered to the application at the time
of receipt, but the TCP logic is all in the operating system, so there&#39;s
no way to get any data until all earlier data is received.&lt;/p&gt;
&lt;h3 id=&quot;rate-control&quot;&gt;Rate Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#rate-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The second problem is that TCP is designed to adapt its sending
rate to match network conditions, in part by buffering data
until it thinks it&#39;s safe to send. The problem here is that
unless the media sender is &lt;em&gt;also&lt;/em&gt; adapting its rate to network
conditions, then it&#39;s sending data to TCP faster than it can
be transmitted, which creates buffering and/or packet loss.
Rate control for real-time protocols is a complicated topic,
but the TL;DR is that you really only want to have one rate
control regime, which should be at the media layer, and then
the network protocols just transmit whatever they are asked
to right away. Sending over TCP prevents that.
Obviously sending over TCP is better than not being able
to make a call at all, but if at all possible you want
to send your media over UDP.&lt;/p&gt;
&lt;h2 id=&quot;turn-server-deployment-scenarios&quot;&gt;TURN Server Deployment Scenarios &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#turn-server-deployment-scenarios&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In ICE, both sides will generally have TURN servers, in
which case each side will offer relayed candidates.
Depending on the properties of each network, ICE might
end up using neither relayed candidate, having one of
the sides talk directly to the other side&#39;s
relayed candidate, or having the traffic go through
both relays. Because TURN&#39;s mapping
and filtering model is fairly permissive, it will generally
not be necessary to go through both TURN servers
unless both sides have really unfortunate networking
configurations.&lt;/p&gt;
&lt;p&gt;Note that with WebRTC generally both sides will use the same TURN
server. When TURN was first designed, real-time communications over IP
mostly meant people with softphones or hardware IP phones. Those
devices were associated with some provider, whether it was an
enterprise system or a consumer VoIP provider. In either case, the
provider would supply the TURN server (recall from &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#relayed-candidates&quot;&gt;part
III&lt;/a&gt; that running TURN servers
isn&#39;t cheap).  If someone from provider A is calling someone from
provider B—though SIP federation was never as common as people
were hoping—then you might have a situation where
each user had a different TURN server.
By contrast, most WebRTC deployments are in settings where
there is only one provider and so everyone uses the same
TURN server.&lt;/p&gt;
&lt;p&gt;Note that most conferencing systems are deployed in a star
configuration in which each participant sends their media to
a central &lt;em&gt;multipoint control unit (MCU)&lt;/em&gt; or &lt;em&gt;selective forwarding unit (SFU)&lt;/em&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
Because these servers are on the open Internet, it&#39;s much
less likely you will need to use a TURN server. Because
you don&#39;t need to get through a NAT or firewall on the server
side, it should work even if you have a really uncooperative
NAT. The main time you would need a TURN server in this environment
is if you were behind a firewall which blocked all media
(e.g., because it blocked UDP).  Note that if the MCU/SFU and
TURN server are operated by the same entity, there is an opportunity
to integrate them closely, though I don&#39;t know if people actually
do this.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-4/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Out of the whole IETF NAT traversal protocol suite, TURN probably feels
the oldest, even though it was designed at about the same time as the rest. It&#39;s a bespoke application relaying protocol built on top
of a protocol which was originally designed for a totally different
job, namely discovering your reflexive IP address. In the modern era,
we&#39;d probably build something fairly different and more like
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9298.html&quot;&gt;MASQUE&lt;/a&gt;, which is
a generic UDP proxying protocol built on top of HTTP/3 and QUIC.
On the other hand, STUN and TURN are a lot simpler than QUIC,
they get the job done, and they&#39;re already built into browsers and softphones,
so I imagine we&#39;ll be using them for some time.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You could actually omit the length field as well if you
restricted yourself to UDP and only sent one packet per
UDP datagram. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The reason for the magic cookie is to ensure that it could easily
be demultiplexed from &lt;em&gt;any&lt;/em&gt; protocol, whether it had this
distinguishing first byte or not. The cookie is just a fixed
4 byte value that is at the same position in every STUN
packet. It&#39;s unlikely that it will be in the same position in
other protocols and
so helps identify STUN. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that it&#39;s not necessary to run TURN over TLS in order to
protect the media, which needs to be encrypted anyway. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s also possible to run TURN over &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9147&quot;&gt;DTLS&lt;/a&gt;,
but this isn&#39;t much more likely to work than regular TURN. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
These are different, but the difference doesn&#39;t matter for these purposes. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-4/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Broken Arrow Triple Crown Race Report</title>
		<link href="https://educatedguesswork.org/posts/broken-arrow/"/>
		<updated>2023-07-10T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/broken-arrow/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/ekr-broken-arrow-finish.jpg&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ekr-broken-arrow-finish.jpg&quot; alt=&quot;Finish photo&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This year has turned out to be light on racing in part because I was
kind of wiped out after last year and in part because I had signed up
for the &lt;a href=&quot;https://www.brokenarrowskyrace.com/&quot;&gt;Broken Arrow Skyrace&lt;/a&gt; in
Tahoe in June.
Broken Arrow isn&#39;t actually one race but a race festival
that takes place over three days. All of the races are relatively
short compared to what I usually do (the longest is nominally 46 km/29 mi), but
they offer what&#39;s called the &amp;quot;Triple Crown&amp;quot; which consists of the
following three races over three days, listed as:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Race&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Distance&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Vert&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Vertical Kilometer (VK)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4.8 km/3 mi&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;914 m/3000 ft&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;46K&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;42.5 km/26.5 mi&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2774 m/9100 ft&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;23K&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;21.75 km/13.5 mi&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1443 m/4700 ft&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The 46K is supposed to be two loops of the 23K, but you&#39;ll notice
that the distance and vert don&#39;t quite line up and of course
the distances don&#39;t actually match the names. This is in part
because of rerouting due to the huge amount of snow that dropped
in the Sierra this winter (also preventing me from doing the
warmup adventure run in the Sierras that I had planned). In the event,
the 23K got totally rerouted on race day anyway.&lt;/p&gt;
&lt;p&gt;Anyway, naturally I decided to do the Triple Crown, both because
it sounded fun and because I wasn&#39;t really willing to drive to Tahoe
for a 46K. Also, they gave out a massive amount of swag.
My overall plan was to push the VK moderately hard, race
the 46K, and then see what I could do on the 23K.&lt;/p&gt;
&lt;h2 id=&quot;flagstaff&quot;&gt;Flagstaff &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#flagstaff&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The race start is at Palisades Tahoe (6253 ft) and goes up
from there, so you&#39;re at significant altitude the whole time.
I&#39;ve gone directly from sea level to altitude and raced before,
with mixed results (OK at Tahoe 100K, awful at Tushars 70K),
but people often actually feel worse on the second or third
day at altitude (see Corinne Malcolm&#39;s excellent &lt;a href=&quot;https://www.irunfar.com/into-thin-air-the-science-of-altitude-acclimation&quot;&gt;article&lt;/a&gt; on altitude adaptation at &lt;a href=&quot;http://irunfar.com/&quot;&gt;iRunFar.com&lt;/a&gt;).
I didn&#39;t want to try to race three days in a row without any
adaptation, so I decided to spend two weeks in Flagstaff
(altitude ~7000 ft) beforehand.&lt;/p&gt;
&lt;p&gt;On balance, I think this was a good choice. As usual, I felt lousy
the first few days at altitude but by the time I had been
there a couple of weeks I was feeling mostly adapted. I flew back
on Wednesday and on Tuesday, my friend &lt;a href=&quot;https://ultrasignup.com/results_participant.aspx?fname=Kate&amp;amp;lname=Hudson#&quot;&gt;Kate&lt;/a&gt;, my son
(3200m PR: 10:52), and I went to the Grand Canyon to do
the Bright Angel–Tonto–South Kaibab loop. This was a bit of
a hot dry slog on the way up, but I generally felt OK,
so I figured I was ready for Broken Arrow, which of course
is actually cold and snowy rather than hot and dry.&lt;/p&gt;
&lt;h2 id=&quot;vk-(results%2C-finish-video)&quot;&gt;VK (&lt;a href=&quot;https://www.athlinks.com/event/171438/results/Event/1053701/Course/2374409/Bib/623&quot;&gt;results&lt;/a&gt;, &lt;a href=&quot;https://www.athlinks.com/event/171438/results/Event/1053701/Course/2374409/Bib/623&quot;&gt;finish video&lt;/a&gt;) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#vk-(results%2C-finish-video)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Kate and I drove out to Tahoe Thursday morning where we were staying
with
&lt;a href=&quot;https://brbrunning.com/2023/06/24/broken-arrow-46k-at-tahoe-snow/&quot;&gt;Lisa&lt;/a&gt;
and Stephen who were both doing the 46K. Kate was doing the VK and the 23K,
so I was the only one doing the Triple Crown. We got there around 6
PM, but fortunately the race didn&#39;t start until 10 AM, so we were able
to go out and grab some pasta and still get enough sleep.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/vk.png&quot; alt=&quot;VK profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ba-vkmap.png&quot; alt=&quot;VK map&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The profile for the VK is shown above. Looks gentle, but
that&#39;s just a trick of perspective because it&#39;s stretched out;
it&#39;s actually about 1000 feet per mile.&lt;/p&gt;
&lt;p&gt;I&#39;d never done a VK before, so I wasn&#39;t sure what to expect. The
pros do it in about 30 minutes (winning time was 39) so I was
expecting an hour or so, which means you&#39;re going at a fairly
high intensity right from the start. On the other hand I knew I had to save for the
46K the next day, so it&#39;s a bit of a balancing act.&lt;/p&gt;
&lt;p&gt;The initial climb was quite steep but on trail with good footing so I
was moving pretty fast. I decided to start about midway through the
field, which in retrospect was a bit of a mistake, as I immediately
had to make my way through people moving slower than me.
I was of course hiking at this point, but so was basically
everyone else.
Quickly, though, the climb turned into a snow slope,
where things were quite a bit more challenging. At this point
in the day, the snow was already quite slippery and even with
poles (&lt;a href=&quot;https://www.leki.com/int/en/Ultratrail-FX.One-Superlite/65225841120&quot;&gt;LEKI Fx.One Superlight&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;),
I slipped a fair bit. The trick seems to be to step where others have
stepped, where the snow is packed and you have a little more traction.
It&#39;s very hard to pass people on this section because there are only
a few lines up the slope and if you get outside the packed down
areas you&#39;re slipping a lot. There were a couple places where
super helpful volunteers had carved out snow steps and those were
a lot easier.&lt;/p&gt;
&lt;p&gt;Once you get over the first climb, there&#39;s a downhill of about half a
mile, starting with snow and then moving onto rocky trail. This was
the first part of the race where you had to run downhill on snow. I
was a bit unstable and managed to trip and fall on the transition
to dirt, jamming my 2nd and 3rd fingers on the left hand (but
fortunately not breaking either of them like I did to my right 3rd
finger in the Grand Canyon at the beginning of May).&lt;/p&gt;
&lt;p&gt;From there on it&#39;s another climb mostly on trail until you drop
off on a sort of fire road. I passed quite a few people on this stretch
as the footing was good and so it&#39;s just a matter of your ability
to power up the climb, something I&#39;m good at. After the
fire road, there&#39;s maybe 400 m of a fairly rocky (as in almost scrambling)
traverse, at which point you get to the &amp;quot;stairway to heaven&amp;quot;,
which is this sketchy looking metal ladder that you
really do not want to fall off of:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/ekr-broken-arrow-ladder.jpg&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ekr-broken-arrow-ladder.jpg&quot; alt=&quot;Broken Arrow Ladder&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There was actually a bit of a backup at the ladder and I had to
wait for some others to get over it. In retrospect I should have stowed
my poles at this point because they get in the way of climbing
and the finish is right after the ladder.&lt;/p&gt;
&lt;p&gt;The ladder is obviously single file, and so at this point
I figured the finish order was fixed, but there are actually
some snow steps and a short flattish stretch of snow before the
finish and someone passed me right after the steps before I
realized I should sprint, which I tried to do, which resulted in
slipping and falling again, but I eventually made it to the line.&lt;/p&gt;
&lt;p&gt;Unlike other races, however, the VK just finishes at the top of the
hill so there&#39;s not much of a finish line, just the arch and a few
race staff standing around to give you your medal. Even the finish
line drop bags are about a half mile away. I opted to wait around for
Kate to finish, but I hadn&#39;t brought a jacket and it was super windy,
so by the time she got to the top I was getting cold. We
then headed down to the &amp;quot;Siberia&amp;quot; aid station
to get our drop bags with jackets. From there it&#39;s about a mile to the top of
the gondola for the ride down.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;That afternoon, Tailwind Nutrition was having
a &amp;quot;meet and greet&amp;quot; with ultra great &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Courtney_Dauwalter&amp;amp;oldid=1163248121&quot;&gt;Courtney Dauwalter&lt;/a&gt; to introduce their new Courtney-inspired flavor
&lt;a href=&quot;https://tailwindnutrition.com/products/limited-edition-endurance-fuel-dauwaltermelon&quot;&gt;Dauwaltermelon&lt;/a&gt;.
Back when I did Tahoe 100K in 2018, while my family was waiting for
me at the finish line, Courtney rolled through en route to
her &lt;a href=&quot;http://trailandultrarunning.com/courtney-dauwalter-crushes-tahoe-200-course-records-with-2nd-place-oa-finish/&quot;&gt;second place overall at Tahoe 200&lt;/a&gt;, and spent a few minutes talking
to my then 11 year old son, which he found really inspiring,
so I got a chance to thank her for that. Courtney went on
to absolutely shatter the women&#39;s Western States
Endurance Run record the next weekend.&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/kate-courtney.jpeg&quot; /&gt;
&lt;h4 style=&quot;text-align: center&quot;&gt;
Kate and Courtney talking about ultra
&lt;/h4&gt;
&lt;p&gt;
&lt;/p&gt;&lt;p&gt;I had brought a pair of the Kahtoola
&lt;a href=&quot;https://kahtoola.com/traction/nanospikes-footwear-traction/&quot;&gt;NANOspikes&lt;/a&gt;
for the snow but didn&#39;t use them, in part because it never got
super bad and in part because I didn&#39;t want to take the
time to put them on. However, the trip down to the gondola was mostly snow
so I did try them out and they seemed to help a bit, though
they&#39;re Kahtoola&#39;s lightest and shortest spikes and the snow
was about 6 inches deep, so they&#39;re not magic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 1:07:02, 142/395 finishers, 7/39 M50-59&lt;/p&gt;
&lt;h2 id=&quot;46k-(results)&quot;&gt;46K (&lt;a href=&quot;https://www.athlinks.com/event/171438/results/Event/1053701/Course/2374411/Bib/2993&quot;&gt;results&lt;/a&gt;) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#46k-(results)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The 46K was on day two and my plan was to push the pace a bit
and then try to hang on for day 3.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/46k.png&quot; alt=&quot;46K profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ba-46kmap.png&quot; alt=&quot;46K map&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As I said earlier, this is two loops, arranged as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A runnable rolling but gradually uphill section, partly
on the Western States Trail.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A series of steep climbs on dirt and snow up to the
Snow King aid station.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A semi-rocky traverse followed by a climb up to KT-22
where it rejoins the VK course.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From the top of the VK course there&#39;s a gradual descent
on snow followed by a series of very steep descents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A climb of about a quarter mile and 400 feet, again
on snow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A fast descent of about 1.5 miles on snow, followed by
a mile on dirt road back more or less to the start.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And then you do it all over again. Simple. I didn&#39;t really know
what to expect on this timewise, but I was thinking something
like 7 hours.&lt;/p&gt;
&lt;p&gt;After the VK, I was kind of worried about traction, so on Friday afternoon
I dropped by &lt;a href=&quot;https://www.alpenglowsports.com/&quot;&gt;Alpenglow Sports&lt;/a&gt; and
bought a pair of the slightly more aggressive &lt;a href=&quot;https://kahtoola.com/traction/exospikes-footwear-traction/&quot;&gt;Kahtoola EXOspikes&lt;/a&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
They&#39;re not that heavy and I figured I could carry them in my pack. Lisa was
also doing the 46K and broke the rule about not buying new stuff for a race
to get a pair of purple Hoka Torrents.&lt;/p&gt;
&lt;h3 id=&quot;lap-1&quot;&gt;Lap 1 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#lap-1&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;After having to fight my way through people on the VK, I decided to start out
more towards the front. This turns out to have been a good plan because
you first run across a parking lot and then there is a short section of
fire road for a total of maybe 400 m and then you&#39;re into single track,
so there was kind of a rush for position. I hadn&#39;t really warmed up—I
usually don&#39;t before ultras as you can just warm up in the first few miles—and so I probably wasn&#39;t as fast as I
should have been and things got bunched up in the single track.
It didn&#39;t help that there was a lot of snow runoff so you were literally
running through a stream a lot of the way (no chance of keeping your feet
dry!). Eventually I settled into my position, as usual being passed some on the
downhills and passing people on the climbs.&lt;/p&gt;
&lt;p&gt;After about 3.5 miles, you hit the first climb, which is a steep dirt
section, so it was time to pull out the poles. The &amp;quot;trail&amp;quot; part of this
climb was pretty rough anyway, so it didn&#39;t make much difference if you
took a slightly different line and I pulled to the left of the line
of climbers and passed a number of people en route to the top. After
this, it&#39;s another climb mostly on snow up to the Snow King aid station,
where I made my first mistake of the day.&lt;/p&gt;
&lt;p&gt;As I mentioned, I had broken my finger in the Grand Canyon about 6
weeks before and while I was finally out of a splint, I was still
supposed to &amp;quot;buddy tape&amp;quot; the broken finger to the next finger. Anyway,
I&#39;d started out wearing gloves but it was starting to get hot and
so I wanted to take them off, but then I had to retape the finger
and the coban I had been using didn&#39;t want to re-stick once it got wet, so I had to
get one of the medics to do it with some medical tape. All of this
must have taken like 3-5 minutes and I know a lot of people passed
me. As they say, when you&#39;re stopped you&#39;re going infinity
minutes per mile.&lt;/p&gt;
&lt;p&gt;From Snow King it&#39;s a short downhill followed by a bunch of up and
down (but mostly up), including a knife edge traverse over a bunch of
scree. I took this really tentatively and a bunch of people passed
me, but after the Canyon I was mostly focused on making sure I didn&#39;t
fall and hurt anything, so I was willing to live with it. The climb up
to KT-22 is steep and rocky, so I started passing people again.&lt;/p&gt;
&lt;p&gt;From here it&#39;s the VK course and once I hit the snow traverse I decided
it was time for the spikes. They&#39;re easy to get on, so it probably
only took a minute or two. I do think this helped some as I felt like I
was passing some people who were slipping, but it wasn&#39;t dramatic the
way (I imagine) it would be with crampons. Everything was smooth
to the top of the VK and I felt a bit more comfortable on the ladder
this time, though I wasn&#39;t looking forward to having to do it two more
times (the next loop and then the 23K).&lt;/p&gt;
&lt;p&gt;The descent from the top of Washeshu Peak starts out
straightforward: it&#39;s rock and then snow, but then
right when I was expecting a nice flattish descent down to the
gondola (and then what? not sure) there was a marshal telling me to take a
left turn onto, well, I guess you&#39;d call it a slope, but it
was straight down and I remember saying something to the effect of
&amp;quot;holy shit&amp;quot;. The whole slope is something like -15%, and was
about mid-calf deep in snow, so I spent the first part of it
just desperately trying not to fall until I saw some of the
chutes where people had been glissading. I took the hint and sat
down and sledded down them (cold!). This got me to the bottom
pretty fast and then I turned and saw something else I wasn&#39;t
expecting: a 400 foot climb.
I trudged up the climb, which actually wasn&#39;t so bad and then it&#39;s a
short downhill to the aid station. I stopped and took off my spikes,
as they didn&#39;t seem to help much on the snowy downhill, and
I never used them again.
This whole
section was also deepish snow for another 1.5 miles or so
and then it was onto fire road back to the start.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Split:&lt;/strong&gt; 3:11:09&lt;/p&gt;
&lt;h3 id=&quot;lap-2&quot;&gt;Lap 2 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#lap-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I wasn&#39;t feeling real good about having to do all this again,
but I blew through the half-way aid station (split 1: 3:11:09) and headed
back out for loop 2. There was definitely more power hiking on the
Western States Trail this
time, but I still managed to run a fair bit of it. By the time
I got to Snow King again I was quite tired and was glad to see
that they had Coke (caffeine + sugar = performance) which I used
to fill up one of my bottles.&lt;/p&gt;
&lt;p&gt;Once I got past Snow King, this loop seemed a lot easier,
probably due to some combo of the caffeine and knowing that I
was over halfway done. Also, as mentioned above, I&#39;m a lot better
on the steep climbs than I am on descents, so once we got
past the opening rollers, I knew I just needed to push through
those sections fairly hard and then survive the downhill.
I did spend some time talking to one of the other runners who
was doing her first trail race but had been a collegiate 10K runner and had done a lot
of mountaineering and she gave me some tips on how to descend in
the snow (heels first!), which seemed to help some.&lt;/p&gt;
&lt;p&gt;Things were pretty uneventful from here: I made it to the top
and felt a lot more comfortable on the glissading portions
and on the final snowy downhill. I didn&#39;t need the poles on
the downhill but at this point my coordination was starting to
go and I couldn&#39;t quite get them into the quiver (the typical
thing is that one end doesn&#39;t quite make it in), so I ended
up just folding them and carrying them.
By the time I hit the fire
road I was mostly alone so I settled in at a comfortable
but not all out pace, remembering that I had to race again on
Sunday. Coming through the final stretch to the finish I just
focused on trying to finish strong.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Split:&lt;/strong&gt; 3:31:26&lt;/p&gt;
&lt;p&gt;I had a bit of time before Lisa and Stephen finished, so I decided
to go back to the VRBO and shower and change, but still made it
back in time.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ba-finish-all.jpg&quot; alt=&quot;Us at the Broken Arrow finish&quot; /&gt;&lt;/p&gt;
&lt;h4 style=&quot;text-align: center&quot;&gt;
All of us after the finish of the 46k
&lt;/h4&gt;
&lt;p&gt;
&lt;/p&gt;&lt;h3 id=&quot;analysis&quot;&gt;Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#analysis&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Segment&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Overall&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Gender&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Division&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Snow King&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;136&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;103&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Siberia&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;175&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;130&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;High Camp&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;177&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;135&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Village&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;184&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;139&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Snow King&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;169&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;128&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;High Camp&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;155&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;117&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Finish&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;167&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;127&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The table above shows about the pattern you would expect from
the narrative above (though I hadn&#39;t actually looked at the
table before I wrote it). Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I was doing well on the climbs but badly on the downhills.&lt;/li&gt;
&lt;li&gt;I lost a lot of time screwing around at Snow King. Several
of the people who finished ahead of me passed me between Snow King and Siberia
on the first loop.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With that said, things were tight: 4th was 6:25:27 (17
minutes ahead of me) and I was less than 10 minutes behind 7th.
It&#39;s possible I went out a bit hard and faded, but my sense is
I was actually stable and that I ran a solid, but conservative
race. Probably the biggest loss is between High Camp and the Finish
on the last downhill, where if I&#39;d just been better on snow I
might not have lost as much time or place.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 6:42:35, 167/542, 9/46 M50-59&lt;/p&gt;
&lt;h2 id=&quot;23k-(results%2C-finish-video)&quot;&gt;23K (&lt;a href=&quot;https://www.athlinks.com/event/171438/results/Event/1053701/Course/2374413/Bib/2109&quot;&gt;results&lt;/a&gt;, &lt;a href=&quot;https://www.athlinks.com/event/171438/results/Event/1053701/Course/2374413/Bib/2109&quot;&gt;finish video&lt;/a&gt;) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#23k-(results%2C-finish-video)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My initial plan for the 23K had just been to kind of hold on, but
given that I actually felt OK after the 46K, I knew the
course, and the 23K start time was fairly late (8:00) so I could get some
rest, my coach &lt;a href=&quot;https://www.instagram.com/emilyharrison0708/&quot;&gt;Emily
Torrence&lt;/a&gt; and I decided
it was worth going for it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/23k.png&quot; alt=&quot;23K Profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ba-23kmap.png&quot; alt=&quot;23K Map&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Kate and I lined up at the start only to
hear the RD announce that because of very high winds at the summit
they were rerouting the course from the original 23K loop to be
twice the 11K loop and that they would be starting the race at 9:30
to give them time to set things up.
In retrospect we should have just gone back to the VRBO to chill
out, but instead we ended up just sitting in chairs out front of
one of the local restaurants for the next 90 minutes.&lt;/p&gt;
&lt;p&gt;Eventually, though, we lined up at the start. The 11K course followed
some of the same sections of the WS trail but skipped a bunch of the
rollers in favor of the climb to KT22 and then a fast descent on snow
back down to the road, then to the finish and repeat. Given the
46K experience, I figured it was a good idea to start near
the front and push the pace at the beginning so I didn&#39;t have to fight
past too many people.&lt;/p&gt;
&lt;p&gt;The first loop went quickly (only 10K after all). After the first mile you&#39;re basically
climbing the entire time up to KT22 and then it&#39;s straight back down.
The downhill snow section was steep and slippery with
fewer snow chutes on this course so I mostly had to just try to
stay on my feet and get down as fast as possible.
After the 46K I felt a lot more comfortable with the glissading
this time and managed to navigate it reasonably well. Then it was onto
the road and the second loop.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Split:&lt;/strong&gt; 1:18:01&lt;/p&gt;
&lt;p&gt;With only 11 km (officially; it was really more like 10 km, though with ~2400 ft of climbing)
to go in the weekend, I felt like it was safe to push the pace more on the
last lap, and I ran more of the trail portions. Of course, I still had to
hike the main climb, but really let myself take some chances on the
final snow descent (full send!).
The final mile long stretch of road is moderately steep and while
I pushed the pace as fast as I could while staying reasonably
sure I wouldn&#39;t fall, two men and one
woman passed me on this stretch. I was able to keep one of them—a man
in a red shirt that I&#39;d been back and forth with all day—in sight
but the other two dropped me.&lt;/p&gt;
&lt;p&gt;At the bottom of the road the course turns flattish and then there
are a few turns and then into the chute. As soon as I hit this section
I knew that it was more about power than about the ability to run downhill
and I could see that I was gaining on the man in red in front of me,
and I eventually caught him right as we entered the chute. I was actually
expecting a sprint finish as I went on by, but he didn&#39;t
respond so I ended up comfortably beating him by five
seconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 2:38:58, 169/671, 5/56 M50-59&lt;/p&gt;
&lt;h3 id=&quot;analysis-2&quot;&gt;Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#analysis-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Overall I think this was my best race of the three both in terms of
results and how I felt: my place was highest both overall and in my
division and I almost felt stronger going into lap 2 than lap 1,
and this is confirmed by the even splits. I&#39;m still doing a lot
better on the climbs than the descents, but that gap seems to have
narrowed from the 46K. You always look a bit worse in the finish
videos than you feel inside, but I&#39;m moving well and passing
people at the very end is generally good.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Segment&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Overall&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Division&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Snow King&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;164&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;34:15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Village&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;183&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1:18:01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Snow King&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;160&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1:53:38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Finish&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;169&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2:38:58&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&quot;overall&quot;&gt;Overall &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/broken-arrow/#overall&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Broken Arrow also keeps Triple Crown standings, computed by the
sum of all your times. This tends to really overweight the 46K,
where I was just OK, but even so my result isn&#39;t bad. I was
37/100 overall and 4th/17 in M50-59, with a time of 10:28:38.
Third was 10:22:15, which seems plausibly in reach if things
had turned out differently.&lt;/p&gt;
&lt;p&gt;Generally, this seems like a successful weekend. I had never had
three days of racing before and was worried that I would be super
tired but I seem to have gotten stronger as the weekend went
on and wasn&#39;t even that tired after the 23K. I attribute this
to a combination of a strong training block right before—including
the two weeks in Flagstaff—and really paying attention to
nutrition and recovery post-race on Friday and Saturday.
The snow was definitely a real obstacle and I clearly would have
been quite a bit faster if I&#39;d had more practice on snow, but
I felt like I got the hang of it after a few days and while
people were still passing me it wasn&#39;t anywhere near as bad.
I think I also handled nutrition well both during the race
and after: I never had much GI distress (thanks, &lt;a href=&quot;https://www.maurten.com/&quot;&gt;Maurten!&lt;/a&gt;) and
only felt bonky a bit midway through the 46K, which Coke
fixed up. That may also have just been the &amp;quot;I&#39;ve got to do this
loop another time???&amp;quot; feeling.&lt;/p&gt;
&lt;p&gt;I&#39;m not sure if I&#39;d do Broken Arrow again: it&#39;s a generally well-run
event and I had a good time, but I think on balance I gravitate more
towards the longer events, especially those where you&#39;re covering a
lot of ground rather than repeating the same part of the course.
On the other hand, it was a great experience and I definitely
recommend giving it a shot if you&#39;ve been mostly racing standard
trail ultras.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;P.S. I&#39;d been having some trouble with the
engagement on my poles and the LEKI guys at the expo just
swapped out the gloves. Great customer service. &lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The Web site actually says you might need to run down,
but that didn&#39;t happen. &lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I also tried on a pair of the &lt;a href=&quot;https://www.nnormal.com/en_US/content/kjerag&quot;&gt;NNormal Kjerags&lt;/a&gt;.
I&#39;ve been looking for a new pair of race shoes and I&#39;d heard good things about
the Kjerags, but they&#39;re way too wide in the forefoot for me. This was actually kind of
surprising, because NNormal is a partnership between Kilian Jornet and Camper
and the shoes that Salomon made for Kilian were all narrow. &lt;a href=&quot;https://educatedguesswork.org/posts/broken-arrow/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How NATs Work, Part III: ICE</title>
		<link href="https://educatedguesswork.org/posts/nat-part-3/"/>
		<updated>2023-07-02T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/nat-part-3/</id>
		<content type="html">&lt;p&gt;The Internet is a mess, and one of the biggest parts of that mess
is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1147533294&quot;&gt;Network Address Translation (NAT)&lt;/a&gt;,
a technique which allows multiple devices to share the same
network address. This is part III in a series
on how NATs work and how to work with them.
In &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1&quot;&gt;part I&lt;/a&gt; I
covered NATs and how they work, and &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2&quot;&gt;part II&lt;/a&gt;
covered the basic concepts of NAT traversal.
If you haven&#39;t read those posts,
you&#39;ll want to go back and do so before starting this one,
which describes the main standardized technique for NAT
traversal, &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8445&quot;&gt;&lt;em&gt;Interactive Connectivity Establishment (ICE)&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you may recall from &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2&quot;&gt;part II&lt;/a&gt;, there
are many circumstances where two endpoints (clients) want to
communicate directly rather than through a server. However, your
typical Internet client is also behind a NAT or firewall, which
means that you can&#39;t just publish your address and have people
connect to you as they would with a Web server. Instead, you
need some NAT traversal mechanism. When the IETF originally
set out to address the problem of NAT traversal, the idea was
that you would &lt;em&gt;characterize&lt;/em&gt; the NAT (i.e., figure out what
its behavior was) and use that information to publish an
address that would work via a signaling server.
Once each side has the other side&#39;s address, it can try to transmit
to it, as in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-ei-ei.png&quot; alt=&quot;Simple NAT traversal&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, it turned out
that there was too much diversity in NAT behavior to make this
work reliably, so we needed something else. Enter ICE.&lt;/p&gt;
&lt;h2 id=&quot;multiple-addresses&quot;&gt;Multiple Addresses &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#multiple-addresses&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Recall that the client will generally have multiple addresses,
as shown in the diagram below &lt;em&gt;[Updated for clarity 2023-07-02]&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/NAT-addresses.png&quot; alt=&quot;Address types&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this case, the client has two addresses:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;The &lt;strong&gt;host&lt;/strong&gt; address (10.0.0.3:1111)&lt;/dt&gt;
&lt;dd&gt;which is the one assigned to its own network interface and which
it is directly aware of.&lt;/dd&gt;
&lt;dt&gt;The &lt;strong&gt;server reflexive (srflx)&lt;/strong&gt; address (192.0.2.1:5678)&lt;/dt&gt;
&lt;dd&gt;on the outside of the NAT. The client can typically only learn this by connecting
to the STUN server and asking it what address it sees.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Now what happens if two clients with this kind of topology
want to talk to each other? There are two main scenarios,
as shown in the diagram below.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The clients can be on different networks (probably the
normal case on the Internet)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The clients can be on the same network (as is common in
enterprise or gaming scenarios, for instance if
you have multiple players in the same house and hence
the same network)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;img-flex-equal&quot;&gt;
  &lt;div&gt;
    &lt;h4 style=&quot;text-align: center&quot;&gt;
    Clients on different networks
    &lt;/h4&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/NAT-different-network.png&quot; /&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;h4 style=&quot;text-align: center&quot;&gt;
    Clients on the same network
    &lt;/h4&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/NAT-same-network.png&quot; /&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The reason that this matters is that &lt;em&gt;neither&lt;/em&gt; the host
address nor the server reflexive address will work all
the time. For obvious reasons, if Alice and Bob are
on different networks and Alice sends Bob
her host address, Bob won&#39;t be able to reach it from
his own network (in this case, they share
the same address range, but those addresses are
on different networks, so there might be another host
with Alice&#39;s address on Bob&#39;s network). On the other
hand, if they are on the same network and Alice sends
Bob her server reflexive address, this may not work
if the NAT doesn&#39;t support &lt;a href=&quot;https://educatedguesswork.org/posts/NAT-part-2#hairpinning&quot;&gt;hairpinning&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What you want is for the media to take different paths
(shown in red) depending on the topology: if Alice
and Bob are not &lt;em&gt;[corrected, 2023-07-02]&lt;/em&gt; on the same network, the media should
flow between the server reflexive addresses (on the
outside of the NAT) and if they are on the same network
it should flow between the host addresses (on the local
network interfaces). The problem is determining which of
these address pairs to use, because it&#39;s not practical
to determine which scenario you are in.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
If neither address is guaranteed to work, the only option
is for each side to send &lt;em&gt;both&lt;/em&gt; addresses. In this case, Alice would
send Bob two addresses (ICE calls these &amp;quot;candidates&amp;quot;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;10.0.0.3:1111&lt;/code&gt; (host)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;192.0.2.1:1234&lt;/code&gt; (server reflexive)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bob would send Alice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;10.0.0.2:1111&lt;/code&gt; (host)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;198.51.100.1:5678&lt;/code&gt; (server reflexive)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once Alice sees Bob&#39;s addresses, she tries to transmit to
both of them, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ice-simple.png&quot; alt=&quot;Alice&#39;s connectivity checks&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this case, Alice and Bob are on different networks, so
Alice&#39;s attempt to transmit to Bob&#39;s host candidate (&lt;code&gt;10.0.0.2:1111&lt;/code&gt;)
doesn&#39;t work, but her attempt to transmit to his server
reflexive candidate (&lt;code&gt;198.51.100.1:5678&lt;/code&gt;) does, though
it goes through two layers of translation along the way.
If we drew Bob&#39;s side of the exchange, it would look
similar.&lt;/p&gt;
&lt;p&gt;If you look at this diagram closely, you will notice
something potentially surprising: Alice only sends two
packets, even though there are four pairs of addresses
(host/host, host/server reflexive, server reflexive/host, and
server reflexive/server reflexive). Why doesn&#39;t
Alice try to send from her server reflexive address? The
answer is that there is no way for her to do so. Alice can
only send packets from her host address: if they
go through the NAT, it will translate them into the server
reflexive address (or maybe some other address), and if they
don&#39;t go through the NAT they won&#39;t be translated, but
Alice can&#39;t control this. In either case, Alice just
needs to send one packet to each address from the other
side.&lt;/p&gt;
&lt;h2 id=&quot;connectivity-checks&quot;&gt;Connectivity Checks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#connectivity-checks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Sending to both of Bob&#39;s addresses lets Alice get traffic
through, but we obviously don&#39;t want to have to send two
copies of every packet (or worse, if Bob has more addresses,
as discussed below). What we need is a mechanism for Alice
to determine which of the packets got through so that
she can then send only on that address pair. As you might
expect if you&#39;ve read my post on &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/&quot;&gt;reliable transports&lt;/a&gt;,
we do this by having Bob &lt;em&gt;acknowledge&lt;/em&gt; Alice&#39;s packet in
what&#39;s called a connectivity check.&lt;/p&gt;
&lt;p&gt;Instead of sending media to Bob, Alice sends a STUN check&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
to Bob (much like she would if she were trying to learn
her address from a STUN server) and waits for the response.
If Bob doesn&#39;t answer, she can infer that that address
pair won&#39;t work. If he does, then she knows that this is
a valid address pair and can then use it to send media
(Alice knows which checks worked and which ones didn&#39;t because
the check and the acknowledgment contain an identifier,
which I haven&#39;t shown in the diagram to keep things simple).&lt;/p&gt;
&lt;p&gt;This process is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/connectivity-check.png&quot; alt=&quot;A simple ICE connectivity check&quot; /&gt;&lt;/p&gt;
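&lt;p&gt;To make the identifier matching a bit more concrete, here is a minimal Python sketch of the check/acknowledgment bookkeeping. It uses the standard 20-byte STUN header (message type, length, magic cookie, and a 96-bit transaction ID) but leaves out everything else a real ICE check carries, such as the authentication and priority attributes, so treat it as an illustration rather than a working ICE agent. The function names and the &lt;code&gt;pending&lt;/code&gt; table are made up for this example.&lt;/p&gt;

```python
import os
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
HEADER = "!HHI12s"  # type, length, magic cookie, 96-bit transaction ID


def build_check():
    """Return (transaction_id, packet) for an attribute-free Binding request.

    Hypothetical helper: a real ICE check also carries attributes.
    """
    transaction_id = os.urandom(12)
    packet = struct.pack(HEADER, BINDING_REQUEST, 0, STUN_MAGIC_COOKIE, transaction_id)
    return transaction_id, packet


def build_ack(request):
    """Bob's side: echo the request's transaction ID in a success response."""
    _type, _length, _cookie, transaction_id = struct.unpack(HEADER, request[:20])
    return struct.pack(HEADER, BINDING_SUCCESS, 0, STUN_MAGIC_COOKIE, transaction_id)


def match_ack(pending, response):
    """Alice's side: return the address pair this response confirms, or None."""
    msg_type, _length, cookie, transaction_id = struct.unpack(HEADER, response[:20])
    if msg_type != BINDING_SUCCESS or cookie != STUN_MAGIC_COOKIE:
        return None
    return pending.pop(transaction_id, None)
```

&lt;p&gt;Alice would call &lt;code&gt;build_check()&lt;/code&gt; once per address pair, record the pair in &lt;code&gt;pending&lt;/code&gt; under the transaction ID, and treat a pair as valid once &lt;code&gt;match_ack()&lt;/code&gt; returns it.&lt;/p&gt;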
&lt;p&gt;I&#39;m obviously simplifying quite a bit here. In particular,
because packets can get lost, Alice has to retransmit her
STUN checks for a while; otherwise a single packet on a valid
address pair might get lost. For instance, if packet 2 got lost,
and Alice didn&#39;t retransmit, then Alice would be left with
no valid pairs.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Moreover, as discussed in the next section, there are reasons
besides network failure why one of the packets might be dropped.&lt;/p&gt;
&lt;h3 id=&quot;bidirectional-checks&quot;&gt;Bidirectional checks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#bidirectional-checks&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First, as discussed in &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aapf-%E2%86%94-eim%3Aapf&quot;&gt;part II&lt;/a&gt;,
if Bob doesn&#39;t transmit at all but just responds to Alice&#39;s checks,
then Alice&#39;s checks may never get through. If Bob&#39;s NAT has
address/port-dependent filtering, then it will drop any
incoming packets on a given NAT binding until Bob has sent
an outgoing packet; this requires Bob to initiate his own
checks, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/connectivity-check-bidi.png&quot; alt=&quot;Bidirectional connectivity checks&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To walk through this a bit, Alice starts by sending a check (msg 1)
but because Bob has an address/port-dependent filtering NAT, it filters out
the packet. When Bob initiates his own check (msg 2), it creates a binding
on his own NAT on the way out and gets delivered to Alice (this works
even if Alice also has address/port-dependent filtering because
her outgoing packet created a binding). Alice receives the packet
and sends an ACK (msg 3) which is able to traverse Bob&#39;s NAT because
of the aforementioned binding. At this point, Bob knows that the
pair &lt;code&gt;B:b -&amp;gt; X:x&lt;/code&gt; works and that it&#39;s safe to transmit on that address pair.&lt;/p&gt;
&lt;p&gt;When Alice&#39;s client retransmits its check (msg 4) it is able to
get through Bob&#39;s NAT (again because of the outgoing binding created
by message 2). Bob receives it and sends an ACK, and at this point
Alice knows that the pair &lt;code&gt;A:a -&amp;gt; Y:y&lt;/code&gt; works and it&#39;s safe to transmit
on it. Note that this would have worked perfectly well if Bob had
transmitted first (just flip the diagram around), and of course each
side is retransmitting anyway.&lt;/p&gt;
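&lt;p&gt;If it helps, the exchange above can be replayed with a toy model. The Python sketch below models only the filtering behavior (a NAT lets a packet in only if the local side has already sent to that remote address and port) and ignores address translation entirely. The class and function names are hypothetical, and &lt;code&gt;X:x&lt;/code&gt; and &lt;code&gt;Y:y&lt;/code&gt; stand for Alice&#39;s and Bob&#39;s external addresses as in the text.&lt;/p&gt;

```python
class FilteringNat:
    """Toy NAT with address/port-dependent filtering (no translation modeled)."""

    def __init__(self):
        self.permitted = set()  # remote endpoints the inside host has sent to

    def outbound(self, remote):
        self.permitted.add(remote)  # sending creates the binding

    def inbound_allowed(self, remote):
        return remote in self.permitted


def send(src, dst, src_nat, dst_nat):
    """Send one packet; return True if it gets past the receiver's NAT."""
    src_nat.outbound(dst)
    return dst_nat.inbound_allowed(src)


def run_exchange():
    alice, bob = "X:x", "Y:y"
    alice_nat, bob_nat = FilteringNat(), FilteringNat()
    return [
        ("msg 1: Alice check", send(alice, bob, alice_nat, bob_nat)),
        ("msg 2: Bob check", send(bob, alice, bob_nat, alice_nat)),
        ("msg 3: Alice ACK", send(alice, bob, alice_nat, bob_nat)),
        ("msg 4: Alice check (retransmit)", send(alice, bob, alice_nat, bob_nat)),
        ("Bob ACK", send(bob, alice, bob_nat, alice_nat)),
    ]


for label, delivered in run_exchange():
    print(label, "delivered" if delivered else "filtered")
```

&lt;p&gt;Only the first message is filtered; everything after Bob&#39;s first outgoing packet gets through, which is the point of having both sides send checks.&lt;/p&gt;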
&lt;p&gt;At this point you might ask why Alice needs to do a second round of
connectivity checks after receiving; after all, she knows that Bob can
successfully transmit on the &lt;code&gt;Y:y -&amp;gt; X:x&lt;/code&gt; path and she can receive it.
However, she does not know that messages on the return path
(&lt;code&gt;X:x -&amp;gt; Y:y&lt;/code&gt;) work. For instance, Bob might have a firewall
that blocks &lt;em&gt;all&lt;/em&gt; incoming UDP packets, in which case Alice&#39;s
ACK would be blocked (which she wouldn&#39;t learn about) as well
as her own connectivity checks. If she sends her own checks, then
she will learn that that path doesn&#39;t work and can try something
else. In practice, however, this scenario is reasonably uncommon
and it&#39;s quite likely that when Alice receives Bob&#39;s check, her
check in the reverse direction will also work.&lt;/p&gt;
&lt;h3 id=&quot;relayed-candidates&quot;&gt;Relayed Candidates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#relayed-candidates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As mentioned in &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2&quot;&gt;part II&lt;/a&gt;, there are situations
in which it is not possible for Alice and Bob to directly
send traffic to each other, for instance if both of them
have NATs with address-dependent mapping. In that case, getting
a successful connection requires using a &lt;em&gt;relay&lt;/em&gt;,
which is just a public server on the Internet that will
forward traffic to and from a machine, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ICE-relay.png&quot; alt=&quot;Relayed connection&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In standard ICE,
clients speak to the relay over a protocol called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Traversal_Using_Relays_around_NAT&amp;amp;oldid=1115742687&quot;&gt;Traversal Using Relays Around NAT (TURN)&lt;/a&gt;. Because the
TURN server is on the public Internet and not behind
a firewall or NAT, it will almost always be possible
for the client to connect to it—assuming that
it&#39;s possible for the client to connect to any
other network element at all. Note, however, that
the client may have to use TCP if the local network
blocks UDP.&lt;/p&gt;
&lt;p&gt;It&#39;s quite cheap to run a STUN server because it just has to respond
to a small number of packets per client, and there are a number of
free public STUN servers.  However, a TURN server has to be able to
relay &lt;em&gt;all&lt;/em&gt; of the media between the clients, which can be quite a bit
of bandwidth. For this reason TURN servers are usually not free but
rather are provided by the calling service people are using. Because
a modest fraction (single digit percentages) of people cannot connect
without a TURN server, this means that there is a certain minimum
cost to running a video calling service even if you prioritize
peer-to-peer media.&lt;/p&gt;
&lt;h3 id=&quot;picking-the-best-path&quot;&gt;Picking the best path &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#picking-the-best-path&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At a high level, then, there are (at least) three potential paths
data can take between Alice and Bob, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ICE-paths.png&quot; alt=&quot;ICE paths&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s also quite possible that there will be multiple viable paths. As
noted above, a path through a relay will almost always work, but it&#39;s
also quite common that it&#39;s possible to have a direct path between
Alice and Bob.&lt;/p&gt;
&lt;p&gt;These paths are not all created equal.  Latency is a key performance
property for real-time voice and video.  If the delay between you
speaking and the other side hearing you is too long it creates a
really jarring experience. If you&#39;ve ever been on such a call you may
have noticed that you and the other person end up interrupting each
other a lot because the pauses in the conversation that leave room for
the other person to talk get delayed as well, with the result that
both people try to talk at the same time. In general, shorter (fewer hops)
network paths will have better latency, both because more hops
will often mean more meters of cable/fiber to traverse and because the
hops themselves take time.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
In particular, if you can send media directly rather than going through a relay, you really want to
do that, both for performance and cost reasons.&lt;/p&gt;
&lt;h2 id=&quot;lots-of-candidates&quot;&gt;Lots of Candidates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#lots-of-candidates&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is really the simplest possible scenario. In practice the client
might have many more addresses. For instance, the client might have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Both a WiFi interface and a mobile phone interface, each of which
will have its own address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Both IPv6 and IPv4 addresses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A VPN, which has its own address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Multiple NATs between it and the Internet (e.g., if it is served
by a carrier grade NAT), each of which will have its own server
reflexive IP addresses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;One or more &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#relays&quot;&gt;relayed&lt;/a&gt; connections through
TURN relays.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What ICE does is (approximately) to try the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cartesian_product&amp;amp;oldid=1153352463&quot;&gt;combination (Cartesian product)&lt;/a&gt;
of all of the candidates from Alice and all of the candidates from Bob until it
identifies a set of candidates that work (the &amp;quot;valid set&amp;quot;). Of course,
some candidate pairs will not be possible (e.g., mixed IPv4 and IPv6),
but it&#39;s still possible to have quite a few compatible candidates and hence
quite a few candidate pairs. As a concrete example, the machine I
am writing this on has two interfaces (wired and wireless),
each with local IPv4 and IPv6 addresses, but not IPv6 connectivity,
so that gives me 4 host candidates, 2 server reflexive candidates (for v4 only), plus
at least one relayed candidate. If I&#39;m connecting to another similar machine,
we&#39;re potentially looking at something like 15 IPv4 pairs (remember, you don&#39;t pair up the
server reflexives locally) plus 4 IPv6 pairs. It&#39;s a lot!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/buzz-lightyear-candidates.jpg&quot; alt=&quot;Buzz Lightyear meme&quot; /&gt;&lt;/p&gt;
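&lt;p&gt;The pair counts above are easy to check mechanically. The following Python sketch enumerates the Cartesian product for two identical machines under the assumptions in the text (two IPv4 host, two IPv6 host, and two IPv4 server reflexive candidates), plus the further assumption that there is exactly one relayed candidate and that it is IPv4. It skips local server reflexive candidates and mixed-family pairs. The helper names are made up for illustration.&lt;/p&gt;

```python
from itertools import product


def candidates(host_v4, host_v6, srflx_v4, relay_v4):
    """Build a flat list of (type, ip_family) tuples for one endpoint."""
    return (
        [("host", 4)] * host_v4
        + [("host", 6)] * host_v6
        + [("srflx", 4)] * srflx_v4
        + [("relay", 4)] * relay_v4
    )


def count_pairs(local, remote):
    """Count checkable pairs per IP family from the local side's point of view."""
    counts = {4: 0, 6: 0}
    for (l_type, l_family), (_r_type, r_family) in product(local, remote):
        if l_type == "srflx":
            continue  # you cannot send *from* your own server reflexive address
        if l_family != r_family:
            continue  # mixed IPv4/IPv6 pairs are not possible
        counts[l_family] += 1
    return counts


# Assumption: exactly one relayed candidate, reachable over IPv4.
machine = candidates(host_v4=2, host_v6=2, srflx_v4=2, relay_v4=1)
print(count_pairs(machine, machine))  # prints {4: 15, 6: 4}
```

&lt;p&gt;Each additional interface or relay multiplies through both sides of the product, which is why the number of pairs grows so quickly.&lt;/p&gt;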
&lt;h3 id=&quot;peer-reflexive-candidates&quot;&gt;Peer-Reflexive Candidates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#peer-reflexive-candidates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You may recall from &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/&quot;&gt;part II&lt;/a&gt;
that some NATs have &lt;em&gt;address and port-dependent&lt;/em&gt; mappings, in which
case the candidate gathering process will find a different external
mapping (the &lt;em&gt;server reflexive address&lt;/em&gt;) for a given internal address/port than is observed by the
peer (the &lt;em&gt;peer reflexive address&lt;/em&gt;). What this looks like to the peer
is that it receives a check from an address that it doesn&#39;t have
a candidate for. Fortunately, there is enough information in the
STUN check to determine what is going on, and the endpoint responds
by synthesizing a remote peer reflexive candidate, pairing it to its
local candidate, and starting checks to it. The other side doesn&#39;t
have to do anything special here, because—as with server reflexive
candidates—it automatically sends requests from the peer reflexive
address just by sending to the peer.&lt;/p&gt;
&lt;h2 id=&quot;prioritizing-checks&quot;&gt;Prioritizing Checks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#prioritizing-checks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A naive implementation of ICE would just send all the connectivity
checks at the same time. This turns out not to work well because
you can overload the Internet link or the NAT, causing them to
drop packets, thus making ICE take longer to converge. Instead,
you need to space out the checks over some time. However, you &lt;em&gt;also&lt;/em&gt;
want ICE to find a viable path as soon as possible because
while ICE is running the user is just sitting there waiting—depending
on the design maybe listening to a ringtone.&lt;/p&gt;
&lt;p&gt;In order to optimize the time to convergence, ICE uses a
prioritization scheme designed to provide two main properties:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;The most direct candidate pairs are checked first.&lt;/dt&gt;
&lt;dd&gt;As discussed above, you want media to traverse the most direct
path. ICE is designed so that it also &lt;em&gt;checks&lt;/em&gt; the most direct
paths first. I&#39;m actually not so sure about this design decision—in particular,
the host/host paths often will &lt;em&gt;not&lt;/em&gt; work—but it&#39;s what ICE does.&lt;/dd&gt;
&lt;dt&gt;Checks are roughly synchronized between both sides.&lt;/dt&gt;
&lt;dd&gt;Remember that in many cases, in order for Alice&#39;s checks on a given
candidate pair to succeed, Bob also needs to run a check in order to
create a binding in his NAT. If Alice checks that candidate pair first
and Bob checks that pair last, then (at best) Alice&#39;s check won&#39;t
succeed till the very end of the ICE process. At worst, by the time
Bob&#39;s check runs Alice&#39;s NAT binding will have timed out and both
checks will fail. This isn&#39;t that likely in most networks; in practice
the ICE process would just be slower than ideal.&lt;/dd&gt;
&lt;/dl&gt;
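&lt;p&gt;For readers who want to see how these two properties fall out of the arithmetic, here is a short Python sketch of the priority formulas recommended in RFC 8445. The type preference values (126, 110, 100, 0) are the RFC&#39;s recommended ones; the local preference of 65535 is an assumption that suits a host with a single address. Pair priority is computed from the candidate priorities of the &lt;em&gt;controlling&lt;/em&gt; and &lt;em&gt;controlled&lt;/em&gt; sides rather than local and remote, so both endpoints end up with the same ordering.&lt;/p&gt;

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, component_id, local_preference=65535):
    """Candidate priority; local_preference=65535 is an assumed default."""
    return (
        2**24 * TYPE_PREFERENCE[cand_type]
        + 2**8 * local_preference
        + (256 - component_id)
    )


def pair_priority(controlling, controlled):
    """Pair priority from the two candidate priorities (G and D in the RFC)."""
    g, d = controlling, controlled
    tie_break = int(g != d and max(g, d) == g)  # 1 only when G is strictly larger
    return 2**32 * min(g, d) + 2 * max(g, d) + tie_break


host = candidate_priority("host", component_id=1)
srflx = candidate_priority("srflx", component_id=1)

# Sort the two pairs Alice can check for one component, highest priority first.
ranked = sorted(
    [
        ("host/host", pair_priority(host, host)),
        ("host/srflx", pair_priority(host, srflx)),
    ],
    key=lambda item: item[1],
    reverse=True,
)
print([name for name, _ in ranked])
```

&lt;p&gt;The host/host pair sorts first, matching the &quot;most direct first&quot; rule, and because the inputs are the same on both sides, Bob computes the same order for the same pairs.&lt;/p&gt;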
&lt;p&gt;Of course, synchronization is only loose. Let&#39;s look at the
case where both sides run checks again:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/connectivity-check-bidi.png&quot; alt=&quot;Bidirectional connectivity checks&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Recall that in this scenario Alice runs her checks, which fail
but open a binding in her NAT, allowing Bob&#39;s check to succeed.
Eventually, Alice would retransmit her checks, but this might
take some time because retransmits, like the checks themselves,
need to be paced to avoid overflowing the network. Because
it&#39;s very probable that Alice&#39;s check will work,
ICE includes an optimization called &lt;em&gt;triggered checks&lt;/em&gt;
in which an endpoint immediately (well, mostly immediately) schedules
a check in the reverse direction upon receiving a check. This allows
Alice to quickly discover that the path that is likely to work
actually does work in the common case where it is valid.&lt;/p&gt;
&lt;h3 id=&quot;multiple-media-paths%2Ffrozen&quot;&gt;Multiple Media Paths/Frozen &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#multiple-media-paths%2Ffrozen&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There&#39;s an additional complication.
When ICE was first designed it was standard
practice to use different address pairs for different streams of
media. For instance, if you had an audio and video call, you would use
different ports for them. Moreover, you needed twice as many ports
because the media protocol that is in use here (&lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Real-time_Transport_Protocol&amp;amp;oldid=1155036175&quot;&gt;Real-time Transport Protocol (RTP)&lt;/a&gt;&lt;/em&gt;)
has an associated control protocol that is used for measuring packet
delivery and that also used its own ports. In other words, a simple
two person A/V call could need as many as four separate
address/port pairs, which means that you need four times
as many candidate pairs (two each for audio and video),
and hence four times as many checks. ICE&#39;s term for these
flows is &amp;quot;components&amp;quot;.&lt;/p&gt;
&lt;p&gt;This may be hard to visualize, so imagine a simplistic case
in which we only have host and server reflexive candidates and
we only want to establish two components. If we go back to our example
above, Alice would have the following candidates:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Server Reflexive&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;192.0.2.1:1234&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Server Reflexive&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;192.0.2.1:1235&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;And Bob would have:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Server Reflexive&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5678&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Server Reflexive&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5679&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Looking at it from Alice&#39;s perspective, she has four candidate
pairs to check (recall that Alice doesn&#39;t need to pair her srflx candidates
with Bob&#39;s candidates).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Local&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Remote&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host ↔ Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5678&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host ↔ Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host ↔ Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5679&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host ↔ Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In order to optimize these checks, ICE takes advantage of the
observation that NAT behavior is likely to be consistent, so
if a set of candidates works for the audio component then a
set of similar candidates (though of course with different
addresses) is likely to work for the video component. In order
to exploit this, ICE initially only checks one set of candidate
pairs for each type and sets the others as &lt;em&gt;frozen&lt;/em&gt;. If the
first candidate pair succeeds, then ICE unfreezes the others.
This avoids doing redundant checks in parallel.
In this case, at the start of ICE, we would have a situation like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Local&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Remote&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Usage&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;State&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&quot;row-blue&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Checking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-blue&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5678&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Checking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-red&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Frozen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-red&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5679&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Frozen&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;ICE would first check the pairs listed as &amp;quot;checking&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
Then, if the audio host ↔ host candidate pair works, ICE would
unfreeze the corresponding video candidate pair.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Local&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Remote&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Usage&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;State&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&quot;row-green&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Succeeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-blue&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5678&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Audio&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Checking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-blue&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Host&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Checking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;row-red&quot;&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1112&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;198.51.100.1:5679&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Host &amp;harr; Srflx&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Frozen&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The result of this
is that once you determine that a given type of candidate pair works,
you start checking the rest of the pairs of that type; as with triggered
checks, the idea here is to converge to a working set of candidate
pairs as fast as possible.&lt;/p&gt;
&lt;p&gt;As I said above, I&#39;m simplifying a bunch and there&#39;s more to
candidates being &amp;quot;similar&amp;quot; than just the types of the candidates. For
instance, if I have both wired and WiFi network interfaces, each of
those would have a candidate. If the wired candidate pairs succeed, I
would just unfreeze those but not the wireless pairs. The way this is
captured in ICE is by assigning each candidate a &amp;quot;foundation&amp;quot; that
characterizes the candidate (based on IP address, type, etc.). The
foundation of a candidate pair is the pair of local and remote
foundations.&lt;/p&gt;
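&lt;p&gt;To make the unfreezing rule concrete, here is a minimal Python sketch of the audio/video example above. It is not the real ICE state machine: the foundation labels (&lt;code&gt;host-a&lt;/code&gt;, &lt;code&gt;srflx-b&lt;/code&gt;) and the &lt;code&gt;on_success&lt;/code&gt; helper are made up for illustration, and it uses the same simplified &amp;quot;Checking&amp;quot; state as the tables.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    local: str
    remote: str
    foundation: tuple  # (local foundation, remote foundation); labels are hypothetical
    usage: str
    state: str = "Frozen"


def on_success(pairs, succeeded):
    """Mark a pair as succeeded and unfreeze frozen pairs with the same foundation."""
    succeeded.state = "Succeeded"
    for pair in pairs:
        if pair.state == "Frozen" and pair.foundation == succeeded.foundation:
            pair.state = "Checking"


pairs = [
    CandidatePair("10.0.0.3:1111", "10.0.0.2:1111", ("host-a", "host-b"), "Audio", "Checking"),
    CandidatePair("10.0.0.3:1111", "198.51.100.1:5678", ("host-a", "srflx-b"), "Audio", "Checking"),
    CandidatePair("10.0.0.3:1112", "10.0.0.2:1112", ("host-a", "host-b"), "Video"),
    CandidatePair("10.0.0.3:1112", "198.51.100.1:5679", ("host-a", "srflx-b"), "Video"),
]

# The audio host/host check succeeds, so the video host/host pair is unfrozen.
on_success(pairs, pairs[0])
states = [p.state for p in pairs]
```

&lt;p&gt;After the call, the video host ↔ host pair has moved to &amp;quot;Checking&amp;quot; while the video host ↔ srflx pair stays frozen, which matches the second table.&lt;/p&gt;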
&lt;p&gt;This is clearly not a great situation, but remember that we&#39;re not
building from scratch. VoIP systems are built out of
technologies designed back in the 1990s when
people had different ideas about how to design networking protocols
(and in particular when NATs and firewalls were less ubiquitous).
Eventually, the IETF worked out how to multiplex multiple
flows on the same address/port quartet using a pair of
technologies called &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc5761&quot;&gt;RTCP-mux&lt;/a&gt;
and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8843&quot;&gt;BUNDLE&lt;/a&gt;.
This actually represents years of engineering work
to retrofit the protocol &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8843&quot;&gt;mechanisms&lt;/a&gt;
without causing &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#backwards-compatibility&quot;&gt;backwards compatibility&lt;/a&gt; issues, but
fortunately it mostly
works now, so if you&#39;re on a modern system you&#39;re back to only needing a lot of checks
rather than an absurd number.&lt;/p&gt;
&lt;h2 id=&quot;selecting-pairs&quot;&gt;Selecting Pairs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#selecting-pairs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;OK, we&#39;re almost to the end now. Alice and Bob are running checks,
some of which succeed and some of which fail. As noted above, it&#39;s
quite common for more than one candidate pair to succeed for
each path because the host ↔ srflx candidate pair will often
work and one of the relayed candidate pairs will almost always
work. This means you have multiple paths that might work, so now
what?&lt;/p&gt;
&lt;p&gt;You could just have each side independently pick its favorite
candidate pair and send on it, but this turns out to be a bad
idea. Remember that many NATs time out their bindings after
a short period (10-30 seconds) of inactivity and that it&#39;s
&lt;em&gt;outgoing&lt;/em&gt; packets that keep the binding alive. If Alice and
Bob use different paths, then Alice may not be sending
the packets that keep the binding open for Bob&#39;s incoming packets.
If Alice and Bob use the same candidate pair, then the path
will be symmetrical and the binding will stay alive.
This means we need some mechanism for picking which pair
the endpoints will use.&lt;/p&gt;
&lt;p&gt;In modern ICE, this works by having one endpoint (the &amp;quot;controlling&amp;quot; side)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
pick which pair to use. The controlling endpoint runs checks for each
component until one succeeds that it wants to use (the actual logic
here is unspecified, but typically you&#39;d do something like wait until
one of the direct pairs worked or they had all failed and one of the
relayed pairs had succeeded) and then it sends another check on the
same pair with the USE-CANDIDATE flag (this is called
&amp;quot;nominating&amp;quot; the pair).
When the (controlled) peer
sees that flag it knows to use that candidate pair going forward.
When the controlling side&#39;s check succeeds—which should always
happen if the pair is already successful—then it knows it
is safe to use the pair as well and from here forward both sides
will just use that pair.&lt;/p&gt;
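&lt;p&gt;Since the nomination logic is unspecified, any code for it is just one possible policy. The sketch below implements the &amp;quot;prefer direct, fall back to relayed&amp;quot; rule of thumb described above; the dictionary keys and the &lt;code&gt;choose_pair_to_nominate&lt;/code&gt; function are hypothetical names, not part of any ICE library.&lt;/p&gt;

```python
def choose_pair_to_nominate(pairs):
    """Return the pair to nominate, or None if it is too early to decide.

    Each pair is a dict with hypothetical keys "name", "relayed" (bool),
    and "state" ("Checking", "Succeeded", or "Failed").
    """
    direct = [p for p in pairs if not p["relayed"]]
    relayed = [p for p in pairs if p["relayed"]]

    # Prefer any direct pair that has already succeeded.
    for p in direct:
        if p["state"] == "Succeeded":
            return p

    # Only fall back to a relayed pair once every direct pair has failed.
    if all(p["state"] == "Failed" for p in direct):
        for p in relayed:
            if p["state"] == "Succeeded":
                return p

    # Otherwise keep waiting for more check results.
    return None


pairs = [
    {"name": "host-host", "relayed": False, "state": "Failed"},
    {"name": "host-srflx", "relayed": False, "state": "Checking"},
    {"name": "relay-relay", "relayed": True, "state": "Succeeded"},
]

early = choose_pair_to_nominate(pairs)  # a direct check is still running
pairs[1]["state"] = "Failed"
late = choose_pair_to_nominate(pairs)   # all direct pairs failed, use the relay
```

&lt;p&gt;In a real implementation the controlling side would then send a check with USE-CANDIDATE on the chosen pair; a production policy would probably also include a timeout so that a slow direct check doesn&#39;t hold up nomination forever.&lt;/p&gt;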
&lt;p&gt;Of course, it might take some time for the controlling endpoint
to run enough checks to feel comfortable picking one, and you
want to have media start flowing right away.
To accommodate this, ICE allows endpoints to start sending media as soon as
they have a valid pair, even before one has been nominated.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
This shortens setup latency while allowing time for the controlling
endpoint to nominate the optimal pair. Usually this will happen
quickly enough that you don&#39;t need to worry about the bindings
timing out. It does mean, however, that the path the media takes
may change as the ICE checking process proceeds.&lt;/p&gt;
&lt;h2 id=&quot;trickle-ice&quot;&gt;Trickle ICE &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#trickle-ice&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Classic ICE is a sequential process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Gather all your candidates and send them to the other side.&lt;/li&gt;
&lt;li&gt;Receive the other side&#39;s candidates.&lt;/li&gt;
&lt;li&gt;Run checks.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This all works fine if candidate gathering is fast, but what if it&#39;s
not? For instance, suppose you are behind a firewall which blocks UDP
and you have to use TCP to connect to the relay server. If the
firewall just drops the packets without sending you errors, you&#39;re
waiting for the candidate gathering process to time out.  This might
take several seconds (potentially more, depending on your timers) to
discover. In the meantime, people are just waiting, which isn&#39;t
ideal.&lt;/p&gt;
&lt;p&gt;To deal with this, the Google Hangouts team invented a technique
called &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8838.html&quot;&gt;trickle ICE&lt;/a&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
in which each side sends candidates as soon as it has them,
so that they &amp;quot;trickle&amp;quot; in over time. This creates some additional
complexity because you have to incrementally pair new local or
remote candidates, but has the potential to significantly decrease
the time to connection establishment. This is especially useful
in the context of WebRTC, when the Web site doesn&#39;t necessarily
know in advance which of the various STUN or TURN servers it is
offering will actually be reachable by the client.&lt;/p&gt;
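&lt;p&gt;The incremental pairing step can be sketched as follows. This is a toy model under stated assumptions: the &lt;code&gt;TrickleChecklist&lt;/code&gt; class and the candidate labels are invented for illustration, and real trickle ICE also has to prioritize, prune, and de-duplicate the pairs it forms.&lt;/p&gt;

```python
class TrickleChecklist:
    """Toy checklist that pairs candidates as they arrive from either side."""

    def __init__(self):
        self.local = []
        self.remote = []
        self.pairs = []

    def add_local(self, candidate):
        # Pair the new local candidate with every remote candidate seen so far.
        self.local.append(candidate)
        new_pairs = [(candidate, r) for r in self.remote]
        self.pairs.extend(new_pairs)
        return new_pairs

    def add_remote(self, candidate):
        # Pair the new remote candidate with every local candidate seen so far.
        self.remote.append(candidate)
        new_pairs = [(loc, candidate) for loc in self.local]
        self.pairs.extend(new_pairs)
        return new_pairs


checklist = TrickleChecklist()
first = checklist.add_local("host-a")    # nothing to pair with yet
second = checklist.add_remote("host-b")  # checks on this pair can start now
third = checklist.add_local("relay-a")   # the slow relay candidate arrives later
```

&lt;p&gt;The point is that checks on the host ↔ host pair can begin as soon as both host candidates are known, without waiting for the relay candidate to be gathered.&lt;/p&gt;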
&lt;h2 id=&quot;backwards-compatibility&quot;&gt;Backwards Compatibility &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#backwards-compatibility&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As described above, ICE has been through a number of iterations,
and so it&#39;s possible that a modern endpoint
will end up talking to an older endpoint. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An endpoint that supports RFC 8445 ICE might need to talk
to an endpoint that supports RFC 5245 ICE.&lt;/li&gt;
&lt;li&gt;An endpoint that supports trickle ICE might talk to a non-trickle
endpoint.&lt;/li&gt;
&lt;li&gt;An endpoint that supports component multiplexing (BUNDLE)
might talk to one that does not.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the classic SIP softphone setting, there&#39;s no real way to
know what the peer supports, so you need to send ICE information
that is compatible with the other endpoint.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
For instance,
if you support trickle but you don&#39;t know what the other side
supports, then you need to gather all the candidates you will
need anyway, but you can say in your message that you support
trickle, and so the other side can use it (this is called &amp;quot;half trickle&amp;quot;).&lt;/p&gt;
&lt;p&gt;Similarly, if you support component multiplexing, but you
don&#39;t know if the other side does, then you may need to gather
candidates for all the components, even if the other side is
going to throw most of them away. This can get quite expensive,
however, and the default for WebRTC is what&#39;s called
&amp;quot;balanced&amp;quot; mode, in which you gather candidates &lt;em&gt;only&lt;/em&gt;
for the first stream of each type (e.g., the first audio
channel). If the other peer supports bundling components,
then this works fine, and if it doesn&#39;t, then only the first
stream connects. Of course, actually designing something
that fell back gracefully in this situation instead of
just freaking out because there were no candidates available
for the later components took some doing.&lt;/p&gt;
&lt;p&gt;The situation is a bit better if you know you are doing a
call that is WebRTC on both ends—e.g., because both
ends are browsers or one end is a modern conference server—for
two reasons. First, the WebRTC specifications (specifically &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8829.html&quot;&gt;JSEP&lt;/a&gt;)
require support for multiplexing (both BUNDLE and RTP/RTCP)
and for trickle ICE, so you know you have a modern endpoint
on the other side.
Second, the server can use JS APIs to determine the capabilities
of each endpoint, so it has a better chance of getting an interoperable
configuration.&lt;/p&gt;
&lt;h2 id=&quot;security&quot;&gt;Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#security&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The threat model for ICE is confusing for a number of reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A full network attacker will generally be able to manipulate
packets (e.g., drop them, send them with a bogus IP, etc.)
and so you have limited protection against such an attacker.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you use media encryption between the endpoints—as was uncommon
back in 2010 when ICE was first designed but is mandatory
in WebRTC—then even an attacker who sees all the packets
has limited abilities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the WebRTC case, the Web site actually invoking the
APIs may be an attacker, though they probably do not control
the network.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In general, then, we have three main objectives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;That an attacker who &lt;em&gt;can&#39;t&lt;/em&gt; see your packets can&#39;t interfere
with connection formation or reroute traffic to themselves.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;That an attacker who can see packets can&#39;t just forge arbitrary
content (more on this below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;That a non-network attacker driving the WebRTC API
(or a SIP peer, though this is a weaker attacker)
can&#39;t force you to connect to someone besides themselves
by providing their address in a candidate.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these attacks are prevented by two security mechanisms found
in STUN:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Each STUN message is cryptographically protected (via an authentication
tag that prevents tampering with the message) with a username
and password exchanged along with the ICE parameters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Each STUN check has a unique 96-bit transaction identifier which
must be echoed in the response.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two mechanisms work together.&lt;/p&gt;
&lt;p&gt;Because the credentials are not known to network attackers, they are
unable to forge requests or responses. This is not a complete defense:
as noted above, a full network attacker can take
a valid packet and send it from a fake IP address, thus causing the
receiver to think it came from somewhere else (as in a peer reflexive
address). However, the attacker can&#39;t tamper with the contents, which prevents
a number of attacks. The username mechanism also prevents cases of ambiguity in which
a STUN check arrives at another endpoint which just happens
to be doing STUN. Because the username will be different, it
will not respond to the check.&lt;/p&gt;
&lt;p&gt;The username and password mechanism does not, however, prevent attacks
by a Web site using WebRTC, because that site knows the username and
password. What stops the site is the transaction ID: because it is unpredictable (and,
importantly, not revealed to the site&#39;s JavaScript), the site can&#39;t
forge a response to any check it doesn&#39;t receive. Thus, ICE establishes
that the receiver of the traffic has consented to receive it.&lt;/p&gt;
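&lt;p&gt;Here is a rough Python sketch of how the two mechanisms fit together. It is deliberately not the STUN wire format: the message layout, the function names, and the example credentials are all assumptions made for illustration. It does use HMAC-SHA1, which is what the classic STUN MESSAGE-INTEGRITY attribute is built on, and a 96-bit random transaction ID.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def make_check(username, password):
    """Build a toy connectivity check with a fresh 96-bit transaction ID."""
    txid = secrets.token_bytes(12)  # 12 bytes = 96 bits, unpredictable
    body = username.encode() + b"|" + txid
    tag = hmac.new(password.encode(), body, hashlib.sha1).digest()
    return {"username": username, "txid": txid, "tag": tag}


def verify_check(check, expected_username, password):
    """Receiver side: ignore checks meant for someone else or with a bad tag."""
    if check["username"] != expected_username:
        return False
    body = check["username"].encode() + b"|" + check["txid"]
    expected = hmac.new(password.encode(), body, hashlib.sha1).digest()
    return hmac.compare_digest(check["tag"], expected)


def response_matches(check, response_txid):
    """Sender side: accept a response only if it echoes our transaction ID."""
    return hmac.compare_digest(check["txid"], response_txid)


check = make_check("alice:bob", "example-password")
accepted = verify_check(check, "alice:bob", "example-password")
wrong_password = verify_check(check, "alice:bob", "guess")
wrong_endpoint = verify_check(check, "carol:dave", "example-password")
echoed = response_matches(check, check["txid"])
```

&lt;p&gt;A Web site that knows the username and password could compute a valid tag, but it never sees &lt;code&gt;txid&lt;/code&gt;, so it has nothing to echo back unless the check actually reached it.&lt;/p&gt;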
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-3/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s important to remember how we got here. In &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1&quot;&gt;part I&lt;/a&gt; I wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;NATs provide a particularly good example of the way the Internet
evolves, which is to say workaround upon workaround. The reason for
this is what Google engineer Adam Langley calls the &amp;quot;Iron law of the
Internet&amp;quot;, namely that the last person to touch anything gets blamed.
The people who first built and deployed NATs had to avoid
breaking existing deployed stuff, forcing them to build hacks
like ALGs and unpredictable idle timeouts.
Now that NATs are widely deployed, new protocols
have to work in that environment, which forces them to run over
UDP and to conform to the outgoing-only flow dynamics dictated
by the NAT translation algorithms.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As we can see with ICE, it&#39;s not just a matter of working
with existing NATs but of working with all the
systems that were deployed before ICE was
available, as well as working with previous versions of
ICE. The result is a system of extreme complexity which
almost nobody really understands, which has to run
before even the first byte of media is delivered. And yet,
it mostly works, as you can see for yourself if you use
any WebRTC-based calling system such as Meet or Teams.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The astute reader may have noticed that in
the &amp;quot;different network&amp;quot; scenario, Alice and
Bob&#39;s server reflexive addresses have different
IPs whereas in the &amp;quot;same network&amp;quot; scenario they
have the same IP address. You might think you could
compare the addresses to determine which situation you
were in. Unfortunately, this
isn&#39;t dispositive because Alice and Bob might
be behind a carrier grade NAT which had a pool of multiple
IP addresses that it assigned from. Because of the
way that IP addresses are assigned, it&#39;s not generally
possible to determine whether two addresses belong
to the same network. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
When all you have is a hammer, everything looks like a nail. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: Bob doesn&#39;t retransmit his ACKs;
he just responds to Alice&#39;s retransmissions. This is
a pretty typical reliability design because otherwise
you end up worrying about whether ACKs were delivered
and having ACKs of ACKs, which is a mess. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is of course not always true, but it&#39;s a good rule of thumb. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m simplifying the algorithm here as they would actually
start in Waiting and then move to In-Progress. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t make me explain how we decide which is which.
 &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the original version of ICE, there was instead something
called &amp;quot;aggressive mode&amp;quot; in which the controlling endpoint
would send USE-CANDIDATE on multiple pairs and the controlled
endpoint would pick the highest priority one, but that
was removed in favor of this rule. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This idea, documented in &lt;a href=&quot;https://xmpp.org/extensions/xep-0176.html#protocol-candidates&quot;&gt;XEP-0176&lt;/a&gt;,
appears to be originally due to Joe Beda. Thanks to &lt;a href=&quot;https://www.linkedin.com/in/juberti/&quot;&gt;Justin Uberti&lt;/a&gt;
for helping me track this down. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You might even be talking to an endpoint that doesn&#39;t
support ICE, but for all the reasons we&#39;ve discussed
here, that&#39;s basically not going to work. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-3/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Defending against Bluetooth tracker abuse: it’s complicated</title>
		<link href="https://educatedguesswork.org/posts/unwanted-tracking/"/>
		<updated>2023-05-08T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/unwanted-tracking/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
};
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;Bluetooth-based tracking tags like
&lt;a href=&quot;https://www.apple.com/airtag/&quot;&gt;AirTags&lt;/a&gt; and
&lt;a href=&quot;https://www.tile.com/&quot;&gt;Tiles&lt;/a&gt; are fantastically useful for
finding lost stuff like your keys, your bike, or &lt;a href=&quot;https://www.amazon.com/Airtag-Collar-Reflective-Waterproof-Compatible/dp/B09QQ2X3S6&quot;&gt;your cat&lt;/a&gt;. Unfortunately, they are a dual use
technology which is also easy to use for surreptitiously tracking other
people. This isn&#39;t a complicated attack to mount: you
get a tracking tag and pair it with your own phone, plant
it on your victim, and then use the find my stuff feature to
monitor their location. This unpleasant fact isn&#39;t news:
there have been concerns about misuse of these technologies
for years, especially after the release of AirTags
(see my &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/&quot;&gt;earlier post&lt;/a&gt; for some
initial thoughts).&lt;/p&gt;
&lt;p&gt;On Tuesday Google and Apple
&lt;a href=&quot;https://www.apple.com/newsroom/2023/05/apple-google-partner-on-an-industry-specification-to-address-unwanted-tracking/&quot;&gt;published&lt;/a&gt;
a &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-detecting-unwanted-location-trackers/&quot;&gt;set of
guidelines&lt;/a&gt;
for how trackers should behave to reduce the risk of unwanted
tracking. This post takes a look at that document and the bigger
problem space.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-bluetooth-trackers&quot;&gt;Background: Bluetooth Trackers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#background%3A-bluetooth-trackers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because these tracking systems are non-interoperable, they don&#39;t
necessarily all work the same way. However, Apple provides &lt;a href=&quot;https://support.apple.com/en-gb/guide/security/sec6cbc80fd0/1/web/1#:~:text=End-to-end%20encryption&quot;&gt;some
detail&lt;/a&gt;
about how the system works, and back in 2021 Heinrich, Stute,
Kornhuber, and Hollick reverse engineered the system and published a
&lt;a href=&quot;https://www.petsymposium.org/2021/files/papers/issue3/popets-2021-0045.pdf&quot;&gt;paper&lt;/a&gt;
in PoPETS describing how it works as well as some vulnerabilities.&lt;/p&gt;
&lt;p&gt;The obvious design for this kind of system would be to just have each
tag have a single fixed identifier which it broadcasts periodically
over &lt;em&gt;Bluetooth Low Energy (BLE)&lt;/em&gt;.
As a practical matter, the tag doesn&#39;t actually broadcast unless
it&#39;s out of range of one of the devices its owner has paired it
with; if it&#39;s in range, then the owner device can find it
directly.
Whenever the tag was within range of a participating device (e.g.,
a phone), that phone would then upload the device tag and its
own position to some central server. When you lost your device,
you would then contact that server and request its last known
location, as shown in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TrackingTag1.png&quot; alt=&quot;A simple tracking tag system&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This system has some obvious security and privacy issues:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The service can track the position of any tag (and in fact all
tags) just by looking at the database.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In fact, &lt;em&gt;anyone&lt;/em&gt; can track a tag if they know the identifier,
so if you see it once, you can just query the database.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even without access to the database, an attacker can reidentify
a given device. For instance, if you had a receiver at the entrance
to a store, you could see when the same person came by again
(this is a similar set of issues to those with &lt;a href=&quot;https://educatedguesswork.org/img/license-plates&quot;&gt;license plates&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The second and third attacks can be addressed by just having a rotating
identifier. I.e., each tag $i$ has a secret value $SK_i$ which it
shares with its owner at the time of pairing with the device.
Instead of broadcasting $SK_i$ directly, it uses it as the seed
for a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Pseudorandom_function_family&amp;amp;oldid=1144796045&quot;&gt;pseudorandom function (PRF)&lt;/a&gt;
to create a &lt;em&gt;rotating
identifier&lt;/em&gt; $ID_{i,t}$ where $t$ is the current time and broadcasts
that instead. Each identifier will be used for a fixed time
(say 15 minutes) and then the tag generates a new identifier and
broadcasts that. The device owner knows $SK_i$ and can use it to
generate $ID_{i,t}$  so it can still query the central service just
by asking for the IDs for recent times, but someone who just observes
a single ID can&#39;t query the service for the locations of other IDs for
the same tag (and of course they already know the location at the time of observation).&lt;/p&gt;
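&lt;p&gt;As a rough illustration of the rotating identifier idea, here is a Python sketch that derives $ID_{i,t}$ from $SK_i$. The choice of HMAC-SHA256 as the PRF, the 15-minute period, the truncation to 16 hex characters, and the function names are all my assumptions for the example; no deployed tracker is claimed to work exactly this way.&lt;/p&gt;

```python
import hashlib
import hmac

ROTATION_SECONDS = 15 * 60  # assumed rotation period


def rotating_id(sk, unix_time):
    """Derive the identifier for the period containing unix_time (must be non-negative)."""
    period = int(unix_time) // ROTATION_SECONDS
    digest = hmac.new(sk, period.to_bytes(8, "big"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability


def recent_ids(sk, now, periods):
    """IDs the owner would query for: the current period and the ones before it."""
    return [rotating_id(sk, now - k * ROTATION_SECONDS) for k in range(periods)]


sk = bytes(range(32))  # stand-in for the secret shared at pairing time
same_period = rotating_id(sk, 0) == rotating_id(sk, 899)
next_period = rotating_id(sk, 0) == rotating_id(sk, 900)
query = recent_ids(sk, 10_000, 4)
```

&lt;p&gt;The owner, who knows $SK_i$, can regenerate every identifier in &lt;code&gt;query&lt;/code&gt;, whereas an observer who saw only one of them has no way to compute the others.&lt;/p&gt;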
&lt;h3 id=&quot;rotating-ids&quot;&gt;Rotating IDs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#rotating-ids&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This also &lt;em&gt;partly&lt;/em&gt; solves the problem of the service tracking the tag,
because it also cannot link up multiple identifiers, so all it has is
a set of locations. However, if there are
comparatively few tags then the service can infer people&#39;s behavior
just by looking at the unlinked locations. E.g., if I see two IDs on
Highway 101 traveling in opposite directions (inferred from the lane
they are in) and then some other ID getting off on a Southbound exit,
I can infer that there was a single device that was going South and
then exited, though this is less information than a fixed identifier would reveal.  In addition, when someone
queries for the location of their tag, then the service provider gets
the IDs for a range of time periods, which it knows all correspond to
the same device, and can then link up the motion of the tag during
that time range.&lt;/p&gt;
&lt;p&gt;Apple&#39;s design (the best documented) addresses this by having the locations where the tags are detected
encrypted to the device owner. This works similarly to the rotating
ID system except that instead of generating a rotating ID, the tag
generates a rotating private/public key pair: $(Priv_{i,t}, Pub_{i,t})$.
The tag broadcasts $Pub_{i,t}$ just as it would the ID, but then when
a device sees the broadcast, it uploads the location &lt;em&gt;encrypted&lt;/em&gt; under
that public key. When the device owner wants to find the tag, it
queries the server using the public key&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; (just as it would have before
with the tag) and gets the encrypted value. Because it shared $SK_i$ with
the tag, it can generate $Priv_{i,t}$ and can decrypt the encrypted
location, as shown in the figure below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TrackingTagEncrypted.png&quot; alt=&quot;An encrypted tracking tag system&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;privacy-properties&quot;&gt;Privacy Properties &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#privacy-properties&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This system has significantly improved privacy properties.
As with a simple rotating identifier, an attacker can&#39;t track
a tag using multiple observations over an extended period.
And because the reports are encrypted, the service provider
is not able to &lt;em&gt;directly&lt;/em&gt; determine the actual location of the device.
However, that doesn&#39;t mean that the service provider doesn&#39;t
learn anything. In particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If two owners both query the location of lost tags which
are reported by the same device, then it allows the service
to infer that the owners were at one point in the same
location (this attack is reported in the Heinrich et al. paper).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If two devices both report the location of the same tag then
the provider can infer that those devices were in the same
location at the time of the report.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the service provider has an independent way of learning
the location of a reporting device—for instance
by IP location or because the owner uses some location-based
service—and then the owner
queries for its location, the service gets to learn
information about the owner&#39;s movements (because that is
where they probably lost the tag). This attack is exacerbated
by the fact that you want to query multiple keys (one for
each time range), so the service might learn multiple
locations for the same tag and be able to link them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The root cause of all of these issues&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
is that the service gets to learn the identity of reporting
devices when they make reports, as well as potentially of
the device owner when they query for location. This part
of Apple&#39;s design isn&#39;t very clearly documented, but
presumably the rationale for identifying the endpoints is
to prevent abuse (e.g., forged location reports) by
requiring that they be genuine Apple devices (see
Section 9.4 of Heinrich et al.). It should
be possible to address this issue using standard
anonymity techniques such as Oblivious HTTP,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
though it doesn&#39;t appear Apple does that.&lt;/p&gt;
&lt;h2 id=&quot;unwanted-tracking&quot;&gt;Unwanted Tracking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#unwanted-tracking&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The privacy
mechanisms described above are about preventing &lt;em&gt;other people&lt;/em&gt; from
learning the location of &lt;em&gt;your&lt;/em&gt; tags, but the way you use a system
like this to track someone else is to attach one of your tags to
something of theirs and then query the system to see where your tag
is. This is a much harder problem to solve because the whole point of
the system is that the tag isn&#39;t attached to you (that&#39;s why you&#39;re
looking for it!) and there&#39;s no real technical way to distinguish the
case where I accidentally left my keys in your car from the one where
I maliciously stuck an AirTag to your car to track you.&lt;/p&gt;
&lt;p&gt;Instead, the countermeasures that Apple and others have designed
seem to center around making this situation &lt;em&gt;detectable&lt;/em&gt;. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If AirTags are away from their owners for &amp;quot;an extended period of time&amp;quot; they
make a sound when moved.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your iOS device detects that an AirTag that doesn&#39;t belong to you
is moving with you, it will notify you on the device and then you can
try to find it and figure out what&#39;s going on.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once you have detected a tag that appears to be following you, AirTags
also include a feature that lets you partially identify the owner of
the tag, as long as you can physically access the tag.&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/about-airtag.png&quot; width=&quot;300&quot; alt=&quot;Airtag about info&quot; /&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://support.apple.com/en-us/HT212227&quot;&gt;Apple&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;My personal experience is that these features are both fairly hit and
miss. In terms of the sound notification, the speaker in AirTags is
pretty quiet and the noise is kind of intermittent. We use AirTags
to keep track of our cats, but they&#39;re paired to my wife&#39;s phone, not
mine. After she had been out of town for several days, I finally
noticed the AirTags making sound and took them off the cats&#39; collars,
but the first time this happened I probably heard the sound about
three or four times—and who knows how many times I didn&#39;t hear
it—before I figured out what it was. We&#39;re all constantly surrounded
by stuff beeping so it&#39;s easy to get habituated to it.&lt;/p&gt;
&lt;p&gt;Similarly, I&#39;ve had the &amp;quot;someone is moving with you&amp;quot; alert trigger a number
of times—most recently Saturday—such as when someone accidentally left their AirPods around,
but that also takes a while to trigger and is easy to ignore. I
imagine both of these features would work a lot better if you were
really worried about being tracked, but at least in my experience
there are a lot of false positives, which makes the whole system less
useful than one might like.&lt;/p&gt;
&lt;h2 id=&quot;the-apple%2Fgoogle-draft&quot;&gt;The Apple/Google Draft &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#the-apple%2Fgoogle-draft&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;On Tuesday, Apple and Google
&lt;a href=&quot;https://www.apple.com/newsroom/2023/05/apple-google-partner-on-an-industry-specification-to-address-unwanted-tracking/&quot;&gt;published&lt;/a&gt;
a &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-detecting-unwanted-location-trackers/&quot;&gt;document&lt;/a&gt; describing
guidelines for how trackers ought to behave in order to make unwanted tracking
easier to detect.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Today Apple and Google jointly submitted a proposed industry
specification to help combat the misuse of Bluetooth
location-tracking devices for unwanted tracking. The
first-of-its-kind specification will allow Bluetooth
location-tracking devices to be compatible with unauthorized
tracking detection and alerts across iOS and Android
platforms. Samsung, Tile, Chipolo, eufy Security, and Pebblebee have
expressed support for the draft specification, which offers best
practices and instructions for manufacturers, should they choose to
build these capabilities into their products.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mostly this document provides detailed specifications of the behaviors
I&#39;ve described informally above. For instance, here&#39;s the portion
describing how the audible alerts should work:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   After T_(SEPARATED_UT_TIMEOUT) in separated state, the accessory MUST
   enable the motion detector to detect any motion within
   T_(SEPARATED_UT_SAMPLING_RATE1).

   If motion is not detected within the T_(SEPARATED_UT_SAMPLING_RATE1)
   period, the accessory MUST stay in this state until it exits
   separated state.

   If motion is detected within the T_(SEPARATED_UT_SAMPLING_RATE1) the
   accessory MUST play a sound.  After first motion is detected, the
   movement detection period is decreased to
   T_(SEPARATED_UT_SAMPLING_RATE2).  The accessory MUST continue to play
   a sound for every detected motion.  The accessory SHALL disable the
   motion detector for T_(SEPARATED_UT_BACKOFF) under either of the
   following conditions:

   *  Motion has been detected for 20 seconds at
      T_(SEPARATED_UT_SAMPLING_RATE2) periods.

   *  Ten sounds are played.

   If the accessory is still in separated state at the end of
   T_(SEPARATED_UT_BACKOFF), the UT behavior MUST restart.
&lt;/code&gt;&lt;/pre&gt;
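&lt;p&gt;To make the quoted rules a little more concrete, here is a minimal Python sketch of
just the fast-sampling phase: play a sound for each detected motion, then back off after
ten sounds or 20 seconds of motion. The class and method names are mine, and the sampling
period is a made-up value, because the quoted text names the timers without giving numbers.
The sketch also leaves out the initial timeout and the slower first sampling period.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical value: the quoted text names this timer but gives no number.
SAMPLING_RATE2_S = 0.5   # assumed fast sampling period, in seconds
MAX_MOTION_S = 20.0      # "Motion has been detected for 20 seconds"
MAX_SOUNDS = 10          # "Ten sounds are played"


@dataclass
class SeparatedAlert:
    """Tracks one burst of alerts between two backoff periods."""
    sounds_played: int = 0
    motion_seconds: float = 0.0
    backed_off: bool = False

    def on_fast_sample(self, motion_detected: bool) -> bool:
        """Process one fast-rate sample; return True if a sound should play."""
        if self.backed_off or not motion_detected:
            return False
        self.sounds_played += 1
        self.motion_seconds += SAMPLING_RATE2_S
        if self.sounds_played >= MAX_SOUNDS or self.motion_seconds >= MAX_MOTION_S:
            # Either limit disables the motion detector for the backoff period.
            self.backed_off = True
        return True

    def end_backoff(self) -> None:
        """Backoff expired while still separated: the behavior restarts."""
        self.sounds_played = 0
        self.motion_seconds = 0.0
        self.backed_off = False
```

&lt;p&gt;With these assumed numbers the ten-sound limit is reached first, so a tag that is
moved continuously plays ten sounds and then goes quiet until the backoff ends.&lt;/p&gt;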
&lt;h3 id=&quot;not-a-full-specification&quot;&gt;Not a full specification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#not-a-full-specification&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;What this document is not, however, is a complete specification
of a tracking system. In particular, it doesn&#39;t cover any of
the fancy (well, fancy-ish) cryptography I described above. Instead,
it describes a Bluetooth container for the messages, with the following
contents:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Bytes&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Description&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Requirement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0-5&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;MAC address&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;REQUIRED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6-8&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Flags TLV; length = 1 byte, type = 1 byte, value = 1 byte&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;OPTIONAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9-12&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Service data TLV; length = 1 byte, type = 1 byte, value = 2 bytes (TBD value)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;REQUIRED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;13&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Protocol ID (TBD value)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;REQUIRED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;14&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Near-owner bit (1 bit) + reserved (7 bits)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;REQUIRED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;15-36&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Proprietary company payload data&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;OPTIONAL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As far as I can tell, the cryptographic pieces would
go in the &amp;quot;proprietary company payload data&amp;quot; portion, though
it&#39;s actually not clear to me precisely how this works in
the case of AirTags. As Heinrich et al. describe, the
BLE payload is quite small (31 bytes for the &lt;code&gt;ADV_NONCONN_IND&lt;/code&gt; PDU) but the Bluetooth
standard requires a 4-byte header for manufacturer-specific
data, so Apple had to do some tricky
engineering to get the P-224 public key (28 bytes) into
the remaining 27 bytes of the packet (they repurpose part of
the MAC address to do this).
It&#39;s not quite clear to me how Apple plans to stuff the
public key into the 22 &amp;quot;proprietary payload&amp;quot; bytes, but
presumably they have some plan in mind. Any readers who
know how this is supposed to work should &lt;a href=&quot;mailto:ekr@rtfm.com&quot;&gt;reach out&lt;/a&gt;.
Maybe they plan to send two packets?&lt;/p&gt;
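&lt;p&gt;For reference, the layout in the table can be read as a set of fixed offsets. The
following Python sketch splits a frame accordingly. It assumes every field is present
(the Flags TLV is OPTIONAL, and the offsets would shift without it) and that the
near-owner bit is the lowest-order bit of byte 14, which the table does not actually say.
The names &lt;code&gt;Advertisement&lt;/code&gt; and &lt;code&gt;parse_advertisement&lt;/code&gt; are mine.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Advertisement:
    mac: bytes           # bytes 0-5
    flags_tlv: bytes     # bytes 6-8
    service_tlv: bytes   # bytes 9-12
    protocol_id: int     # byte 13
    near_owner: bool     # one bit of byte 14 (bit position is an assumption)
    proprietary: bytes   # bytes 15-36, possibly shorter or empty


def parse_advertisement(frame: bytes) -> Advertisement:
    """Split a frame using the fixed offsets from the table above."""
    if len(frame) not in range(15, 38):
        raise ValueError("expected between 15 and 37 bytes")
    return Advertisement(
        mac=frame[0:6],
        flags_tlv=frame[6:9],
        service_tlv=frame[9:13],
        protocol_id=frame[13],
        near_owner=frame[14] % 2 == 1,  # assumed: lowest-order bit
        proprietary=frame[15:37],
    )
```

&lt;p&gt;Calling &lt;code&gt;len(parse_advertisement(frame).proprietary)&lt;/code&gt; on a full-size
frame shows how little room is left for a 28-byte public key.&lt;/p&gt;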
&lt;p&gt;The key point here is that this isn&#39;t enough of a specification
to provide interoperability between systems. For instance, it
wouldn&#39;t tell you enough to build your own tags which worked
with Apple&#39;s tracking network;
it&#39;s just supposed to be enough to tell you how to build your
tracking tags so that they are detectable. Note the careful
phrasing here: the document doesn&#39;t tell you &lt;em&gt;how to detect tracking tags&lt;/em&gt;,
it just tells you how to build tags which are detectable,
and you are left to infer how to detect them.&lt;/p&gt;
&lt;h3 id=&quot;detecting-tracking-tags&quot;&gt;Detecting Tracking Tags &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#detecting-tracking-tags&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;With that said, this document does help explain something confusing about the
description I provided above, namely how devices are to detect
that a tag is following them if the identifier it broadcasts
changes every 15 minutes. The answer appears to be that the
BLE address &lt;em&gt;doesn&#39;t change&lt;/em&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An accessory SHALL rotate its resolvable and private address on any
transition from near-owner state to separated state as well as any
transition from separated state to near-owner state.&lt;/p&gt;
&lt;p&gt;When in near-owner state, the accessory SHALL rotate its resolvable
and private address every 15 minutes.  This is a privacy
consideration to deter tracking of the accessory by non-owners when
it is in physical proximity to the owner.&lt;/p&gt;
&lt;p&gt;When in a separated state, the accessory SHALL rotate its resolvable
and private address every 24 hours.  This duration allows a
platform&#39;s unwanted tracking algorithms to detect that the same
accessory is in proximity for some period of time, when the owner is
not in physical proximity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &amp;quot;resolvable&amp;quot; address refers to the BLE network
address (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Medium_access_control&amp;amp;oldid=1100996685&quot;&gt;MAC address&lt;/a&gt;).
In other words, when in the separated state, the tag sends
out beacon packets where the MAC address is constant for 24 hours
&lt;em&gt;even if the public key rotates every 15 minutes&lt;/em&gt; (and remember
that the public key encryption piece isn&#39;t specified here).
So presumably what you are supposed to do as a device
is look for any tag (identified by MAC address) that has been
following you for a while and if so alert the user. But how
long a period is &amp;quot;a while&amp;quot;? Who knows? That&#39;s up to you.&lt;/p&gt;
&lt;p&gt;Why not just rotate the address every 24 hours all the time? Two
reasons: (1) rotating every 15 minutes avoids triggering the detection algorithm,
as long as its threshold is longer than 15 minutes, and (2) it
makes the tag less trackable in cases where it is traveling with its owner
(see &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#rotating-ids&quot;&gt;rotating IDs&lt;/a&gt; above). There is also a &amp;quot;near-owner&amp;quot;
bit in the advertisement that says that the tag is near its owner
and that detecting devices shouldn&#39;t treat it as
tracking them.&lt;/p&gt;
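&lt;p&gt;Since the document leaves the detection side to the reader, here is one way a
detecting device might do it, as a rough Python sketch. It remembers when each MAC
address was first seen and alerts once the same address has been around for longer
than some threshold, skipping tags that set the near-owner bit. The 30 minute threshold
and the &lt;code&gt;FollowDetector&lt;/code&gt; name are my own inventions, and a real detector
would presumably also check that the user has actually moved.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the draft does not say how long "a while" is.
ALERT_AFTER_S = 30 * 60


@dataclass
class FollowDetector:
    # Maps MAC address to the time (in seconds) it was first seen.
    first_seen: dict = field(default_factory=dict)

    def observe(self, mac: str, near_owner: bool, now_s: float) -> bool:
        """Record one advertisement; return True if the user should be alerted."""
        if near_owner:
            # The tag claims its owner is nearby, so it is not treated as a tracker.
            self.first_seen.pop(mac, None)
            return False
        start = self.first_seen.setdefault(mac, now_s)
        return now_s - start >= ALERT_AFTER_S
```

&lt;p&gt;Note that a tag which presents a new address every 15 minutes never accumulates
30 minutes under any one address, so this detector never fires for it.&lt;/p&gt;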
&lt;p&gt;Once a tag is detected, it is also possible to connect to it
directly and query its information (manufacturer, product
type, etc.), as well as to cause it to play a sound.
It is also possible to retrieve the device serial number
as long as you can demonstrate close proximity, either via
an NFC connection or some user action on the device itself
(pressing a button, etc.).&lt;/p&gt;
&lt;h2 id=&quot;the-broader-threat-model&quot;&gt;The Broader Threat Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#the-broader-threat-model&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My bigger concern is that this document seems to be limited to a fairly
narrow threat model, which is to say tracking by naive attackers
who take an off-the-shelf tag and attach it to their victim.
The Apple/Google document describes a set of behaviors that companies
ought to build into their trackers to mitigate this threat,
but unfortunately, this isn&#39;t the only threat.&lt;/p&gt;
&lt;p&gt;It&#39;s already possible to buy relatively compact GPS
trackers that don&#39;t depend on using Bluetooth to talk to other devices
(see this &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-staling&quot;&gt;older post&lt;/a&gt; for more on
this topic). However, these trackers are expensive (about $300, plus
a subscription), have
battery lifetimes measured in days or (at best) weeks, and are several
centimeters across, so are somewhat hard to conceal. By contrast, tracking tags like Tiles or AirTags have a combination
of features that makes them more attractive for surveillance.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They are compact (thus easy to hide)&lt;/li&gt;
&lt;li&gt;They are cheap (thus easy to obtain)&lt;/li&gt;
&lt;li&gt;They have long battery lifetimes (and thus are suitable
for long-term surveillance)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These features are made possible by the existence of a widespread
network of devices (phones, etc.) which can report the position of a
lost tag. That network allows the use of much cheaper and energy
efficient technologies than a tracker like the Garmin inReach, which
needs both a GPS receiver and a satellite transmitter. It&#39;s that
network that creates the risk, not the tracking tags themselves.
Specifically, if the attacker can obtain a tag which can successfully
be located with the tracking network but which doesn&#39;t conform to the
behaviors specified in this document, then the detection mechanisms
that this document anticipates will be less effective if not
completely useless.&lt;/p&gt;
&lt;p&gt;There are at least two possible ways for an attacker to obtain such
a tag:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Modifying an existing tag.&lt;/dt&gt;
&lt;dd&gt;The stock tags made by each manufacturer are cheap and generally reasonably
well-engineered, so it&#39;s convenient for the attacker if they can just
buy them and disable the anti-tracking features.
For example, in his thorough AirTag &lt;a href=&quot;https://adamcatley.com/AirTag.html&quot;&gt;teardown&lt;/a&gt;,
Adam Catley observes that it&#39;s possible to disable the speaker in an AirTag
and suggests that the tag be modified to check to see if the speaker is actually
making noise. Depending on the design of the tag, it might be possible to rewrite
the firmware to violate the requirements in this document, for instance
by rotating the MAC address frequently to evade detection (oddly: this
document says &amp;quot;The accessory SHOULD have firmware that is updatable by the owner&amp;quot;,
which is the opposite of what you want here.)&lt;/dd&gt;
&lt;dt&gt;Building an entirely new tag.&lt;/dt&gt;
&lt;dd&gt;Even if the stock tags are hard to modify, once it&#39;s public information how
these devices are built it&#39;s possible to make your own tags that don&#39;t have any
anti-tracking features at all. In fact, this already exists in
the form of &lt;a href=&quot;https://github.com/seemoo-lab/openhaystack&quot;&gt;OpenHaystack&lt;/a&gt; built
by the same team that published the PoPETs paper I&#39;ve been relying on
for most of this analysis. OpenHaystack is designed to run on commodity
hobby hardware like the &lt;a href=&quot;https://microbit.org/&quot;&gt;BBC micro:bit&lt;/a&gt; which is
quite a bit bigger than an AirTag but obviously it would be possible for
someone to engineer something compact and cheap, perhaps using the
AirTag design as a starting point. Note that it doesn&#39;t really help
that the specific design of any individual system is secret: there are
tens of millions of these devices out there, and it just takes
one person to reverse engineer a tag and publish the results.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Either of these attacks requires more sophistication than just buying
an AirTag through Amazon, but the would-be stalker doesn&#39;t have to
have that sophistication themselves; they just need some third
party to start making and selling tags that are suitable for surveillance.
If such devices become widely available, then the countermeasures
Apple and Google are proposing will become much less effective.
There&#39;s already a market for &amp;quot;&lt;a href=&quot;https://consumer.ftc.gov/articles/stalking-apps-what-know&quot;&gt;stalking apps&lt;/a&gt;,&amp;quot;
so this seems like a real risk.&lt;/p&gt;
&lt;p&gt;What you really want here is for it not to be possible to make
a tag which participates in the tracking network without implementing
the specified anti-tracking behaviors. This is a hard job under
any circumstances (though see some handwaving ideas &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#attestation&quot;&gt;below&lt;/a&gt;), but
is made much harder by specifying a design in which
tracking detection pieces are specified at one level (the BLE layer)
and the official &amp;quot;find my device&amp;quot; functionality is implemented
in a proprietary layer that sits on top of that. That makes
it very easy for an attacker to build their own tag that
complies with the (reverse engineered) proprietary pieces but
then violates the rules at the BLE layer. I can understand why
Apple and Google, who each presumably have some proprietary design,
want to avoid standardizing that piece, but the result is that
the problem of detecting unwanted tracking is much harder.&lt;/p&gt;
&lt;h3 id=&quot;attestation&quot;&gt;Attestation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#attestation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The most straightforward approach
is to assume that &amp;quot;official&amp;quot; devices behave correctly
and then have some mechanism for detecting official devices.
The standard approach here is to have what&#39;s called an &amp;quot;attestation&amp;quot;
mechanism in which each legitimate device has some secret embedded
by the manufacturer which can be used to prove that it&#39;s legitimate
(e.g., by signing something).
(See &lt;a href=&quot;/posts/verifying-software&quot;&gt;here&lt;/a&gt; for more on this.) Devices would then require tags
to prove they were legitimate before reporting their location
to the network. Of course, this secret has to be embedded in tamper-resistant
hardware to prevent an attacker stealing the secret and making
their own fake devices.&lt;/p&gt;
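&lt;p&gt;The basic shape of such a check is a challenge-response exchange, sketched below in
Python. To keep the example to the standard library it uses an HMAC as a stand-in for
a signature, which means the verifier has to know the device secret too. A real design
would more likely use an asymmetric key pair with a manufacturer-issued certificate.
The function names are hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def tag_respond(device_secret: bytes, challenge: bytes) -> bytes:
    """Runs on the tag: prove possession of the embedded secret."""
    return hmac.new(device_secret, challenge, hashlib.sha256).digest()


def verifier_check(device_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Runs on the verifying side, which in this stand-in also knows the secret."""
    expected = hmac.new(device_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = secrets.token_bytes(16)  # fresh for each attempt, to prevent replay
genuine = secrets.token_bytes(32)    # secret embedded at manufacture
forged = secrets.token_bytes(32)     # an attacker without the real secret

ok = verifier_check(genuine, challenge, tag_respond(genuine, challenge))
bad = verifier_check(genuine, challenge, tag_respond(forged, challenge))
```

&lt;p&gt;Here &lt;code&gt;ok&lt;/code&gt; is true and &lt;code&gt;bad&lt;/code&gt; is false, but only for as
long as the secret stays inside the tag, which is why the tamper-resistant hardware matters.&lt;/p&gt;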
&lt;p&gt;Actually building a system like this in such a way that the attestation
doesn&#39;t itself become a tracking vector (e.g., by having each
device have a single attestation key which can then be tracked)
is challenging cryptographically (this is also an issue with
the &lt;a href=&quot;https://www.w3.org/TR/webauthn-2/#sctn-attestation-types&quot;&gt;WebAuthn&lt;/a&gt;
public key authentication system), but there are some approaches
that sort of work, or at least are somewhat better than the naive
design.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;However, even if you know for sure that you are talking to a legitimate
device, that doesn&#39;t necessarily tell you that it&#39;s acting as it&#39;s
supposed to. As a simple example, you might have a device
which sent the right BLE data but whose speaker had been disabled
(or which was wrapped in sound-absorbing material). A fancier
attacker might take a legitimate tag and &lt;em&gt;proxy&lt;/em&gt; its signals
to the device by putting it in a radio-absorbing case and then
receiving and retransmitting whatever signals it sent, as shown
below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tracker-proxy.png&quot; alt=&quot;An attacker rewriting the proxy signals&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this example, the tag is in the separated state, so it is
supposed to keep a constant MAC address (though presumably
still rotate its public key). However, the attacker captures
this message and rewrites the MAC address so it looks like
it&#39;s a different device, fooling the detection algorithm.&lt;/p&gt;
&lt;p&gt;This kind of cut-and-paste attack is possible to address by having
the proprietary pieces that the network relies on enforce
the correctness of the anti-tracking pieces (e.g., by signing
the expected MAC address), but in order for this to work,
they need to be aware of each other, which, as I said, isn&#39;t specified
anywhere in this document.
The point here is that successfully
designing anti-tracking mechanisms requires analyzing the system
as a whole, not just looking at one piece at a time. In particular,
it&#39;s necessary to understand how the as-designed functionality
works in order to build anti-tracking countermeasures which
can&#39;t be separated from that functionality. And of course, in the case of
the audible alerts, in some cases that may not be possible to do.&lt;/p&gt;
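&lt;p&gt;As a rough illustration of what &amp;quot;signing the expected MAC address&amp;quot; could
look like, the Python sketch below authenticates the BLE address together with the
rotating public key, so that a relay which rewrites the address produces a beacon the
network rejects. As before, an HMAC stands in for a real signature, the function names
are hypothetical, and I&#39;m ignoring the question of where the extra bytes would fit
in the advertisement.&lt;/p&gt;

```python
import hashlib
import hmac


def bind(tag_key: bytes, ble_address: bytes, public_key: bytes) -> bytes:
    """Authenticate the BLE address together with the proprietary payload."""
    # The address is a fixed 6 bytes, so simple concatenation is unambiguous.
    return hmac.new(tag_key, ble_address + public_key, hashlib.sha256).digest()


def accept_report(tag_key: bytes, ble_address: bytes,
                  public_key: bytes, proof: bytes) -> bool:
    """Verifier-side check: reject beacons whose address was rewritten in transit."""
    return hmac.compare_digest(bind(tag_key, ble_address, public_key), proof)


tag_key = bytes(32)                          # placeholder key, for illustration only
address = bytes.fromhex("0a1b2c3d4e5f")      # address the tag actually used
rewritten = bytes.fromhex("ffffffffffff")    # address substituted by the relay
public_key = bytes(28)                       # placeholder 28-byte public key
proof = bind(tag_key, address, public_key)

honest = accept_report(tag_key, address, public_key, proof)
relayed = accept_report(tag_key, rewritten, public_key, proof)
```

&lt;p&gt;The honest beacon is accepted and the relayed one is not, which only works because
the proprietary layer knows about the address used at the BLE layer.&lt;/p&gt;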
&lt;p&gt;Worse yet, we already have a giant installed base of devices
which don&#39;t have any kind of attestation, and presumably
vendors want them to continue to work. This means that
even if we were to deploy a system with this kind of attestation today,
attackers could still exploit it by pretending to be one of those
old devices.&lt;/p&gt;
&lt;h2 id=&quot;the-status-of-this-specification&quot;&gt;The Status of this Specification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#the-status-of-this-specification&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is slightly off topic from the technical content of this
post, but I think it&#39;s important to observe that this isn&#39;t
an IETF specification. There has been some confusion on this
point, in part due to Apple&#39;s misleading &lt;a href=&quot;https://www.apple.com/newsroom/2023/05/apple-google-partner-on-an-industry-specification-to-address-unwanted-tracking/&quot;&gt;PR statement&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The specification has been submitted as an Internet-Draft via the Internet Engineering Task Force (IETF), a leading standards development organization. Interested parties are invited and encouraged to review and comment over the next three months. Following the comment period, Apple and Google will partner to address feedback, and will release a production implementation of the specification for unwanted tracking alerts by the end of 2023 that will then be supported in future versions of iOS and Android.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;whats-an-rfc%3F&quot;&gt;What&#39;s an RFC? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#whats-an-rfc%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;RFC stands for &amp;quot;Request For Comments&amp;quot;, and dates from the prehistory
of the Internet when there wasn&#39;t a real standards process and
people would just publish memos describing protocols. The IETF
loves its traditions and &amp;quot;RFC&amp;quot; is now an important brand
(so much so that other organizations such as the
&lt;a href=&quot;https://www.rust-lang.org/&quot;&gt;Rust Project&lt;/a&gt;
now publish standards &amp;quot;RFCs&amp;quot; even though they have no
connection to the IETF process). To make matters worse,
there are also RFCs published in the same series as
IETF RFCs that aren&#39;t standards, including those published
in what&#39;s called the &lt;a href=&quot;https://www.rfc-editor.org/about/independent/&quot;&gt;Independent Stream&lt;/a&gt;,
which don&#39;t have any standards status and are just approved
by a single Independent Submissions Editor.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;That&#39;s pretty carefully worded, but it certainly gives the
impression that Apple and Google want to standardize this
work. The quote from Erica Olsen from the National Network
to End Domestic Violence (NNEDV) is even more explicit,
referring to these as &amp;quot;new standards&amp;quot; (and of course
this is in Apple&#39;s press release, so it&#39;s not like they
aren&#39;t aware of the context). Of course, there
are other meanings to &amp;quot;standard&amp;quot; than &amp;quot;document produced
by some Standards Development Organization&amp;quot;, but in this
context, the best you can say about this press release
is that it&#39;s misleading in a way that is very convenient
for Apple and Google, who would no doubt like the protective
cover of appearing to standardize something while in fact
acting unilaterally
to address a problem they created by acting unilaterally.&lt;/p&gt;
&lt;p&gt;Needless to say, &amp;quot;two big companies submit a specification, take
comments for three months, and then do whatever they feel like&amp;quot;
is not the way that the IETF standards process works. The IETF
lets anyone &amp;quot;submit&amp;quot; a specification by posting an Internet-Draft (ID)
which is what Apple and Google have done, but those don&#39;t
have any formal status. Some IDs will be adopted by the IETF as part
of the standards process and some of those will actually
be standardized and become RFCs. This process takes much longer
than three months and involves achieving &amp;quot;rough consensus&amp;quot; of the
IETF Community, not just a few vendors.
I know that this sounds like standards inside baseball, but there
is an important point here. One of the functions of standards is
to ensure that there is widespread review from a variety of
stakeholders, who might have a different viewpoint (for instance
that actual interoperability is useful, or that you need
a different set of tradeoffs between privacy and functionality),
but the way that that works is that you need
buy-in from those stakeholders before the standards are finished.&lt;/p&gt;
&lt;p&gt;One critique you often hear is that the standards process is too slow
and that this is why industry actors need to ship first and standardize
later. The three month comment period seems to reflect that attitude
(it&#39;s certainly true that the IETF can&#39;t standardize anything in
three months). However, the decision by Apple and Google (and others!) to ship these technologies
without real public review is
one reason why we now are in a situation where they are being
&lt;a href=&quot;https://www.vice.com/en/article/y3vj3y/apple-airtags-police-reports-stalking-harassment&quot;&gt;actively misused&lt;/a&gt;,
something people have been expressing concerns about for two
years.
Apple/Google could have brought
this work to IETF—or some other standards body—at any point during
that time, but they chose not to do so, so arguments about how the
situation is now too urgent to go through a real multistakeholder
process don&#39;t really move me.&lt;/p&gt;
&lt;p&gt;I regularly work with a lot of people from Apple and Google
and those companies know how to bring work to IETF when they want to.
This isn&#39;t it.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/&quot;&gt;two years ago&lt;/a&gt;, this is a classic dual-use
technology. It&#39;s really convenient to be able to find your stuff
when you lose it, but tracking tags just don&#39;t know whether they
are attached to your stuff or other people&#39;s stuff. Trying to make
it visible when you are being tracked via this method is probably
about the best you can do, but it&#39;s also clear that it&#39;s a highly
imperfect defense. Deploying this kind of defense is made even
harder by having a large installed base of devices from multiple
mutually incompatible networks, meaning that anything we do has
to be backward compatible. It took us years to get into this
hole; it will take a lot more than three months to get out.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[2023-05-08: Updated title.]&lt;/em&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Actually a hash of the public key. &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Heinrich et al. also report an issue in which an attacker
is able to leverage temporary control of the user&#39;s
device to steal $SK_i$ and afterwards can track the
user. Apple has reportedly solved this by making
the keys harder to learn, but this is a generically
hard problem in an open system. &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The way this would work would be that the device
would encrypt the report as described above
and then encrypt it yet again for the service.
It would connect to a proxy, authenticate as a
valid device, and then send the doubly-encrypted
report. The proxy would then strip off the
reporter&#39;s identity and send it to the service,
which would remove the outer encryption layer
and store it, just as before. &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this would most likely not all fit into a single
packet, but you could imagine that the reporting device
would ask the tag to attest in a separate message before
reporting its position to the service. &lt;a href=&quot;https://educatedguesswork.org/posts/unwanted-tracking/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How NATs Work, Part II: NAT types and STUN</title>
		<link href="https://educatedguesswork.org/posts/nat-part-2/"/>
		<updated>2023-04-17T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/nat-part-2/</id>
		<content type="html">&lt;p&gt;The Internet is a mess, and one of the biggest parts of that mess
is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1147533294&quot;&gt;Network Address Translation (NAT)&lt;/a&gt;,
a technique which allows multiple devices to share the same
network address. This is part II in a series
on how NATs work and how to work with them.
In &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1&quot;&gt;Part I&lt;/a&gt; I
covered NATs and how they work. If you haven&#39;t read that post,
you&#39;ll want to go back and do so before starting this one.
This post starts to discuss
NAT traversal, covering the different types of NATs and how
to build peer-to-peer applications that still work from
behind NATs.&lt;/p&gt;
&lt;p&gt;As IP addresses became increasingly scarce, more and more of
the client devices on the Internet started to move behind
NATs. I don&#39;t have any real data here, but pretty much every
consumer-level WiFi router I&#39;ve ever used is also a NAT,
sharing a single externally assigned IP address amongst all
the devices behind it. By contrast, servers typically have
stable public IP addresses.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This arrangement works reasonably well in client-server
situations because the client initiates the connections,
and so doesn&#39;t need an address/port pair that&#39;s stable
for more than the life of the connection. However, it doesn&#39;t
work for peer-to-peer applications.&lt;/p&gt;
&lt;h3 id=&quot;peer-to-peer-applications&quot;&gt;Peer to Peer Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#peer-to-peer-applications&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Although much of the Internet is client-server,
there are a number of more or less important peer-to-peer (P2P)
applications in which data flows directly between end-user
machines rather than via a server (as in e-mail, Web,
etc.). Some examples are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1-1 video calling&lt;/li&gt;
&lt;li&gt;File distribution (BitTorrent or IPFS)&lt;/li&gt;
&lt;li&gt;Some Web3/blockchain systems&lt;/li&gt;
&lt;li&gt;Games&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In principle, P2P systems have a number of advantages, including:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Reduced cost&lt;/dt&gt;
&lt;dd&gt;because you don&#39;t need to pay for a server somewhere. This is an
especially big deal for high-bandwidth applications like video
calling or file sharing.&lt;/dd&gt;
&lt;dt&gt;Reduced latency&lt;/dt&gt;
&lt;dd&gt;because you don&#39;t need to send traffic up to the server and then
from the server to the other side, which will generally be slower
than sending it directly.&lt;/dd&gt;
&lt;dt&gt;Censorship resistance/avoiding centralized control&lt;/dt&gt;
&lt;dd&gt;because there&#39;s no central server to attack.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;In practice, some of these advantages often come with disadvantages,
which is why you see a lot of client/server applications and not
a &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization&quot;&gt;decentralized Web&lt;/a&gt;, but
there is still a fair bit of P2P. The application I&#39;m most familiar
with is voice and video over IP: it&#39;s moderately expensive to run
a centralized system like Meet or Zoom where you have to process
all the media, but much cheaper to run one where the endpoints
just talk to each other.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;p2p-challenges&quot;&gt;P2P Challenges &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#p2p-challenges&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The way that a Web server works is that the server operator knows
the IP address of the server and publishes it in the &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;DNS&lt;/a&gt;.
The port number is just 443 or 80 depending on whether the traffic
is encrypted or not. Unfortunately, this won&#39;t work for P2P
systems for two fairly obvious reasons (and also a number of non-obvious
ones, as we&#39;ll see below):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Machines behind the NAT don&#39;t know their own IP address. If your
machine has a public IP address, you can just look at how it&#39;s
configured and know what to publish in the DNS. In managed systems,
the operators have some mechanism for assigning addresses and storing
the data in the DNS. But when you connect your laptop to the WiFi,
the IP address that the laptop sees is likely in some private
range, e.g., &lt;code&gt;10.0.0.*&lt;/code&gt;, which isn&#39;t useful for other people to
connect to unless they happen to be on your network.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Public IP addresses and ports aren&#39;t stable. In general, the
NAT will only have a single public IP address, so that&#39;s reasonably
stable (though see &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#ip-address-assignment&quot;&gt;here&lt;/a&gt;), but the port
is not. As I mentioned
&lt;a href=&quot;https://educatedguesswork.org/posts/nat-intro-1&quot;&gt;previously&lt;/a&gt;, the NAT creates a binding in
response to outgoing traffic and then deletes it when there
isn&#39;t any traffic. As a result, even if you knew the mapping
of internal to external ports at some time in the past,
that mapping may no longer be valid.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For these reasons, clients can&#39;t just publish their IP addresses
like a server does (there is also the question of where you would
publish them, but put that aside for a moment). Instead, you need some
kind of server to help them.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-voice-over-ip-architecture&quot;&gt;Background: Voice over IP Architecture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#background%3A-voice-over-ip-architecture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Just for convenience, let&#39;s focus on voice over IP.
The diagram below shows what you might call the &amp;quot;reference architecture&amp;quot;
for a voice or video over IP system like you might build with &lt;a href=&quot;https://educatedguesswork.org/posts/webrtc&quot;&gt;WebRTC&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/VoIP-Architecture.png&quot; alt=&quot;VoIP Architecture&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;video-conferencing-topologies&quot;&gt;Video Conferencing Topologies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#video-conferencing-topologies&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Ironically, despite all the work that has gone into NAT traversal, in many video conferencing systems the media doesn&#39;t
actually go directly between the participants but rather goes through the server. The reason for this is
that if you have many people in the call, then the sender needs to send a copy
of their media to each other person, which means that if there are &lt;em&gt;N&lt;/em&gt; people
in the call, and their video is &lt;em&gt;M&lt;/em&gt; megabits/second, they need to send &lt;em&gt;(N-1) * M&lt;/em&gt;
megabits/second of media, which can quickly overrun a consumer Internet link.
Instead, it&#39;s conventional to use a star topology where the user sends
their media to a server which replays it to everyone else in the call. This is
expensive for the server, of course, but cheaper for the user. Some
conferencing systems do send media directly for smaller conferences to
minimize costs. Sending media directly also currently works better with
end-to-end encryption for video, though that&#39;s a problem that&#39;s
being actively worked on because you&#39;d like to have end-to-end encryption
even in large conferences where a mesh design isn&#39;t practical.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In a system like this, Alice and Bob both connect to a signaling server which
is responsible for orchestrating the calls. In the case of a traditional
VoIP system like you would design with SIP, Alice and Bob would each
have a device or an app (often called a &amp;quot;softphone&amp;quot;)
that had the actual calling logic, presented the user interface, etc.,
and would exchange SIP messages via the server. In a WebRTC system
such as Google Meet or Microsoft Teams, there is a Web server which hosts
the Web app and carries messages back and forth between the
Web browsers, even though much of the actual calling logic is built
into the browser.&lt;/p&gt;
&lt;p&gt;In either case, you would ideally like the media (i.e., the actual voice
and video) to go directly between Alice and Bob (though
see &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#video-conferencing-topologies&quot;&gt;above&lt;/a&gt;). There are two main
reasons for this. First, it is cheaper: real-time video involves
transmitting a lot of data and if Alice sends all that data to
the server and then the server sends it to Bob, then the server operator
has to pay for all that data transmission. Second, it generally
takes longer for the data to go from Alice to the server and then the server to Bob, than
it would for Alice to send the data to Bob directly, especially if,
as is relatively common, Alice and Bob are geographically close
and the server is not. But now we have to contend with NATs.&lt;/p&gt;
&lt;h2 id=&quot;stun&quot;&gt;STUN &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#stun&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, the first problem we have is that the client machine may
not know its own IP address. The NAT knows, of course, but there&#39;s no
universally deployed protocol for it to tell the client. Instead,
the client has to measure it directly. The standard protocol for this
is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=STUN&amp;amp;oldid=1140524920&quot;&gt;Session Traversal Utilities for NAT (STUN)&lt;/a&gt;.
STUN works by having the client talk to some server on the Internet
(unsurprisingly called a STUN server). Typically, this server will
be provided by the calling service, and configured into the clients
somehow. For instance, WebRTC provides an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/RTCIceServer&quot;&gt;API&lt;/a&gt; to tell the Web client which STUN server to
use.&lt;/p&gt;
&lt;p&gt;In order to discover its IP
address, the client sends the server a STUN &lt;strong&gt;Binding Request&lt;/strong&gt;,
and the server responds with the IP address and port that the server
saw (technical term: &lt;em&gt;reflexive address&lt;/em&gt;) like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/stun.png&quot; alt=&quot;STUN Binding Request&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is how the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc3489.html&quot;&gt;original version of STUN&lt;/a&gt;, published in 2003,
behaved. Unfortunately, it is impossible to make things foolproof
because fools are so ingenious.
As you
may recall from the discussion of &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1#application-layer-gateways&quot;&gt;Application Layer Gateways (ALGs)&lt;/a&gt;
in Part I, some NATs will rewrite messages coming in from the Internet,
rewriting the external (reflexive) address to the internal
(host) address. If you have such a NAT, what you will instead see is
a flow like below, where the client gets a response that just
contains its own local address rather than the external one.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/stun-alg.png&quot; alt=&quot;STUN with ALG interference&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is not useful! Unfortunately, we have to traverse
the NATs we have, not the NATs we wish we had, so a way
around this was needed. The &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc5389.html&quot;&gt;second version of STUN&lt;/a&gt;, published
in 2008, added a new way to return the reflexive address in
what is called the &lt;code&gt;XOR-MAPPED-ADDRESS&lt;/code&gt; attribute. This
attribute worked by XORing the host and port with other
values from the packet. This is pretty weak sauce as encryption
goes but it&#39;s usually good enough to break up the simple-minded pattern
matching that NAT ALGs were using at the time (the idea
here isn&#39;t to avoid NATs which know about STUN and want
to rewrite values, but just to prevent accidental breakage).
This is mostly how STUN works today.&lt;/p&gt;
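&lt;p&gt;To make the XOR step concrete, here is a minimal sketch of building a Binding Request and encoding or decoding an IPv4 &lt;code&gt;XOR-MAPPED-ADDRESS&lt;/code&gt; value, assuming the RFC 5389 layout (a fixed &amp;quot;magic cookie&amp;quot; in the header that is XORed into the port and address). The function names are mine and purely illustrative; the sketch handles IPv4 only, does no network I/O, and skips everything else a real STUN implementation needs.&lt;/p&gt;

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value carried in every RFC 5389 header
BINDING_REQUEST = 0x0001    # STUN message type for a Binding Request
FAMILY_IPV4 = 0x01


def build_binding_request():
    """Return a 20-byte Binding Request header with no attributes."""
    transaction_id = os.urandom(12)
    # type (2 bytes), attribute length (2 bytes), magic cookie (4 bytes)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id


def encode_xor_mapped_ipv4(address, port):
    """Encode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    x_port = port ^ (MAGIC_COOKIE >> 16)   # XOR with the top 16 cookie bits
    x_addr = int(ipaddress.IPv4Address(address)) ^ MAGIC_COOKIE
    return struct.pack("!BBHI", 0, FAMILY_IPV4, x_port, x_addr)


def decode_xor_mapped_ipv4(value):
    """Recover (address, port) from an IPv4 XOR-MAPPED-ADDRESS value."""
    _reserved, family, x_port, x_addr = struct.unpack("!BBHI", value)
    if family != FAMILY_IPV4:
        raise ValueError("this sketch only handles IPv4")
    port = x_port ^ (MAGIC_COOKIE >> 16)
    address = str(ipaddress.IPv4Address(x_addr ^ MAGIC_COOKIE))
    return address, port
```

&lt;p&gt;Because the encoded bytes no longer look like the client&#39;s address, a NAT that is pattern-matching on addresses in the payload has nothing to rewrite, while the client can undo the XOR using a value it already knows.&lt;/p&gt;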
&lt;p&gt;One thing that may not be immediately obvious is that you
need to do the STUN queries from the same address and &lt;em&gt;port&lt;/em&gt; that
you want to receive media on. The reason for this is that each
port you send from will have a different NAT binding, and so
if you know the binding for port &lt;strong&gt;A&lt;/strong&gt; this doesn&#39;t tell
you about the binding for port &lt;strong&gt;B&lt;/strong&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This is the same reason why you can&#39;t just have the Web server
send you your reflexive address and use that: you&#39;re contacting
the Web server from a different port (and when this stuff
was designed, TCP rather than UDP) and so the binding that
the Web server sees doesn&#39;t help you for your media.
Instead, what you do is allocate a port to use for media, discover
the reflexive address with STUN, and send that reflexive address
to your peer, and then subsequently use that port to send
and receive media.&lt;/p&gt;
&lt;h3 id=&quot;nat-types&quot;&gt;NAT Types &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#nat-types&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If only things were that simple. There are in fact NATs for which
this will work, but many where it will not. There are two basic problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;NATs which use different mappings for different remote addresses
(and ports).
Note: As a convenience, I am going to start saying &amp;quot;address&amp;quot; when I
mean &amp;quot;address and port&amp;quot;, because the alternative is clunky.
For instance, if Alice sends packets to both Bob and Charlie,
Bob and Charlie might see different reflexive addresses (or more likely ports,
as your typical consumer NAT only has one IP address)
even if Alice uses the same local address and port.
These NATs are said to have &lt;em&gt;address-dependent mappings&lt;/em&gt;
or &lt;em&gt;address and port-dependent mappings&lt;/em&gt;, depending on
which differences trigger variation. The alternative
is called &lt;em&gt;endpoint-independent mapping&lt;/em&gt;
(these terms come from &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4787&quot;&gt;RFC 4787&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;NATs which have consistent mappings but filter packets from
addresses that the client hasn&#39;t sent to. For instance, Alice
might send a packet to Bob, creating a mapping, but if Charlie
sends a packet to Alice on the same reflexive address, the
NAT would drop it. If Alice then sends a packet to Charlie,
he will see the expected address, and if he responds to this
packet, the NAT will deliver it. These NATs are said to
have &lt;em&gt;address-dependent filtering&lt;/em&gt; or &lt;em&gt;address and port-dependent
filtering&lt;/em&gt;. The alternative is &lt;em&gt;endpoint-independent filtering&lt;/em&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The bottom line here is that there are a lot of different types of
NAT, and depending on what kind of NAT you (and the person on the
other side) have, you need to do different things in order to
establish a connection.&lt;/p&gt;
&lt;h2 id=&quot;how-to-get-through-a-nat&quot;&gt;How to get through a NAT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#how-to-get-through-a-nat&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As a notational convenience, I&#39;m going to describe NATs using
the following abbreviations:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Behavior&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Abbreviation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Endpoint-Independent Mapping&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;EIM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address-Dependent Mapping&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;ADM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address and Port-Dependent Mapping&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;APM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Endpoint-Independent Filtering&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;EIF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address-Dependent Filtering&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;ADF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address and Port-Dependent Filtering&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;APF&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A NAT is defined by the pair of mapping and filtering behaviors,
so, for instance, EIM:APF is a NAT that has consistent mappings
across addresses but filters based on address and port.&lt;/p&gt;
&lt;p&gt;I&#39;m also going to simplify addresses and ports by writing them
as &lt;code&gt;A:a&lt;/code&gt;, &lt;code&gt;A:b&lt;/code&gt;, etc. where the first letter is the address
and the second is the port. Alice&#39;s local address will always
be &lt;code&gt;A:a&lt;/code&gt; and Bob&#39;s will be &lt;code&gt;B:b&lt;/code&gt;. The STUN server&#39;s will be &lt;code&gt;S:s&lt;/code&gt;.&lt;/p&gt;
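&lt;p&gt;If it helps to think in code, the mapping and filtering behaviors can be modeled with a toy NAT like the one below. This is only a sketch under simplifying assumptions: the class and method names are made up for illustration, the NAT has a single public address, ports come from a counter, and bindings never time out.&lt;/p&gt;

```python
import itertools


class Nat:
    """Toy NAT with one public address; external ports come from a counter."""

    def __init__(self, public_addr, mapping="EIM", filtering="EIF"):
        self.public_addr = public_addr
        self.mapping = mapping        # "EIM", "ADM" or "APM"
        self.filtering = filtering    # "EIF", "ADF" or "APF"
        self.bindings = {}            # (local, mapping scope) -> external
        self.owner = {}               # external -> local
        self.allowed = {}             # external -> set of filter scopes
        self.ports = itertools.count(40000)

    @staticmethod
    def _scope(mode, remote):
        # Which part of the remote address the behavior depends on.
        if mode in ("EIM", "EIF"):
            return None               # endpoint-independent
        if mode in ("ADM", "ADF"):
            return remote[0]          # address-dependent
        return remote                 # address and port-dependent

    def send(self, local, remote):
        """Outbound packet: return the source address the remote side sees."""
        key = (local, self._scope(self.mapping, remote))
        if key not in self.bindings:
            external = (self.public_addr, next(self.ports))
            self.bindings[key] = external
            self.owner[external] = local
            self.allowed[external] = set()
        external = self.bindings[key]
        # Sending always adds an access control entry for this remote.
        self.allowed[external].add(self._scope(self.filtering, remote))
        return external

    def receive(self, external, remote):
        """Inbound packet: return the local address, or None if dropped."""
        if external not in self.owner:
            return None               # no mapping at all
        if self._scope(self.filtering, remote) not in self.allowed[external]:
            return None               # mapping exists, but filtered
        return self.owner[external]


# Illustrative addresses only.
stun, peer = ("S", 3478), ("P", 5000)
local = ("10.0.0.2", 6000)
eim = Nat("X", mapping="EIM", filtering="APF")
apm = Nat("Y", mapping="APM", filtering="APF")
same = eim.send(local, stun) == eim.send(local, peer)      # one binding reused
differs = apm.send(local, stun) != apm.send(local, peer)   # one binding per remote
```

&lt;p&gt;With endpoint-independent mapping both destinations see the same reflexive address, whereas with address and port-dependent mapping each destination sees its own.&lt;/p&gt;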
&lt;h3 id=&quot;eim%3Aeif-%E2%86%94-eim%3Aeif&quot;&gt;EIM:EIF ↔ EIM:EIF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aeif-%E2%86%94-eim%3Aeif&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A NAT which has endpoint-independent behavior for both mapping and
filtering is the easiest type to traverse: it&#39;s basically like
having a public IP address except that the binding may not
be stable over long periods of time. You can traverse this
kind of NAT by just having each side publish its address and
the other side can send directly, as shown in the figure
below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-ei-ei.png&quot; alt=&quot;NAT traversal with endpoint-independent NATs&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This diagram shows about the simplest possible NAT traversal
scenario. It starts with Alice deciding to call Bob. She uses the STUN
server to discover her reflexive address by sending a Binding Request
from &lt;code&gt;A:a&lt;/code&gt;. The STUN server responds with her reflexive address:
&lt;code&gt;X:x&lt;/code&gt;. Alice then sends a message to the signaling server to initiate
the call (the details of this depend on whether you are doing WebRTC,
SIP, etc. In SIP this would be an INVITE).&lt;/p&gt;
&lt;p&gt;The signaling server notifies Bob of the incoming call. When
he decides to accept it, then he will also contact the STUN
server to discover his reflexive address (&lt;code&gt;Y:y&lt;/code&gt;). His response
to the signaling server to answer the call will include this
address. At this point, Alice and Bob know each other&#39;s addresses
and can start sending media to each other&#39;s reflexive addresses,
as shown in the final block. Because the NATs have endpoint-independent
mapping, the same binding that was created for the STUN server
will be in effect when the peer sends a message, even
though the message from the peer comes from a different IP
address. Similarly, because they have endpoint-independent
filtering, the NAT will accept an incoming packet directed
to the reflexive address from &lt;em&gt;any&lt;/em&gt; source.&lt;/p&gt;
&lt;p&gt;This is already a pretty complicated process, but it&#39;s
conceptually fairly simple: each side discovers its
address and sends it to the other side. If all NATs had endpoint-independent
behavior for both mapping and filtering, then we could
just stop here. Unfortunately they do not.&lt;/p&gt;
&lt;h3 id=&quot;eim%3Aeif-%E2%86%94-eim%3Aapf&quot;&gt;EIM:EIF ↔ EIM:APF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aeif-%E2%86%94-eim%3Aapf&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Now let&#39;s look at the next most complicated case, in which
Alice has the same NAT as before, but Bob has a NAT
with endpoint-independent mapping but address and port-dependent
filtering, as shown in the figure below. To keep things
simple, I&#39;ve omitted the opening phases where each side
discovers their address and sends it to the other side,
just showing the media phase. Note that the early phases
look the same for every NAT type, which is part of what
makes things difficult.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-ei-apf.png&quot; alt=&quot;NAT traversal with one address filtering NAT&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Unlike in the previous setting, when Alice sends her first packet
of media to Bob, his NAT discards it. Because Bob&#39;s NAT has address and
port-dependent filtering, it has an access control entry only
for the STUN server, but &lt;em&gt;not&lt;/em&gt; for Alice&#39;s address, so when
Alice&#39;s packet arrives, the NAT just drops it. By contrast,
because Alice&#39;s NAT has endpoint-independent mapping and filtering
(as in the previous example), Bob&#39;s first packet is delivered correctly to
Alice.&lt;/p&gt;
&lt;p&gt;You might think at this point that we&#39;re just going to have media
flowing one way (from Bob to Alice), but that&#39;s not what happens:
when Bob sends his first media packet to Alice, it creates a
new access control entry in his NAT for Alice&#39;s address, so that
when Alice&#39;s &lt;em&gt;second&lt;/em&gt; packet (either a retransmit or reflecting
a later part of the media stream) arrives, it is delivered correctly:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-ei-apf2.png&quot; alt=&quot;NAT traversal with one address filtering NAT&quot; /&gt;&lt;/p&gt;
&lt;p&gt;From this point forward, you have two-way media.&lt;/p&gt;
&lt;p&gt;The following table representing Bob&#39;s NAT&#39;s state might help
visualize what&#39;s happening here.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Event&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Mapping&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Access Control List&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Start&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address discovery&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;B:b ↔ Y:y&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet 2 sent&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;B:b ↔ Y:y&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s, X:x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Initially, Bob&#39;s NAT doesn&#39;t contain any mappings. After he sends
a Binding Request to the STUN server, the NAT creates a mapping
from &lt;code&gt;B:b&lt;/code&gt; to &lt;code&gt;Y:y&lt;/code&gt; and an access control entry for that mapping
associated with just the STUN server. Thus, when Alice&#39;s packet 1
comes in, it is associated with a valid mapping, but is rejected
because it doesn&#39;t match a valid access control entry. When
Bob sends his first media packet (number 2), a new access control
entry is added to the same mapping (recall that Bob can always
send outgoing packets and they just add the appropriate access
control entries). Then when Alice&#39;s packet 2 arrives, there is
an appropriate access control entry and it can be delivered.&lt;/p&gt;
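&lt;p&gt;The same sequence of events can be replayed in a few lines of code. This is a rough sketch, not a real NAT: the class name is hypothetical, and for brevity the &lt;code&gt;B:b&lt;/code&gt; ↔ &lt;code&gt;Y:y&lt;/code&gt; mapping is assumed to exist from the start rather than being created by the first outgoing packet.&lt;/p&gt;

```python
class ApfNat:
    """Endpoint-independent mapping, address and port-dependent filtering."""

    def __init__(self, local, external):
        self.local = local
        self.external = external
        self.acl = set()              # remote addresses the client has sent to

    def outbound(self, remote):
        self.acl.add(remote)          # sending always opens an ACL entry
        return self.external

    def inbound(self, remote):
        # Deliver only if the client already sent to this exact address:port.
        return self.local if remote in self.acl else None


bob_nat = ApfNat(local="B:b", external="Y:y")
bob_nat.outbound("S:s")               # address discovery via the STUN server
first_try = bob_nat.inbound("X:x")    # Alice's packet 1: dropped
bob_nat.outbound("X:x")               # Bob's packet 2 adds the X:x entry
second_try = bob_nat.inbound("X:x")   # Alice's packet 2: delivered
```

&lt;p&gt;Nothing about the mapping changes between the two attempts; only the access control list does.&lt;/p&gt;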
&lt;p&gt;Obviously, this introduces a little latency
before media starts flowing, but given that the Internet is already
subject to packet loss anyway, this isn&#39;t necessarily that big
a deal, especially with voice and video applications which
will be sending packets every 20 milliseconds or so. It&#39;s potentially
a slightly bigger issue for reliable transport protocols if they
only send one packet at a time and have long retransmit timers,
but even then the connection will eventually be established;
it just takes a little while.&lt;/p&gt;
&lt;h3 id=&quot;eim%3Aapf-%E2%86%94-eim%3Aapf&quot;&gt;EIM:APF ↔ EIM:APF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aapf-%E2%86%94-eim%3Aapf&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Now let&#39;s look at what happens when &lt;em&gt;both&lt;/em&gt; Alice and Bob have
NATs with endpoint-independent mapping but address and port-dependent
filtering. This actually behaves identically to the previous
scenario:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-apf-apf.png&quot; alt=&quot;NAT traversal with two address filtering NATs&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As before, the first packet from Alice to Bob is dropped by Bob&#39;s NAT
but on its way out it establishes the access control entry in Alice&#39;s
NAT in the opposite direction, thus allowing the next inbound packet
to pass through the NAT. Here&#39;s Alice&#39;s table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Event&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Mapping&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Access Control List&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Start&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Address discovery&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;A:a ↔ X:x&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet 1 sent&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;A:a ↔ X:x&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s, Y:y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All of this happens before the first packet from Bob arrives, and
so even though Alice does have address and port-dependent
filtering, the right access control entry is in place
before that packet is received, and so the packet is just delivered.&lt;/p&gt;
&lt;p&gt;One important feature to notice about all the scenarios we have
seen so far is that they don&#39;t depend on knowing what kind of
NAT the other side has: Alice and Bob just start transmitting
and eventually the right access control entries will be established
and the packets will flow properly. Now let&#39;s look at a scenario
where that isn&#39;t true: when one side has address and port-dependent
mapping as well as filtering.&lt;/p&gt;
&lt;h3 id=&quot;eim%3Aeif-%E2%86%94-apm%3Aapf&quot;&gt;EIM:EIF ↔ APM:APF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aeif-%E2%86%94-apm%3Aapf&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Suppose we have a situation where Alice has endpoint-independent
mapping and filtering, as in the first two scenarios,
but Bob has address and port-dependent behavior for both mapping
and filtering. This produces the situation shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-eif-apm.png&quot; alt=&quot;NAT failure with Address-Dependent Mapping&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As with the previous scenario, the first packet from Alice to Bob
is dropped by Bob&#39;s NAT. This actually happens for a slightly
different reason than in the previous examples. In those, there
was a valid mapping for Alice&#39;s packet, but no corresponding
access control entry. However, in this case,
because Bob&#39;s NAT has address and port-dependent mapping,
the packet from Alice (&lt;code&gt;X:x&lt;/code&gt;) to &lt;code&gt;Y:y&lt;/code&gt; doesn&#39;t match any
mapping at all.&lt;/p&gt;
&lt;p&gt;When Bob sends his first packet (2) to &lt;code&gt;X:x&lt;/code&gt;, it creates a mapping
for &lt;code&gt;X:x&lt;/code&gt; but on a different outgoing port &lt;code&gt;Y:y&#39;&lt;/code&gt;. At this
point, his NAT has the following mapping table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Local Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Remote Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;External (Reflexive) Address&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;B:b&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Y:y&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;B:b&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;X:x&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Y:y&#39;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Packet 2 is still deliverable to Alice because her NAT has endpoint-independent
mapping and filtering. However, when Alice sends her next
packet, it still goes to &lt;code&gt;Y:y&lt;/code&gt; and not &lt;code&gt;Y:y&#39;&lt;/code&gt;, and so just gets
dropped. This one-way communication will persist throughout the
connection: every packet Bob sends uses the &lt;code&gt;X:x&lt;/code&gt; ↔ &lt;code&gt;Y:y&#39;&lt;/code&gt; mapping
and every packet Alice sends goes to &lt;code&gt;Y:y&lt;/code&gt;, so none of Alice&#39;s packets will ever
be delivered. Most likely, eventually one of Alice or Bob will
get tired of not being able to communicate and hang up.&lt;/p&gt;
&lt;p&gt;This scenario is actually recoverable, but it requires some cleverness
on Alice&#39;s part. What Alice has to do is look at the source address
of packets that Bob is sending her (the &lt;em&gt;peer reflexive&lt;/em&gt; address)
and if it differs from the one that Bob sent her over the signaling
channel (the &lt;em&gt;server reflexive address&lt;/em&gt;), try sending packets to that
address instead, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-eif-apm2.png&quot; alt=&quot;NAT success with Address-Dependent Mapping and peer-reflexive addresses&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The first two packets here are the same as before, but for the
third packet, Alice switches from sending to &lt;code&gt;Y:y&lt;/code&gt; to sending
to &lt;code&gt;Y:y&#39;&lt;/code&gt;. This corresponds to a mapping on Bob&#39;s NAT (and also
an entry in his access control table) and so the packet will
be delivered as expected. From here on, things work normally.&lt;/p&gt;
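&lt;p&gt;In code, the simplest possible version of Alice&#39;s logic might look like the sketch below. The function name and the string addresses are illustrative only, and this version is deliberately naive: it follows whatever source address showed up most recently without validating it in any way.&lt;/p&gt;

```python
def choose_destination(signaled, observed_sources):
    """Pick where Alice sends her next packet.

    signaled: Bob's server reflexive address from the signaling channel.
    observed_sources: source addresses of packets Alice has received so far,
        oldest first (these are candidate peer reflexive addresses).
    """
    destination = signaled
    for source in observed_sources:
        if source != destination:
            # Naive: switch to any new source address without checking it.
            destination = source
    return destination


before = choose_destination("Y:y", [])         # nothing received yet
after = choose_destination("Y:y", ["Y:y'"])    # Bob's packet 2 has arrived
```

&lt;p&gt;Before any packet arrives Alice keeps using the signaled address; once she sees traffic from &lt;code&gt;Y:y&#39;&lt;/code&gt; she switches to it.&lt;/p&gt;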
&lt;p&gt;This technique works, but
it requires care to use correctly. For instance, consider what
happens if an attacker sends a bogus packet from a different
address:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-prflx-attack.png&quot; alt=&quot;Attack on peer reflexive switching&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If Alice is naive, she will notice that Bob seems to have switched
his address and just switch to sending to the attacker at &lt;code&gt;Z:z&lt;/code&gt;.
Importantly, this attack can be mounted
by an attacker who cannot read packets en route from Alice to
Bob; he just needs to know the address and port Alice expects
packets on.
In the best
case, if encryption is in use, then the attacker won&#39;t be able
to read the packets but he will have disrupted the connection. In
the worst case, if encryption is not in use—and when
all this stuff was designed, VoIP encryption was fairly rare—the
attacker will be able to listen in on the Alice → Bob side of the call. Depending on Bob&#39;s
NAT configuration (e.g., if it&#39;s actually endpoint-independent),
the attacker may even be able to do so without noticeably
disrupting the call, by forwarding the packets to Bob.
There are a number of defenses against this form of attack, as
we&#39;ll see in the next post.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;eim%3Aapf-%E2%86%94-apm%3Aapf&quot;&gt;EIM:APF ↔ APM:APF &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#eim%3Aapf-%E2%86%94-apm%3Aapf&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s look at one more case, in which Bob has the same
APM:APF NAT as before but Alice has a NAT that does
address and port-dependent filtering. This produces
the result shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-apf-apm.png&quot; alt=&quot;Deadlock with APF &amp;lt;-&amp;gt; APM&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As in the previous example, Alice&#39;s packet gets dropped
by Bob&#39;s NAT because there is no corresponding mapping
for &lt;code&gt;X:x&lt;/code&gt;, only one for the STUN server. However, unlike
the previous example, Bob&#39;s packet is also dropped, because
there is no corresponding access control entry. As you&#39;ll
recall from above, Alice has the following mapping and
access control entries:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Mapping&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Access Control List&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;A:a ↔ X:x&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;S:s, Y:y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;However, because the incoming packet is coming from &lt;code&gt;Y:y&#39;&lt;/code&gt;
and &lt;em&gt;not&lt;/em&gt; &lt;code&gt;Y:y&lt;/code&gt;, Alice&#39;s NAT discards it. And because
Alice never gets packet 2, she is unable to change the
destination of her packets to Bob&#39;s peer reflexive address &lt;code&gt;Y:y&#39;&lt;/code&gt;
and so just keeps transmitting packets to &lt;code&gt;Y:y&lt;/code&gt; which Bob&#39;s
NAT drops because it does not have a corresponding mapping.
Similarly, Bob keeps transmitting packets to Alice,
which her NAT drops because it doesn&#39;t have the
corresponding access control entry.&lt;/p&gt;
&lt;p&gt;What we have here is deadlock: Bob can&#39;t receive
packets from Alice until she adjusts the address she is
sending to, and Alice can&#39;t receive packets from Bob
(thus learning about the new address) until she has
sent one to the new address (thus creating the access
control entry). The result is both sides transmitting
and neither side receiving. This is not a recoverable
situation.&lt;/p&gt;
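&lt;p&gt;To make the deadlock concrete, here is a toy simulation. It is only a sketch under simplifying assumptions: the &lt;code&gt;ToyNat&lt;/code&gt; class, the &lt;code&gt;run&lt;/code&gt; function, and the string addresses are invented for illustration, each NAT has a single host behind it, and the less restrictive filter used for contrast is modeled as address-dependent filtering. Each entry in the returned list records whether a packet was delivered, alternating Alice → Bob and Bob → Alice.&lt;/p&gt;

```python
class ToyNat:
    """Toy single-host NAT. All names here are invented for illustration."""

    def __init__(self, reflexive, alternate=None, port_filtering=True):
        self.reflexive = reflexive          # external address seen by the STUN server
        self.alternate = alternate          # set only for address and port-dependent mapping
        self.port_filtering = port_filtering
        self.mappings = {"S:s": reflexive}  # remote address to external address
        self.contacted = {"S:s"}            # remotes this host has sent to

    def send(self, remote):
        """Outbound packet: record the remote and return the source address used."""
        self.contacted.add(remote)
        if self.alternate is None:
            return self.reflexive           # endpoint-independent mapping
        return self.mappings.setdefault(remote, self.alternate)

    def accepts(self, source, destination):
        """Inbound packet: apply the filtering rule, then check the mapping."""
        if self.port_filtering:
            allowed = source in self.contacted
        else:
            hosts = {r.split(":")[0] for r in self.contacted}
            allowed = source.split(":")[0] in hosts
        if self.alternate is None:
            return allowed and destination == self.reflexive
        return allowed and self.mappings.get(source) == destination


def run(alice, bob, rounds=3):
    """Alice starts at Bob's server reflexive address and switches if she hears from him."""
    target, log = "Y:y", []
    for _ in range(rounds):
        src = alice.send(target)
        log.append(bob.accepts(src, target))   # did Bob receive?
        src = bob.send("X:x")
        heard = alice.accepts(src, "X:x")
        log.append(heard)                      # did Alice receive?
        if heard:
            target = src                       # switch to the peer reflexive address
    return log


# Alice without port-dependent filtering: one lost packet, then everything flows.
print(run(ToyNat("X:x", port_filtering=False), ToyNat("Y:y", alternate="Y:y'")))
# Alice with port-dependent filtering: nothing is ever delivered.
print(run(ToyNat("X:x"), ToyNat("Y:y", alternate="Y:y'")))
```

&lt;p&gt;With port-dependent filtering turned off on Alice&#39;s side, only the first packet is lost and Alice then switches to &lt;code&gt;Y:y&#39;&lt;/code&gt;; with it turned on, every packet in both directions is dropped, no matter how many rounds you run.&lt;/p&gt;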
&lt;h4 id=&quot;relays&quot;&gt;Relays &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#relays&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Getting out of this hole requires the use of a relay
server. More on this later, but briefly a relay
is some server on the public Internet that Bob can
send his traffic through. Because this relay is something
that Bob explicitly uses and has a relationship with—unlike
his NAT, which just does whatever it does—it can have
deterministic properties which facilitate NAT traversal.
For instance, the most common relaying protocol,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Traversal_Using_Relays_around_NAT&amp;amp;oldid=1115742687&quot;&gt;Traversal Using Relays Around NAT (TURN)&lt;/a&gt;,
provides endpoint-independent mappings and so effectively
fixes the problem we see in this section.&lt;/p&gt;
&lt;p&gt;As with STUN servers, it&#39;s conventional for the calling
provider to provide a TURN server—indeed, they are
typically the same endpoint. However, TURN is much more
expensive to provide than STUN, so ideally if it&#39;s possible
for two endpoints to communicate without using TURN, you
want them to do so. Fortunately, most client pairs can
communicate without TURN, so it&#39;s still cheaper to operate
a calling service that tries to send data peer-to-peer than
one that sends everything through a central conferencing
server, as long as you avoid TURN where possible. We&#39;ll
discuss how to do this in the next part of this series.&lt;/p&gt;
&lt;h2 id=&quot;hairpinning&quot;&gt;Hairpinning &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#hairpinning&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s one more scenario I want to cover here, which is what&#39;s
called &amp;quot;hairpinning&amp;quot;. Consider the case where Alice and Bob are
actually on the same network, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Hairpinning-setup.png&quot; alt=&quot;Alice and Bob on the same network&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Recall that the NAT has two addresses, the internal address
(&lt;code&gt;10.0.0.1&lt;/code&gt;) which Alice and Bob communicate with and the external one
(&lt;code&gt;192.0.2.1&lt;/code&gt;) that is used to communicate with the outside world. Just as
in the scenarios before, Alice and Bob can connect to the STUN server
and get their server reflexive addresses.  For example, Alice might
get &lt;code&gt;192.0.2.1:1111&lt;/code&gt; and Bob &lt;code&gt;192.0.2.1:2222&lt;/code&gt; (naturally, these
use the NAT&#39;s external address).
The problem comes when Alice tries to send a packet from inside
the network to Bob&#39;s external address. If the NAT handles this
properly, it will deliver this packet (technical term: &lt;em&gt;hairpinning&lt;/em&gt;)
but some NATs do not do so, and will just drop the packet.
In this case, Alice and Bob will
not be able to communicate.&lt;/p&gt;
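&lt;p&gt;As a rough sketch of the decision the NAT is making, consider the function below. Everything in it is an assumption made for illustration: the name &lt;code&gt;from_inside&lt;/code&gt;, the internal addresses and ports in &lt;code&gt;BINDINGS&lt;/code&gt;, and the idea that hairpinning is a simple on/off switch. Real NATs differ in the details.&lt;/p&gt;

```python
EXTERNAL_IP = "192.0.2.1"

# External port to internal host. The internal addresses and ports are
# hypothetical; only the external ports come from the example in the text.
BINDINGS = {
    1111: ("10.0.0.3", 4000),   # Alice
    2222: ("10.0.0.2", 4000),   # Bob
}


def from_inside(dst_ip, dst_port, hairpinning):
    """Decide what a toy NAT does with a packet arriving on its internal interface."""
    if dst_ip != EXTERNAL_IP:
        return "forward to Internet"
    if not hairpinning:
        return "drop"
    target = BINDINGS.get(dst_port)
    if target is None:
        return "drop"
    return "deliver to %s:%d" % target


# Alice sends to Bob's server reflexive address.
print(from_inside("192.0.2.1", 2222, hairpinning=True))
print(from_inside("192.0.2.1", 2222, hairpinning=False))
```

&lt;p&gt;The same packet is either turned around and delivered to Bob or silently dropped, and nothing in the packet itself tells Alice which kind of NAT she is behind.&lt;/p&gt;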
&lt;p&gt;Of course, Alice and Bob can communicate directly using
their local addresses in the &lt;code&gt;10.0.0.*&lt;/code&gt; space, but it&#39;s hard
for them to detect this case because many different networks use
those addresses (that&#39;s the point of RFC 1918 addresses, after
all). They could look to determine if they have the same
server-reflexive address, but that might or might not
be a reliable indicator, depending on what kind of NATs are
in use. For instance, Alice and Bob might have their own NATs
but &lt;em&gt;also&lt;/em&gt; be behind a carrier-grade NAT that causes them
to have the same address. In this case, they will probably
not be able to communicate directly.&lt;/p&gt;
&lt;p&gt;Ideally, NATs would properly support hairpinning
(this is what RFC 4787 &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4787#section-6&quot;&gt;recommends&lt;/a&gt;), but, as we&#39;ve seen throughout this series, NAT behavior
is inconsistent and the endpoints have no good way of
asking the NAT what it does.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sean-bean-nat.jpg&quot; alt=&quot;Sean Bean meme&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-a-mess&quot;&gt;What a mess &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-2/#what-a-mess&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Back in 2003 when STUN was first being developed, the idea was that
you would characterize your NAT. &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc3489&quot;&gt;RFC 3489&lt;/a&gt;
had a whole &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc3489#section-10.2&quot;&gt;algorithm&lt;/a&gt; you used that involved multiple STUN
queries and tried to determine what your network
configuration was
(remember that you can&#39;t ask it any questions, you have to measure).
The RFC described a whole menagerie of different NAT types (&amp;quot;full
cone&amp;quot;, &amp;quot;restricted cone&amp;quot;, &amp;quot;port restricted cone&amp;quot;, and
&amp;quot;symmetric&amp;quot;),&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
with the idea that you would classify your NAT according to
one of these types. Based on what kind of NAT you had, you could then provide an
appropriate address to the other side—or, in the case of
the worst type (&amp;quot;symmetric NAT&amp;quot;) potentially declare failure.
This turned out not to work very well, in part because the
ecosystem was just a lot more complicated than people expected.
In the words of the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc5389&quot;&gt;revised STUN RFC&lt;/a&gt;:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;STUN was originally defined in RFC 3489 [RFC3489].  That
specification, sometimes referred to as &amp;quot;classic STUN&amp;quot;, represented
itself as a complete solution to the NAT traversal problem.  In that
solution, a client would discover whether it was behind a NAT,
determine its NAT type, discover its IP address and port on the
public side of the outermost NAT, and then utilize that IP address
and port within the body of protocols, such as the Session Initiation
Protocol (SIP) [RFC3261].  However, experience since the publication
of RFC 3489 has found that classic STUN simply does not work
sufficiently well to be a deployable solution.  The address and port
learned through classic STUN are sometimes usable for communications
with a peer, and sometimes not.  Classic STUN provided no way to
discover whether it would, in fact, work or not, and it provided no
remedy in cases where it did not.  Furthermore, classic STUN&#39;s
algorithm for classification of NAT types was found to be faulty, as
many NATs did not fit cleanly into the types defined there.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Instead, the IETF devised a solution which was intended to work
with any NAT type by the time-honored technique of trying
a lot of stuff and seeing what works. That solution is called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Interactive_Connectivity_Establishment&amp;amp;oldid=1121894789&quot;&gt;Interactive Connectivity Establishment (ICE)&lt;/a&gt;,
and I&#39;ll be covering it in Part III.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
They may &lt;em&gt;also&lt;/em&gt; be NATted, but that&#39;s an operational
convenience, because you still need a stable public
IP. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are also reasons why centralized videoconferencing systems
are good, but that&#39;s another post. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Except that it won&#39;t be the same as for &lt;strong&gt;A&lt;/strong&gt;, at
least for the same server. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that some of the obvious defenses don&#39;t work. For instance,
you can&#39;t just &amp;quot;latch&amp;quot; to the first packet you see because
the attacker might be faster. Similarly, you can&#39;t just
compare the peer reflexive address to the address Bob sent
over the signaling channel because if Bob has address and port-dependent
mapping, then the true peer reflexive address will also not match. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This was before people came up with this
&amp;quot;endpoint-independent&amp;quot;, &amp;quot;address-dependent&amp;quot;, etc. taxonomy. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Which also includes a totally different expansion for STUN.
In RFC 3489, STUN stood for &amp;quot;Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)&amp;quot;
and now it stands for &amp;quot;Session Traversal Utilities for NAT&amp;quot;.
The IETF loves its acronyms (and backronyms). &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-2/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Everything you never knew about NATs and wish you hadn&#39;t asked</title>
		<link href="https://educatedguesswork.org/posts/nat-part-1/"/>
		<updated>2023-04-03T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/nat-part-1/</id>
		<content type="html">&lt;p&gt;The Internet is a mess, and one of the biggest parts of that mess
is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1147533294&quot;&gt;Network Address Translation (NAT)&lt;/a&gt;,
a technique which allows multiple devices to share the same
network address. In this series of posts, we&#39;ll be looking at NATs and
NAT traversal. This post is on NATs and the next one will be
on NAT traversal techniques.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;background%3A-ip-addresses-and-ip-address-exhaustion&quot;&gt;Background: IP addresses and IP address exhaustion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#background%3A-ip-addresses-and-ip-address-exhaustion&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You may recall from previous posts that the Internet is a
&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#background%3A-a-packet-switching-network&quot;&gt;packet switching network&lt;/a&gt;
which works by routing self-contained messages
(&lt;em&gt;datagrams&lt;/em&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/IP-packet.png&quot; alt=&quot;IP Packet&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;writing-ip-addresses&quot;&gt;Writing IP Addresses &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#writing-ip-addresses&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;IPv4 addresses are 32 bits, hence 4 bytes. It&#39;s conventional
to write them in what&#39;s called &amp;quot;dotted quad&amp;quot; format, which
consists of writing each byte value (from 0 to 255) in decimal,
separated by dots. For instance, &lt;code&gt;10.0.0.1&lt;/code&gt; corresponds to
the bytes &lt;code&gt;0x0a 0x00 0x00 0x01&lt;/code&gt;.
Because IPv6 addresses are so much longer, writing them
is unfortunately kind of a pain, and you end up with
goofy stuff like &lt;code&gt;2607:f8b0:4002:c03::64&lt;/code&gt; (for &lt;code&gt;google.com&lt;/code&gt;)
where the &lt;code&gt;::&lt;/code&gt; means that everything in between is a 0.&lt;/p&gt;
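&lt;p&gt;If you want to check these conversions yourself, Python&#39;s standard &lt;code&gt;ipaddress&lt;/code&gt; module will do it. This is just a quick sketch; the variable names are arbitrary.&lt;/p&gt;

```python
import ipaddress

v4 = ipaddress.IPv4Address("10.0.0.1")
print(v4.packed.hex(" "))   # the four bytes behind the dotted quad

v6 = ipaddress.IPv6Address("2607:f8b0:4002:c03::64")
print(v6.exploded)          # the same address with the "::" run of zeros written out
print(len(v6.packed) * 8)   # address length in bits
```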
&lt;/div&gt;
&lt;p&gt;Each packet has a source and destination address, which are just
numbers, and each device has its own address, which is how packets get
sent (routed) to it and not to other devices. In the original version of the
Internet Protocol (IP version 4 or just IPv4), these addresses were 32
bits long, which means that there are a total of 2&lt;sup&gt;32&lt;/sup&gt;
(about 4 billion) possible addresses.  There are rather more than 4
billion people on the planet and many of them have more than one
device, so it&#39;s not actually possible for each device to have a unique
address.&lt;/p&gt;
&lt;p&gt;This problem has been known about for more than 30 years, and the
&lt;a href=&quot;https://www.ietf.org/&quot;&gt;Internet Engineering Task Force (IETF)&lt;/a&gt;, which
maintains most of the main networking protocols on the Internet, has
an official fix, which is for everyone to upgrade to a new version of
IP called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=IPv6&amp;amp;oldid=1147144394&quot;&gt;IP version 6
(IPv6)&lt;/a&gt;.
IPv6 has 128 bit addresses, which, at least theoretically, means
that there are plenty of addresses.
Unfortunately, for reasons which are far too long—and
depressing—to fit into this post, the transition to IPv6 has not
gone well, with the result that over 25 years after IPv6 was first
specified, significantly less than half of the Internet traffic is
IPv6. The graph below shows Google&#39;s measurements of the fraction
of its traffic that is IPv6, reflecting client-side deployment.
Server-side deployment is also fairly bad, with ISOC
&lt;a href=&quot;https://pulse.internetsociety.org/technologies&quot;&gt;reporting&lt;/a&gt;
that about 44% of the top 1000 sites support IPv6.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipv6-clients-google.png&quot; alt=&quot;Google IPv6&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.google.com/intl/en/ipv6/statistics/&quot;&gt;Google&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;This is, needless to say, not good. As a comparison point, TLS 1.3
shipped in 2018 and at this point ISOC&#39;s numbers show 79% support
among the top 1000 sites. At some level this is a slightly unfair
comparison because transitioning to IPv6 means changing your
network connection whereas transitioning to TLS 1.3 just requires
updating your software, but in any case, we&#39;re nowhere near
full IPv6 deployment, even though we no longer have enough
IPv4 addresses. Actually, addresses have been scarce for quite some
time, as shown in the timeline below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipv4-exhaustion.png&quot; alt=&quot;IPv4 exhaustion&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: Michael Bakni via &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:IPv4_exhaustion_time_line-en.svg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;IP addresses are centrally assigned, with the overall pool
being managed by the &lt;a href=&quot;https://www.iana.org/&quot;&gt;Internet Assigned Numbers Authority (IANA)&lt;/a&gt;
which provides them to &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Regional_Internet_registry&amp;amp;oldid=1142945364&quot;&gt;Regional Internet Registries (RIRs)&lt;/a&gt;, which then hand them
out to network providers, on down to hosts.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
IANA allocated its
last block to the RIRs back in 2011, but addresses were already
starting to get scarce before then. As you can see on the chart
above, an immediate transition to IPv6 in which we just turn off
IPv4 is implausible today but was
out of the question in the early 2000s back when deployment was effectively
zero. Another technical solution was needed, one that would
be incrementally deployable rather than simultaneously replacing
big chunks of the Internet (technical term: &lt;em&gt;forklift upgrade&lt;/em&gt;).
And the Internet delivered in the form of NAT.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/goldblum-internet.jpg&quot; alt=&quot;The Internet finds a way&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;network-address-translation-(nat)&quot;&gt;Network Address Translation (NAT) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#network-address-translation-(nat)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind NAT is simple: you can have multiple machines
share the same address as long as there is a way to &lt;em&gt;demultiplex&lt;/em&gt;
(i.e., separate out) traffic associated with one machine from traffic associated with
another. Fortunately, such a mechanism already existed: &lt;strong&gt;ports&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id=&quot;port-numbers&quot;&gt;Port Numbers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#port-numbers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Consider the case where you just have two computers, a client and a
server, but where there are two simultaneous users on the client.
This feels like an odd situation in 2023 when basically all computers
are individual, but all of this stuff was designed back in an era when
multiple users &lt;em&gt;timesharing&lt;/em&gt; on the same computer was the norm. If
both users want to connect to the same server, they will have the same
IP address, so how does the server tell them apart?&lt;/p&gt;
&lt;p&gt;The answer is to have another field, the &lt;strong&gt;port number&lt;/strong&gt;, which
is just a 16-bit integer that can be used to distinguish multiple
contexts on the same device (IP address). Port numbers have two
main uses:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;on clients&lt;/dt&gt;
&lt;dd&gt;to distinguish multiple similar processes connecting to the same
server.&lt;/dd&gt;
&lt;dt&gt;on servers&lt;/dt&gt;
&lt;dd&gt;to distinguish multiple different services. Conventionally,
services will have specific assigned port numbers, such as
80 for HTTP, 443 for HTTPS, etc.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Port numbers don&#39;t exist at the IP layer but rather at the TCP
or UDP layers, but virtually all the traffic we&#39;ll be talking
about uses UDP or TCP, so that&#39;s usually not an issue.&lt;/p&gt;
&lt;h3 id=&quot;nat&quot;&gt;NAT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#nat&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Port numbers allow two users on the same machine to share an
IP address. The intuition behind NAT is that you can use the
same mechanism to allow two &lt;em&gt;machines&lt;/em&gt; to share an IP address,
as long as you can ensure that they won&#39;t also try to use the
same port.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
The basic way to do this is by having the network
gateway device (e.g., your WiFi router) do the work.
The basic scenario is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/NAT-basic.png&quot; alt=&quot;A basic NAT scenario&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this example, Alice and Bob are both on the same network
and have addresses &lt;code&gt;10.0.0.3&lt;/code&gt; and &lt;code&gt;10.0.0.2&lt;/code&gt; respectively.
The WiFi router has two addresses, one on the inside which
it uses to talk to Alice and Bob (&lt;code&gt;10.0.0.1&lt;/code&gt;) and one
on the outside which it uses to talk to machines on the
Internet (&lt;code&gt;192.0.2.1&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nat-flow.png&quot; alt=&quot;NAT rewriting&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When Alice wants to talk to the server, she sends a packet
from her IP address and uses local port &lt;code&gt;1111&lt;/code&gt; (this is
usually written &lt;code&gt;10.0.0.3:1111&lt;/code&gt;), as shown above. This packet gets sent
to the WiFi router, which &lt;em&gt;rewrites&lt;/em&gt; the source address and port
to &lt;code&gt;192.0.2.1:1234&lt;/code&gt; and sends it along to the server.
When the server responds, it sends the packet to
&lt;code&gt;192.0.2.1:1234&lt;/code&gt; (this is the only address that it knows),
which routes it back to the WiFi router. The router
duly rewrites the destination address to &lt;code&gt;10.0.0.3:1111&lt;/code&gt;
and sends it to Alice. The story is the same for Bob
(he even uses the same port number!)
except that the packets he sends are from
&lt;code&gt;192.0.2.1:5678&lt;/code&gt;. In order to make this work, the router
needs to maintain a mapping table of which &lt;em&gt;external&lt;/em&gt;
ports correspond to which internal machines. Each
entry in the table is called a &amp;quot;NAT binding&amp;quot;
and associates the external address and port to the internal
one.&lt;/p&gt;
&lt;p&gt;From the server&#39;s perspective, this looks exactly the same
as if there were a single machine with address &lt;code&gt;192.0.2.1&lt;/code&gt;
talking to it; NAT is just something that happens unilaterally
on the client side. This is a very important feature because
it enables incremental deployment. A network that can&#39;t
get enough IP addresses can use NAT without any change
on the servers. Perhaps less obviously, it doesn&#39;t require
changing the &lt;em&gt;clients&lt;/em&gt; either: they just use their
ordinary IP addresses and the NAT translates them.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;NAT isn&#39;t magic, of course, and it can&#39;t create IP addresses
out of nowhere; what it does is &lt;em&gt;stretch&lt;/em&gt; them by using
the port number as an extension of the IPv4 address space.
In fact, we used to joke about the IPv7 packet header, in
which the IPv4 address fields were the &amp;quot;high order&amp;quot; bits
of the address and the transport port fields were
the &amp;quot;low order&amp;quot; bits:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipv7-header.png&quot; alt=&quot;IPv7 header&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s still possible to run out of ports on the NAT device
if it has enough clients behind it, but because
the NAT can use the same port to talk to two different servers
at the same time (though this turns out to be bad news
for reasons we&#39;ll get into below) and there are around 65000 possible ports,
you need a lot of clients to want to concurrently
talk to the same server before this becomes a problem.
As a general matter, NATs will &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#binding-lifetimes&quot;&gt;reuse&lt;/a&gt; ports once they
are no longer active, so that NAT bindings aren&#39;t
stable over time: port &lt;code&gt;1234&lt;/code&gt; might be Alice now but
Bob in 20 minutes.&lt;/p&gt;
&lt;p&gt;As a practical matter, you don&#39;t usually use NAT for servers,
at least not this way, though it&#39;s not technically impossible. In particular,
HTTP(S) URIs have a port number field, so you can
say (for instance) &lt;code&gt;https://example.com:4444&lt;/code&gt; to
indicate that the client should use port &lt;code&gt;4444&lt;/code&gt; but this
just isn&#39;t common practice, partly because the result
is ugly and partly because there are other mechanisms
for hosting multiple servers on the same IP address, such
as TLS Server Name Indication (SNI).&lt;/p&gt;
&lt;h3 id=&quot;rfc-1918-addresses&quot;&gt;RFC 1918 Addresses &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#rfc-1918-addresses&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, even if they are behind a NAT, each client still needs its
own IP address, so how does this help? The answer is that
these addresses don&#39;t need to be &lt;em&gt;globally&lt;/em&gt; unique but
just &lt;em&gt;locally&lt;/em&gt; unique within a given network. This means
that the local address of a machine on your network might
be the same as one on my network, but they get translated
to different addresses on the public Internet.&lt;/p&gt;
&lt;p&gt;The IETF has reserved a number of address blocks for
&amp;quot;Private&amp;quot; usage in &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc1918.html&quot;&gt;RFC 1918&lt;/a&gt;.
These addresses are never supposed to appear on the public
Internet and so it&#39;s safe to use them on your network,
as long as you translate them to a routable address
on the way out to the Internet. The example above
uses addresses from one such address block: &lt;code&gt;10.0.0.0/8&lt;/code&gt;, which
means &amp;quot;all the addresses with the 8-bit prefix &lt;code&gt;10&lt;/code&gt;, i.e.,
&lt;code&gt;10.0.0.0&lt;/code&gt; to &lt;code&gt;10.255.255.255&lt;/code&gt; inclusive&amp;quot;. This block
has around 16 million possible addresses in it, so you
can have a very large network behind a NAT.&lt;/p&gt;
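&lt;p&gt;The arithmetic is easy to verify with Python&#39;s standard &lt;code&gt;ipaddress&lt;/code&gt; module. In the sketch below, the helper name &lt;code&gt;in_rfc1918&lt;/code&gt; is made up for illustration, and the list of blocks is the three ranges reserved by RFC 1918.&lt;/p&gt;

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/8")
print(block[0], block[-1])    # first and last address in the block
print(block.num_addresses)    # 2**24, i.e., around 16 million

# The three private blocks reserved by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def in_rfc1918(addr):
    """Return True if addr falls inside one of the RFC 1918 blocks."""
    return any(ipaddress.ip_address(addr) in net for net in RFC1918)


print(in_rfc1918("10.0.0.3"))    # Alice's internal address
print(in_rfc1918("192.0.2.1"))   # the NAT's external address in the example
```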
&lt;h2 id=&quot;maintaining-nat-bindings&quot;&gt;Maintaining NAT Bindings &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#maintaining-nat-bindings&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Internally, a NAT needs to keep a mapping table that stores the
bindings between internal and external addresses. In the example
above, you would have a table something like:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Internal Address&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;External Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.3:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1234&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10.0.0.2:1111&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5678&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note that the external address is constant, so we don&#39;t need
it in the table. Some larger NAT systems (see
&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#carrier-grade-nat&quot;&gt;carrier-grade NAT&lt;/a&gt; below) have multiple external
IP addresses, but we don&#39;t need to worry about that right now.&lt;/p&gt;
&lt;p&gt;When the NAT receives a packet on the outgoing interface, it
needs to do a table lookup. If a binding already exists for
the packet, then the NAT just uses the entry in the table.
If no binding exists, it creates a table entry with an unused
port and forwards the packet. In this example, I&#39;ve described
what&#39;s called an &amp;quot;address-independent&amp;quot; NAT in which you
have a single binding for a given local address/port combination,
no matter what the remote address is. There are also &amp;quot;address-dependent&amp;quot;
NATs, which use a different binding for each remote address. This will become
relevant when we talk about NAT traversal in Part II.&lt;/p&gt;
&lt;p&gt;When the NAT receives an incoming packet on the external interface,
it also does a table lookup. If a table entry exists, it forwards
the packet as expected, but if no entry exists then there&#39;s no way of
knowing which host the packet is intended for; the sensible thing
to do in this case is to just drop the packet. The result of this
is that most consumer NATs only really support flows in which
the machine behind the NAT speaks first to initiate the flow. This
is usually conceptualized as an &amp;quot;outgoing-only&amp;quot; set of semantics
and corresponds well to &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro&quot;&gt;TCP connections&lt;/a&gt;,
in which the client sends the first packet (a SYN). Indeed, some
NATs rely on the TCP SYN to create bindings, and will just
drop mid-connection TCP packets that correspond to unknown flows.
This doesn&#39;t work with UDP so you just have to look at the first
outgoing packet, ignoring whatever markings it has.&lt;/p&gt;
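&lt;p&gt;The two lookups can be sketched in a few lines. This is a minimal model rather than how any particular NAT is implemented: the &lt;code&gt;BindingTable&lt;/code&gt; class and its method names are invented for illustration, ports are handed out sequentially (so Bob gets &lt;code&gt;1235&lt;/code&gt; here rather than &lt;code&gt;5678&lt;/code&gt;), and there is no handling of port exhaustion or binding expiry.&lt;/p&gt;

```python
class BindingTable:
    """Minimal address-independent NAT table (illustrative names, not a real API)."""

    def __init__(self, external_ip, first_port=1234):
        self.external_ip = external_ip
        self.next_port = first_port
        self.by_internal = {}   # (internal ip, internal port) to external port
        self.by_external = {}   # external port to (internal ip, internal port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        key = (src_ip, src_port)
        if key not in self.by_internal:
            port = self.next_port
            self.next_port += 1
            self.by_internal[key] = port
            self.by_external[port] = key
        return (self.external_ip, self.by_internal[key])

    def inbound(self, dst_port):
        """Rewrite the destination of an incoming packet, or return None to drop it."""
        return self.by_external.get(dst_port)


nat = BindingTable("192.0.2.1")
print(nat.outbound("10.0.0.3", 1111))   # Alice: a new binding is created
print(nat.outbound("10.0.0.2", 1111))   # Bob: same local port, different binding
print(nat.outbound("10.0.0.3", 1111))   # Alice again: the existing binding is reused
print(nat.inbound(1234))                # reply to Alice is rewritten and forwarded
print(nat.inbound(9999))                # no binding, so the packet is dropped
```

&lt;p&gt;Note that &lt;code&gt;inbound&lt;/code&gt; never creates a binding, which is exactly why only the machine behind the NAT can start a flow.&lt;/p&gt;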
&lt;p&gt;This &amp;quot;outbound connections only&amp;quot; semantic is often viewed as a security
feature because it means that even if you have devices behind the
NAT that have &amp;quot;open TCP ports&amp;quot;, meaning that they listen on
those ports for connections, external attackers may not be able
to connect to them. This kind of device is surprisingly common,
especially for things like printers or scanners which you want
to be accessible to anyone on the local network, so a NAT is really
providing a valuable function here. However, it&#39;s important to
realize that unlike a firewall, which is explicitly designed to
block certain kinds of connections, many NATs just do this as
a sort of accidental side effect of their architecture—although others do so explicitly,
as we&#39;ll see later—so it&#39;s not a guaranteed property that
you should rely on.&lt;/p&gt;
&lt;h3 id=&quot;binding-lifetimes&quot;&gt;Binding Lifetimes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#binding-lifetimes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This brings us to the obvious question of when the NAT should delete
bindings. Cleaning up old bindings is an important function because
otherwise the NAT would quickly use up its available port space.
There are a number of ways to manage this:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Keep the binding open until the connection is torn down,&lt;/dt&gt;
&lt;dd&gt;either by a TCP FIN or a TCP RST. This doesn&#39;t work with many UDP-based
protocols, which either don&#39;t have messages indicating connection
closing (such as RTP) or where those messages are encrypted
(such as QUIC or DTLS 1.3). This method also isn&#39;t sufficient
even for TCP, because the client might have shut down without
sending a FIN, for instance if it crashed or the user put
their laptop to sleep.&lt;/dd&gt;
&lt;dt&gt;Use a timeout&lt;/dt&gt;
&lt;dd&gt;and tear down connections which are idle for too long. This guarantees
that eventually the resources will be released, because if the
client shuts down, it won&#39;t be sending packets. However,
&amp;quot;too long&amp;quot; is just a heuristic. Network protocols are often designed
so that if there is no data flowing they don&#39;t send any packet (TCP
is this way), in which case you may just be tearing down a connection
right as the client was about to send something.
More modern protocols incorporate &amp;quot;keepalive&amp;quot; packets to keep the NAT bindings open, but
remember that the idea here is that a NAT should work with protocols
that were designed before the NAT was deployed, so this is not
an ideal solution.&lt;/dd&gt;
&lt;dt&gt;Delete the least-recently-used connections&lt;/dt&gt;
&lt;dd&gt;once some maximum number of connections is reached and a new
one needs to be allocated. This has many of the same problems as
the timeout but is a slight improvement in some respects because
it doesn&#39;t delete old connections unless the table is full.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It&#39;s of course also possible to use more than one of these mechanisms
at once. For instance, you might look at the TCP control packets
to drop TCP connections but use timers as a backup for client
shutdown and for other protocols.&lt;/p&gt;
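&lt;p&gt;To make the trade-offs a bit more concrete, here is a small Python sketch that combines an idle timeout with least-recently-used eviction. The &lt;code&gt;BindingTable&lt;/code&gt; class, the 120 second timeout, and the table size are assumptions chosen for illustration, not values taken from any real NAT.&lt;/p&gt;

```python
from collections import OrderedDict


class BindingTable:
    """Toy binding table combining an idle timeout with LRU eviction."""

    def __init__(self, idle_timeout=120.0, max_bindings=1024):
        self.idle_timeout = idle_timeout
        self.max_bindings = max_bindings
        self._last_seen = OrderedDict()  # flow key to time of last packet, oldest first

    def touch(self, flow, now):
        """Record a packet on `flow` at time `now` (in seconds)."""
        self.expire(now)
        if flow not in self._last_seen and len(self._last_seen) == self.max_bindings:
            self._last_seen.popitem(last=False)  # table full: evict least recently used
        self._last_seen[flow] = now
        self._last_seen.move_to_end(flow)

    def expire(self, now):
        """Drop every binding that has been idle for longer than the timeout."""
        stale = [f for f, t in self._last_seen.items() if now - t > self.idle_timeout]
        for flow in stale:
            del self._last_seen[flow]

    def active(self):
        return list(self._last_seen)


table = BindingTable(idle_timeout=120.0, max_bindings=2)
table.touch("a", now=0.0)
table.touch("b", now=100.0)
table.touch("c", now=110.0)  # table is full, so "a" (least recently used) is evicted
print(table.active())        # ['b', 'c']
table.expire(now=225.0)      # "b" has now been idle for 125 seconds and times out
print(table.active())        # ['c']
```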
&lt;h3 id=&quot;non-tcp%2Fudp-protocols&quot;&gt;Non-TCP/UDP Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#non-tcp%2Fudp-protocols&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, TCP and UDP are not the only protocols which it is possible
to run on the Internet. The IP datagram&#39;s &amp;quot;next protocol&amp;quot; field is an
8-bit value and only about half of these are
&lt;a href=&quot;https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml&quot;&gt;assigned&lt;/a&gt;,
so in principle it&#39;s possible to introduce new protocols that run
directly over IP. In practice, however, NATs make this extremely
problematic because the port field is not in the IP header but rather in
the header of the protocol that sits above IP (e.g., TCP or UDP), which means that the
NAT needs protocol-specific logic for each new protocol.&lt;/p&gt;
&lt;p&gt;A good example here is &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc4960&quot;&gt;SCTP&lt;/a&gt;,
a TCP-like protocol that introduces a number of new features like
multiplexing multiple streams on the same connection. SCTP was intended to run directly over
IP, just like TCP, and SCTP&#39;s header actually
has the source and destination ports in the same location as TCP and UDP, as
shown below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Source Port Number        |     Destination Port Number   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Verification Tag                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Checksum                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;firewalls&quot;&gt;Firewalls &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#firewalls&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The situation is actually much worse than I&#39;m making it out
here, because network security devices like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Firewall_%28computing%29&amp;amp;id=1145798057&amp;amp;wpFormIdentifier=titleform&quot;&gt;firewalls&lt;/a&gt; are
often configured to reject any traffic that they don&#39;t
understand. Even if a new protocol magically worked with
NATs without modification, it would be blocked by
many firewalls.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You might think, then, that a NAT which just always
rewrote whatever bytes were in the location of the source/destination
port fields for UDP or TCP would work fine with SCTP,
but that&#39;s not correct. It&#39;s true that it would rewrite
the fields, but that would just create another problem,
because the SCTP packet also includes
a &lt;em&gt;checksum&lt;/em&gt; (the last field in the header shown above)
which is computed over the entire packet and is designed to
detect any change to the packet, &lt;em&gt;including the port numbers&lt;/em&gt;.
This means that any NAT which rewrites the source and destination
port &lt;em&gt;also&lt;/em&gt; needs to rewrite the checksum, otherwise the checksum verification will
fail at the receiver and the packet will be discarded.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The SCTP checksum is in a different place than the TCP
(or UDP) checksum and is computed using a different algorithm,
so even if you just went ahead and used the TCP rewriting
code—which isn&#39;t a good idea for other reasons—you&#39;d just
end up damaging some other part of the packet. The bottom line, then,
is that it&#39;s not generally safe for NATs to just rewrite packets
they don&#39;t understand (even though in some cases it might happen to work), and instead
NATs need to be modified in order to support each new
protocol, which means that any such protocol starts out broken
on a huge fraction of clients, making it very hard to get traction.&lt;/p&gt;
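&lt;p&gt;Here is a small Python sketch of why a port rewrite has to be paired with a checksum rewrite. It uses the 16-bit ones&#39; complement checksum of the kind TCP and UDP use, applied to a made-up segment layout (source port, destination port, checksum, payload) rather than a real TCP, UDP, or SCTP packet. SCTP itself uses CRC32c, which is exactly the sort of difference that generic rewriting code would get wrong. All of the function names are invented for this example.&lt;/p&gt;

```python
def internet_checksum(data):
    """16-bit ones' complement checksum of the kind used by TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total // 65536:
        total = total % 65536 + total // 65536  # fold the carries back in
    return 65535 - total


def build(src_port, dst_port, payload):
    """Toy segment: 2-byte source port, 2-byte destination port, 2-byte checksum, payload."""
    header = src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big")
    csum = internet_checksum(header + b"\x00\x00" + payload)
    return header + csum.to_bytes(2, "big") + payload


def verifies(segment):
    """The receiver recomputes the checksum with the checksum field zeroed."""
    zeroed = segment[:4] + b"\x00\x00" + segment[6:]
    return internet_checksum(zeroed) == int.from_bytes(segment[4:6], "big")


def rewrite_source_port(segment, new_port, fix_checksum):
    """What a NAT does to the toy segment, with or without the checksum fix."""
    out = bytearray(segment)
    out[0:2] = new_port.to_bytes(2, "big")
    if fix_checksum:
        out[4:6] = b"\x00\x00"
        out[4:6] = internet_checksum(bytes(out)).to_bytes(2, "big")
    return bytes(out)


original = build(1111, 80, b"hello")
print(verifies(original))                                     # True
print(verifies(rewrite_source_port(original, 40000, False)))  # False
print(verifies(rewrite_source_port(original, 40000, True)))   # True
```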
&lt;p&gt;Fortunately there is a well-known solution to this problem,
which is to run your new protocol over UDP. The UDP header
is comparatively lightweight, consisting of 8 bytes, 4 of
which are the source and destination ports, which you&#39;d need anyway.
The other four are a two-byte length field, which you&#39;d generally
want, and a kind of outdated checksum, which only takes up
two bytes, so there&#39;s not that much overhead.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 0      7 8     15 16    23 24    31  
+--------+--------+--------+--------+ 
|     Source      |   Destination   | 
|      Port       |      Port       | 
+--------+--------+--------+--------+ 
|                 |                 | 
|     Length      |    Checksum     | 
+--------+--------+--------+--------+ 
&lt;/code&gt;&lt;/pre&gt;
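&lt;p&gt;For a sense of how little is involved, the header above can be packed and parsed in a few lines of Python. This is only a sketch: the helper names are invented, and the checksum is left at zero (which over IPv4 means that no checksum was computed) rather than calculated.&lt;/p&gt;

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def build_udp(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header to the payload."""
    length = UDP_HEADER.size + len(payload)  # the length field covers header plus payload
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp(datagram):
    """Split a datagram into its four header fields and its payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_udp(1111, 443, b"my new protocol")
print(UDP_HEADER.size)      # 8
print(parse_udp(datagram))  # (1111, 443, 23, 0, b'my new protocol')
```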
&lt;p&gt;If you run your protocol over UDP, then NATs will generally work
mostly correctly (again with the caveat that the NAT doesn&#39;t
know when a connection starts and stops), so you start out
from a position of things mostly working rather than them mostly
failing: when QUIC was first rolled out, Google &lt;a href=&quot;https://storage.googleapis.com/pub-tools-public-publication-data/pdf/8b935debf13bd176a08326738f5f88ad115a071e.pdf&quot;&gt;found&lt;/a&gt;
that around 95% of connections succeeded.
Of course, 95% isn&#39;t 100%, and experience with new protocols such
as QUIC and DTLS (with WebRTC) suggests that any new protocol will
experience some blockage; in practice this means that you need to
arrange some way to fall back to an older protocol such as HTTPS
if your new UDP-based protocol fails. There are a number of possible
approaches here, including trying both in parallel
(a technique often called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Happy_Eyeballs&amp;amp;id=1147145736&amp;amp;wpFormIdentifier=titleform&quot;&gt;Happy Eyeballs&lt;/a&gt;), trying
the new protocol first and seeing if it fails, or trying the old
protocol first and then in the background trying the new protocol.&lt;/p&gt;
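&lt;p&gt;As a sketch of the &amp;quot;try both in parallel&amp;quot; option, the following Python code races several connection attempts and returns the first one that succeeds. It is a simplification rather than the full Happy Eyeballs algorithm (which staggers its attempts), and &lt;code&gt;try_quic&lt;/code&gt; and &lt;code&gt;try_https&lt;/code&gt; are stand-ins that simulate a network where the UDP-based protocol is blocked.&lt;/p&gt;

```python
import asyncio


async def race(*attempts):
    """Run every attempt concurrently and return the first successful result."""
    pending = {asyncio.create_task(attempt()) for attempt in attempts}
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                for loser in pending:
                    loser.cancel()  # a winner exists, so abandon the other attempts
                return task.result()
    raise ConnectionError("every attempt failed")


async def try_quic():
    """Simulated new protocol: fails quickly, as if UDP were blocked."""
    await asyncio.sleep(0.01)
    raise ConnectionError("UDP blocked")


async def try_https():
    """Simulated old protocol: slower to set up, but it works."""
    await asyncio.sleep(0.05)
    return "https"


print(asyncio.run(race(try_quic, try_https)))  # https
```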
&lt;p&gt;For this reason, the only really practical way to deploy new transport
protocols on the Internet is over UDP,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
and this is what recent protocols such as QUIC (running
directly over UDP) or WebRTC data channels (SCTP running over
DTLS running over UDP) do.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
This principle was forcefully
enunciated by voice over IP pioneer
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Jonathan_Rosenberg_(SIP_author)&amp;amp;oldid=1145532767&quot;&gt;Jonathan Rosenberg (JDR)&lt;/a&gt; in an IETF session where someone was presenting
a mechanism for running SCTP over NATs. JDR&#39;s response was something to
the effect of:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are some hard truths in the world and this is one of them. TCP and UDP are the
&lt;a href=&quot;https://www.ietf.org/archive/id/draft-rosenberg-internet-waist-hourglass-00.html&quot;&gt;new waist of the IP protocol stack&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this context, &amp;quot;waist&amp;quot; refers to a famous analogy for the IP protocol
suite illustrated by this image from a talk by IPv6 designer Steve Deering:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ip-waist-deering.png&quot; alt=&quot;IP hourglass&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.iab.org/wp-content/IAB-uploads/2010/11/hourglass-london-ietf.pdf&quot;&gt;Steve Deering&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;The idea is that IP can run on any kind of underlying link (radio, copper, whatever)
and that you can run lots of protocols on top of it, but that IP is the
common element, hence the narrow &amp;quot;waist&amp;quot; of the hourglass.
Rosenberg&#39;s point (which I agree with) is that this place is
now occupied by UDP (and to a lesser extent TCP). Arguably, the situation
is worse than this: it&#39;s so common to deploy new technologies over HTTP
that I&#39;ve seen arguments that &lt;a href=&quot;https://book.systemsapproach.org/e2e/trend.html&quot;&gt;HTTP is the new waist&lt;/a&gt;, but we&#39;re not there yet!&lt;/p&gt;
&lt;h2 id=&quot;application-layer-gateways&quot;&gt;Application-Layer Gateways &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#application-layer-gateways&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;NAT works quite well for simple protocols which just consist of one
connection (e.g., HTTP). However, there are some protocols which
have a more complicated pattern. As an example, the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=File_Transfer_Protocol&amp;amp;oldid=1141392671&quot;&gt;File Transfer Protocol (FTP)&lt;/a&gt; is part
of the original protocol suite and was widely used for downloading
data prior to the dominance of the Web and HTTP. FTP had
an unusual (to modern eyes) design which used two connections:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;A control channel&lt;/dt&gt;
&lt;dd&gt;which the client used to give instructions to the server.&lt;/dd&gt;
&lt;dt&gt;A data channel&lt;/dt&gt;
&lt;dd&gt;which was used to actually transmit data.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;A download using FTP &lt;em&gt;[edited from &amp;quot;UDP&amp;quot; — 2023-04-17]&lt;/em&gt; looks like the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ftp.png&quot; alt=&quot;FTP Transfer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The client would first connect to the FTP server and then issue instructions
about what file to download. The server would then connect to the client
(by default using the port number one lower than the one the client used,
but the client can provide a port number) and send the file.&lt;/p&gt;
&lt;p&gt;Of course, this won&#39;t necessarily work if you have a NAT, because
the port number probably won&#39;t be right; even if the client uses
the default, the NAT might not have two adjacent ports spare. Instead,
the NAT would use what&#39;s called an &lt;em&gt;application-layer gateway (ALG)&lt;/em&gt;
and &lt;em&gt;rewrite&lt;/em&gt; the client&#39;s request, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ftp-alg.png&quot; alt=&quot;FTP with a NAT ALG&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;an-aggressive-alg&quot;&gt;An aggressive ALG &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#an-aggressive-alg&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Sometimes ALGs aren&#39;t so careful, however. The FTP ALG only works
because the NAT knows about FTP, but what about unknown protocols?
One possible implementation is to just pattern match by replacing
any occurrence of the IP address (e.g., &lt;code&gt;10.0.0.1&lt;/code&gt;) or the
IP address and port (e.g., &lt;code&gt;10.0.0.1:1111&lt;/code&gt;) with the NAT&#39;s
address and port (and maybe even make a new NAT binding to
go along with it). This is a general mechanism but also a brittle
one. In one hilarious case, &lt;a href=&quot;https://www.linkedin.com/in/adamroach1/&quot;&gt;Adam Roach (another VoIP pioneer)&lt;/a&gt;
was trying to download a Linux disk image and kept getting checksum
errors.&lt;/p&gt;
&lt;p&gt;He eventually tracked it down by comparing the right image
and the one he was getting and found a 4-byte difference, where
the right value corresponded to his public IP address and the
value he was getting was his internal address. What
was happening was that the ALG in the NAT was just
&lt;em&gt;rewriting&lt;/em&gt; anything that looked like his external IP into
his internal IP, regardless of where it was in the data
stream. Not good!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Note that the NAT mostly doesn&#39;t interfere with the client&#39;s data:
it just knows enough about FTP to know where the port number is,
create the appropriate incoming NAT binding, and then replace
it on the control channel. This of course won&#39;t work as well
on unknown protocols and won&#39;t work at all on encrypted
ones (in fact, any tampering with an encrypted protocol
will generally just cause some kind of failure).
At this point FTP is mostly gone (due to a combination of being insecure,
being superseded by HTTP, and,
at least in the case of Web browsers,
&lt;a href=&quot;https://groups.google.com/g/mozilla.dev.platform/c/FqCZUT9ay_o/m/jt4DLRDjAwAJ&quot;&gt;concerns about the quality of the implementations&lt;/a&gt;), and newer protocols don&#39;t
adopt this pattern because they want to work well with NATs.
The reason that ALGs of this kind were needed was to avoid
breaking existing protocols when NATs were first introduced, but now
that NATs are widespread, the opposite dynamic is in play
and new protocols have to avoid breaking when run over
existing NATs.&lt;/p&gt;
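&lt;p&gt;To give a flavor of what a careful FTP ALG does, here is a Python sketch that rewrites only a well-formed &lt;code&gt;PORT&lt;/code&gt; command (whose arguments are the four address bytes followed by the two port bytes) and passes everything else through. The function names and the allocator that always hands out port 40000 are hypothetical. The last few lines contrast this with the &amp;quot;aggressive&amp;quot; approach of replacing matching bytes wherever they occur.&lt;/p&gt;

```python
import re

PORT_CMD = re.compile(rb"^PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r\n$")


def rewrite_port_command(line, nat_ip, allocate_binding):
    """Rewrite an FTP PORT command on the control channel; pass other lines through."""
    match = PORT_CMD.match(line)
    if match is None:
        return line
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    internal = ("%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2)
    ext_port = allocate_binding(internal)  # create the incoming NAT binding
    fields = nat_ip.split(".") + [str(ext_port // 256), str(ext_port % 256)]
    return b"PORT " + ",".join(fields).encode("ascii") + b"\r\n"


bindings = {}


def allocate_binding(internal):
    """Hypothetical allocator that always picks external port 40000."""
    bindings[40000] = internal
    return 40000


# The client at 10.0.0.1 asks the server to connect back to port 4 * 256 + 87 = 1111.
print(rewrite_port_command(b"PORT 10,0,0,1,4,87\r\n", "192.0.2.1", allocate_binding))
# b'PORT 192,0,2,1,156,64\r\n'
print(bindings)  # {40000: ('10.0.0.1', 1111)}

# The aggressive approach rewrites the address bytes anywhere in the stream.
blob = b"image data " + bytes([192, 0, 2, 1]) + b" more image data"
mangled = blob.replace(bytes([192, 0, 2, 1]), bytes([10, 0, 0, 1]))
print(mangled == blob)  # False: four bytes of the download were silently changed
```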
&lt;h2 id=&quot;carrier-grade-nat&quot;&gt;Carrier-Grade NAT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#carrier-grade-nat&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Initially, NATs were largely deployed at the boundary of consumer
or enterprise networks (where they are now ubiquitous). However,
as IP address space got more and more scarce, ISPs found themselves
in the position where they were not able to get enough IP addresses
for each customer to have one. The solution was
to have a giant NAT (usually called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Carrier-grade_NAT&amp;amp;oldid=1137061340&quot;&gt;carrier-grade NAT (CGN)&lt;/a&gt;) which multiplexes
multiple subscribers onto the same IP address. The
customer may still have their own NAT, so with CGN you
can have multiple layers of NATting and address rewriting,
which of course couldn&#39;t possibly go wrong.&lt;/p&gt;
&lt;p&gt;In a CGN scenario, the addresses assigned to subscribers
can either be from unroutable address space
(either from RFC 1918 or from the new &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6598&quot;&gt;RFC 6598&lt;/a&gt;
block), or can be IPv6 addresses. In the latter case, subscribers
would just have IPv6 addresses and the NAT would rewrite things to
IPv4 on the way out the door, in a technique called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=NAT64&amp;amp;oldid=1147110056&quot;&gt;NAT64&lt;/a&gt;.
This scenario isn&#39;t as simple as with IPv4 because the network
also needs to rewrite IPv4 addresses in DNS A records to
IPv6 AAAA records (a technique called DNS64) so that the IPv6-only clients can send
to them; this comes with its own problems, but that&#39;s a topic
for another post.&lt;/p&gt;
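&lt;p&gt;The address synthesis step in DNS64 is simple enough to sketch in a few lines of Python. This example assumes the well-known &lt;code&gt;64:ff9b::/96&lt;/code&gt; prefix from RFC 6052; operators can use their own prefixes, and the function name is invented for illustration.&lt;/p&gt;

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")  # RFC 6052 well-known prefix


def synthesize_aaaa(a_record, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(a_record)
    return ipaddress.IPv6Address(int(prefix.network_address) + int(v4))


print(synthesize_aaaa("192.0.2.33"))  # 64:ff9b::c000:221
```

&lt;p&gt;An IPv6-only client that sends to the synthesized address reaches the NAT64, which recovers the IPv4 address from the low 32 bits.&lt;/p&gt;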
&lt;h2 id=&quot;the-ietf-and-nat&quot;&gt;The IETF and NAT &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#the-ietf-and-nat&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For a long time, the IETF was basically in denial about NAT,
for two major reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Any packet rewriting (let alone ALGs) violates the end-to-end design
of IP in which packets just go untouched from A to B.&lt;/li&gt;
&lt;li&gt;It was seen as a technique to extend the lifetime of IPv4 when
everyone should just be transitioning to IPv6 (&lt;a href=&quot;http://acceleratethecontradictions.blogspot.com/2010/04/accelerate-contradictions-notes-towards.html&quot;&gt;sharpen the contradictions&lt;/a&gt;!)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The general attitude at the time was that standardizing NAT
behavior would just encourage it and instead one ought to ignore NATs and hope they
would go away, when the IPv6 rapture finally arrived. You can
see this attitude as late as 2012, when RFC 6598 was published
with the following statement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A number of operators have expressed a need for the special-purpose
IPv4 address allocation described by this document.  During
deliberations, the IETF community demonstrated very rough consensus
in favor of the allocation.&lt;/p&gt;
&lt;p&gt;While operational expedients, including the special-purpose address
allocation described in this document, may help solve a short-term
operational problem, the IESG and the IETF remain committed to the
deployment of IPv6.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This all worked out about as well as you would think: NATs are everywhere
and we still don&#39;t have anything like full deployment of IPv6.
To make matters worse, in the absence of any guidance, NAT behavior
became extremely variable and idiosyncratic, leading to ever more
complicated workarounds. Eventually, in 2007, the IETF published
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4787&quot;&gt;RFC 4787&lt;/a&gt;, a document
describing how NATs &lt;em&gt;ought&lt;/em&gt; to behave; by that time there were
of course a huge number of NAT deployments which didn&#39;t follow
these guidelines, though they&#39;re hopefully useful for developers of
newer devices.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nat-part-1/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;NATs provide a particularly good example of the way the Internet
evolves, which is to say workaround upon workaround. The reason for
this is what Google engineer Adam Langley calls the &amp;quot;Iron law of the
Internet&amp;quot;, namely that the last person to touch anything gets blamed.
The people who first built and deployed NATs had to avoid
breaking existing deployed stuff, forcing them to build hacks
like ALGs and unpredictable idle timeouts.
Now that NATs are widely deployed, new protocols
have to work in that environment, which forces them to run over
UDP and to conform to the outgoing-only flow dynamics dictated
by the NAT translation algorithms. Of course, there is a whole
class of applications that don&#39;t fit well into that paradigm,
in particular peer-to-peer applications like VoIP and gaming.
In the next post we&#39;ll look at techniques to make those work
anyway, even with existing NATs.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know I still have two unfinished series, one on
&lt;a href=&quot;https://educatedguesswork.org/tags/transport-protocols&quot;&gt;transport protocols&lt;/a&gt;
and one on &lt;a href=&quot;https://educatedguesswork.org/tags/web-security&quot;&gt;Web security&lt;/a&gt;. I got a bit distracted,
and, in the case of the transport protocol series,
a bit carried away with one of the posts, but I do plan
to get back to them. I&#39;m already partway through
Part II, so I should have that up relatively soon. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In theory IANA could just assign numbers directly,
but this allows for regional governance. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically this mechanism is known as &amp;quot;Network Address/Port
Translation&amp;quot; (NAPT) but as this is the most common
approach, NAT is the common term. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve omitted one detail, which is that you need to
give the clients all new addresses from the
&lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#rfc-1918-addresses&quot;&gt;RFC 1918&lt;/a&gt; space, but in modern
networks, the client addresses are centrally assigned
by the local network, so this is typically straightforward. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We actually had shirts made, with the front saying
&amp;quot;32 + 16 &amp;gt; 128&amp;quot;, with the joke being that the 32
bit address + 16 bit port of IPv4 was better than
the 128-bit IPv6 address. Cafe Press seems to have
lost the design though. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At one point there was a &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-tsvwg-natsupp/&quot;&gt;draft&lt;/a&gt; to make SCTP
work better with NATs, but it doesn&#39;t seem to have
ever been standardized. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A big
reason to have a new transport protocol is to
have your own rate limiting and reliability mechanisms,
and that doesn&#39;t work if you run them over TCP,
which has its own mechanisms. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;NATs
aren&#39;t the only reason to deploy new protocols over
UDP. It&#39;s also helpful that you can implement new
UDP-based protocols entirely in application space rather
than by modifying the operating system. &lt;a href=&quot;https://educatedguesswork.org/posts/nat-part-1/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Architectural options for messaging interoperability</title>
		<link href="https://educatedguesswork.org/posts/dma-interop/"/>
		<updated>2023-03-10T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dma-interop/</id>
		<content type="html">&lt;p&gt;As I mentioned in some &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/&quot;&gt;previous&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/&quot;&gt;posts&lt;/a&gt;, the EU
&lt;a href=&quot;https://competition-policy.ec.europa.eu/dma_en&quot;&gt;Digital Markets Act (DMA)&lt;/a&gt;
requires interoperability for
&lt;em&gt;number independent interpersonal communications services&lt;/em&gt;
(NICS), which is to say stuff like messaging
(what we used to call &amp;quot;Instant Messaging&amp;quot;) as well
as real-time media (voice and video calling). Specifically Article 7
&lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/?toc=OJ%3AL%3A2022%3A265%3ATOC&amp;amp;uri=uriserv%3AOJ.L_.2022.265.01.0001.01.ENG&quot;&gt;says&lt;/a&gt;
that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2.   The gatekeeper shall make at least the following basic
     functionalities referred to in paragraph 1 interoperable where
     the gatekeeper itself provides those functionalities to its own
     end users:

(a) following the listing in the designation decision pursuant to Article 3(9):
    (i) end-to-end text messaging between two individual end users;
    (ii) sharing of images, voice messages, videos and other attached
    files in end to end communication between two individual end
    users;

(b) within 2 years from the designation:
    (i) end-to-end text messaging within groups of individual end
    users;
   (ii) sharing of images, voice messages, videos and other attached
   files in end-to-end communication between a group chat and an
   individual end user;

(c) within 4 years from the designation:
   (i) end-to-end voice calls between two individual end users;
   (ii) end-to-end video calls between two individual end users;
   (iii) end-to-end voice calls between a group chat and an individual
        end user;
   (iv) end-to-end video calls between a group chat and an individual
        end user.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The European Commission (specifically the
Directorate General for Communication (DG COMM))
has been holding a series of workshops on how to structure the
compliance requirements for the DMA. Last week, I attended
the workshop on messaging interoperability to serve on
a panel along with
&lt;a href=&quot;https://www.roeslpa.de/&quot;&gt;Paul Rösler (FAU Erlangen-Nürnberg)&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/stephen-hurley-24424231/?originalSubdomain=ie&quot;&gt;Stephen Hurley (Meta)&lt;/a&gt;,
&lt;a href=&quot;https://alissacooper.com/&quot;&gt;Alissa Cooper (Cisco)&lt;/a&gt;,
and &lt;a href=&quot;https://element.io/about&quot;&gt;Matthew Hodgson (Element/Matrix)&lt;/a&gt;.
(video &lt;a href=&quot;https://webcast.ec.europa.eu/dma-workshop-2023-02-27&quot;&gt;here&lt;/a&gt;;
my presentation starts at 13:24:10; slides &lt;a href=&quot;https://blog.mozilla.org/netpolicy/files/2023/03/DMA-Workshop-Messg-Interop.pdf&quot;&gt;here&lt;/a&gt;).
Nominally the panel was about the impact of end-to-end encryption
on interoperability (see &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/&quot;&gt;here&lt;/a&gt;
for some earlier thoughts on this), but in the event it turned into more of an overall
discussion of the broader technical aspects.&lt;/p&gt;
&lt;p&gt;The rest of this post expands some on my thinking in this area.
Note that while I work for Mozilla, these are my opinions,
not theirs.&lt;/p&gt;
&lt;h2 id=&quot;overview-of-technical-options&quot;&gt;Overview of Technical Options &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#overview-of-technical-options&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At a high level, there are three main technical options
for providing this kind of interoperability, in ascending order of flexibility
for the competitor product:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Gatekeeper-provided libraries.&lt;/dt&gt;
&lt;dd&gt;The gatekeeper provides a software library (ideally in source code
form, but perhaps not) which implements their interfaces. The
competitor builds their app using that library and doesn&#39;t
have to know—and maybe doesn&#39;t get to see—any
details of those interfaces.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/dd&gt;
&lt;dt&gt;Gatekeeper-specified interfaces.&lt;/dt&gt;
&lt;dd&gt;The gatekeeper publishes a set of interfaces (Web APIs, protocols, etc.)
that competitors can use to talk to its system. The competitor implements
those APIs themselves—or maybe someone writes an open source
library to implement them—and talks to the gatekeeper&#39;s
system that way.&lt;/dd&gt;
&lt;dt&gt;Common protocols.&lt;/dt&gt;
&lt;dd&gt;The gatekeeper implements interfaces based on some common—preferably
standardized—protocol. The competitor implements that protocol and
uses it to talk to the gatekeeper.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;We&#39;ll take a look at each of these options below.&lt;/p&gt;
&lt;h2 id=&quot;requirements-scope&quot;&gt;Requirements Scope &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#requirements-scope&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The big question lurking in the background of the entire workshop was
the scope of the requirement that the EC would levy on gatekeepers.
I&#39;m not a legal expert, but the above quoted text seems to require
that the gatekeepers make this functionality available but not dictate
any particular means of doing so. In particular, they might opt to
just publish some libraries or specifications that anyone who wants to interoperate
with them must conform to (the EU technical term here is &amp;quot;reference
offer&amp;quot;). The Commission would then be responsible for ensuring that
this reference offer was compliant, which is to say that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It provides the required functionality.&lt;/li&gt;
&lt;li&gt;It is sufficiently complete to implement from.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For reasons that will occupy most of the rest of this post, this is
not really an ideal state of affairs, and it would be easier for
competitors (technical term &amp;quot;access seekers&amp;quot;) if there were some
single set of interfaces (protocols) that every gatekeeper implemented. However,
the tone of the workshop was that the Commission is not eager to
require a single set of standards at this stage and that there&#39;s some question about
exactly what the DMA empowers them to require in this area.
For the purposes of this post, I&#39;m going to put that question
aside and focus on the technical situation as I see it.&lt;/p&gt;
&lt;h2 id=&quot;interoperability-is-hard&quot;&gt;Interoperability is Hard &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#interoperability-is-hard&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first thing to realize is that interoperability is really difficult to
achieve, even when people are trying hard. The basic problem is that
protocol specifications tend to be fairly complicated and it is
difficult to write one that is sufficiently precise and complete
that two people (or groups) can independently construct implementations
that interoperate.  In fact, some standards
development organizations require demonstration
that every feature has two independent implementations that
can interoperate in order to advance to a specific maturity
level (in IETF, &amp;quot;Internet Standard&amp;quot;, though as a practical
matter, even many widely deployed and interoperable protocols never get this far,
just because it&#39;s a hassle to advance them).
Part of the process of refining the protocol is
finding places where the specification is ambiguous and modifying
the specification to clarify them.&lt;/p&gt;
&lt;p&gt;Over the past 10 years or so, I&#39;ve been heavily involved in the
standardization and interop testing of at least three major protocols
(and a number of smaller ones):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9000&quot;&gt;QUIC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API&quot;&gt;WebRTC&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each case, we discovered issues with just
about every implementation, in many cases leading to interoperability
failure.&lt;/p&gt;
&lt;p&gt;A full description of how interop testing works is outside the
scope of this post, but at a high level each implementor sets up
their own endpoint and you try to make them communicate; in
most cases this will initially be unsuccessful. If you&#39;re lucky,
one of the implementations will emit some kind of error, but sometimes
it just won&#39;t work (e.g., you just get a deadlock with neither
side sending anything). What you do next depends on precisely
what went wrong.
As an example, if a message from implementation A elicited an error
from implementation B, then you look at the message and the error it
generated and try to determine if the message was correct (in which
case it&#39;s B&#39;s fault), if it was incorrect (in which case it&#39;s
A&#39;s fault), or if the specification is ambiguous (in which case
it needs to be updated). Once the implementors
have decided on the correct behavior, then one (or sometimes both!)
of them change their implementations, and you rerun the test, hopefully
getting a little further before things break. This process repeats
with increasingly more difficult scenarios until everything works.&lt;/p&gt;
&lt;p&gt;There are several important points to remember here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The specification is frequently unclear; even when there is a &lt;em&gt;best&lt;/em&gt;
reading, it&#39;s often not entirely obvious.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even when the specification is clear, implementors make mistakes,
leading to interoperability problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Interop testing is a high-bandwidth process that requires
close collaboration between implementors. In particular, it&#39;s
vital to be able to understand what the other implementation
didn&#39;t like about what your implementation did, rather than
just knowing that it didn&#39;t work.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This last point is especially important: if you just send a message
to the other side and get an error, then you&#39;re left scrutinizing
your code over and over to see if you did something wrong, even
when it turns out the problem is on the other side.&lt;/p&gt;
&lt;p&gt;When I was working on WebRTC and trying to get interoperability
between Firefox and Chrome, I spent quite a few days in
Google conference rooms with &lt;a href=&quot;https://twitter.com/juberti&quot;&gt;Justin Uberti&lt;/a&gt;,
the tech lead for Chrome&#39;s WebRTC implementation, doing
just what I described above. It also helped that both Firefox
and Chrome were open source, so we were able to look at each
other&#39;s code and figure out what must be happening. Getting
this to work would have been approximately impossible if
all I had had was a copy of Chrome and no insight into what
was happening internally, or if we hadn&#39;t been right next to
each other. This problem is especially acute for
cryptographic protocols, where any error tends to lead to
some sort of opaque failure such as &amp;quot;couldn&#39;t decrypt&amp;quot; or
&amp;quot;signature didn&#39;t validate&amp;quot;. If you can&#39;t see the intermediate
computational values (e.g., the keys or the inputs to
the encryption), you&#39;re back to trying to guess what you did
wrong (and good luck if it&#39;s the other side!).&lt;/p&gt;
&lt;p&gt;More recently, the IETF has developed something of a system for
this (thanks to &lt;a href=&quot;https://research.cloudflare.com/people/nick-sullivan/&quot;&gt;Nick Sullivan&lt;/a&gt;
from Cloudflare for kicking off this process), starting with TLS 1.3 and now with
QUIC. Basically, everyone gets in the same room (to the extent
possible) and stands up their implementation and other people
try to talk to it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
There&#39;s a lot of back-and-forth of the form &amp;quot;Hey, I&#39;m getting error X when I talk
to your implementation, can you take a look&amp;quot;. The end result of
this is an interoperability matrix showing which implementations
can talk to each other and with which tests. For instance,
here&#39;s the interop matrix for one of the later QUIC drafts:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/quic-interop-matrix.png&quot; alt=&quot;QUIC Interop Matrix&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Each cell is a single client/server pair, with the client down the left
and the server across the top. The letters indicate which tests
worked, with the color indicating how well things are going,
the darker the better.&lt;/p&gt;
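&lt;p&gt;To give a feel for how such a matrix comes together, here is a minimal sketch.
The implementation names, the test letters, and the &lt;code&gt;run_test()&lt;/code&gt; stub are all
hypothetical stand-ins; a real harness would start each client against each server
and check the outcome. It puts clients in rows and servers in columns.&lt;/p&gt;

```python
# Hypothetical implementation names and test letters, for illustration only.
IMPLEMENTATIONS = ["alpha", "beta", "gamma"]
TESTS = "HDC"  # e.g., Handshake, Data transfer, Close


def run_test(client, server, test):
    """Stand-in for a real interop test; returns True on success."""
    # Assumption: pretend that "gamma" as a server fails the data transfer test.
    return not (server == "gamma" and test == "D")


def build_matrix(impls, tests):
    """Map each (client, server) pair to the letters of the tests that passed."""
    matrix = {}
    for client in impls:
        for server in impls:
            passed = "".join(t for t in tests if run_test(client, server, t))
            matrix[(client, server)] = passed
    return matrix


def render(matrix, impls):
    """Render the matrix as text: one row per client, one column per server."""
    width = max(len(name) for name in impls) + 2
    rows = [" " * width + "".join(name.ljust(width) for name in impls)]
    for client in impls:
        cells = "".join(matrix[(client, server)].ljust(width) for server in impls)
        rows.append(client.ljust(width) + cells)
    return "\n".join(rows)


matrix = build_matrix(IMPLEMENTATIONS, TESTS)
print(render(matrix, IMPLEMENTATIONS))
```

&lt;p&gt;Note that the number of cells grows with the square of the number of implementations,
which is part of why this is so much work.&lt;/p&gt;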
&lt;p&gt;This is all an enormous amount of work, and it&#39;s important to remember
that this is a best-case scenario in that the people writing
the specification are trying very hard to make it as clear as possible
and generally the implementors are trying to be helpful to each other.
The situation with messaging is quite different: the gatekeepers could
have provided interoperable interfaces at any time but chose not to.
Instead, they&#39;re just being required to provide them by the DMA,
so their incentive to make it work is comparatively low. Moreover,
they may not even have their own internal documentation; in my
experience it&#39;s quite common for engineering organizations to
just embody their interfaces in code with minimal documentation
which is insufficient to implement from.&lt;/p&gt;
&lt;h3 id=&quot;the-first-10%25-is-often-easy&quot;&gt;The first 10% is often easy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#the-first-10%25-is-often-easy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s often relatively easy to get things to sort of work in simple
configurations (what is sometimes called the &amp;quot;happy path&amp;quot;).
For instance, the first real public demonstration of WebRTC interop
&lt;a href=&quot;https://hacks.mozilla.org/2013/02/hello-chrome-its-firefox-calling/&quot;&gt;between Chrome and Firefox&lt;/a&gt; was in early 2013, at a point where
it just barely worked and needed handholding from Justin
and me. Firefox didn&#39;t work with Google Meet until
&lt;a href=&quot;https://blog.mozilla.org/webrtc/firefox-is-now-supported-by-google-hangouts-and-meet/&quot;&gt;2018&lt;/a&gt;, which required changes on both sides.
A particular issue was around multiple streams of audio
and video (see &lt;a href=&quot;https://www.callstats.io/blog/what-is-unified-plan-and-how-will-it-affect-your-webrtc-development&quot;&gt;here&lt;/a&gt; for background
on the &amp;quot;plan wars of 2013&amp;quot;).&lt;/p&gt;
&lt;p&gt;In a similar vein, during the workshop Matthew Hodgson showed a demo of Matrix
interoperating with WhatsApp via a local gateway,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
which serves as a good demonstration that interop is possible but,
as he mentioned himself, shouldn&#39;t lead anyone to conclude this is a
trivial problem. Sending messages back and forth is probably the
easiest part of this problem; it&#39;s all of the details (group
messaging, media, ...) that will be the hard part to get right and are
also essential to it being ready for real users.&lt;/p&gt;
&lt;h3 id=&quot;it-gets-better&quot;&gt;It gets better &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#it-gets-better&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Note that the scenario I&#39;m talking about here is mostly what
happens with early protocol development and deployment. Once
there are widespread open source implementations that are
fairly conformant, you can just test against those implementations
and debug them directly when you have a problem;
of course, that&#39;s not likely to be the case for messaging
interoperability, at least initially, and especially if the
gatekeepers just publish their specs without any reference
implementation.&lt;/p&gt;
&lt;h2 id=&quot;the-surface-area-is-enormous&quot;&gt;The surface area is enormous &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#the-surface-area-is-enormous&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The number of different protocols that need to be implemented in order
to build a complete messaging system along with voice and video
calling is extremely large. At minimum you need something like the
following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Messaging
&lt;ul&gt;
&lt;li&gt;A protocol for end-to-end key establishment (e.g., MLS, OTR, Signal)&lt;/li&gt;
&lt;li&gt;The format for the messages themselves (e.g., MIME)&lt;/li&gt;
&lt;li&gt;A transport protocol for the messages (e.g., XMPP)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Voice and video. Everything above &lt;em&gt;plus&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;Media format negotiation (e.g., SDP)&lt;/li&gt;
&lt;li&gt;NAT traversal (e.g., ICE) if you want peer-to-peer media&lt;/li&gt;
&lt;li&gt;Media transport (e.g., RTP/SRTP)&lt;/li&gt;
&lt;li&gt;Voice and video codecs (e.g., Opus, AV1, H.264, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every one of the things I&#39;ve named above is a very significant
piece of technology, often running to hundreds if not thousands
of pages of specifications.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;As a concrete example, let&#39;s look at WebRTC. At the time WebRTC was
being designed, there was an existing ecosystem of standards-based
voice and video over IP that used &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Session_Initiation_Protocol&amp;amp;oldid=1139230173&quot;&gt;Session Initiation Protocol
(SIP)&lt;/a&gt;
for signaling and RTP/SRTP for media. Those protocols were in wide use
but often not on an interoperable basis. Although many of the people
who designed WebRTC had also been involved in building that ecosystem,
there was also a feeling that many of those protocols were due for
revision, so there was a fair amount of updating/modification. By
the time the overarching protocol specification document
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8829.html&quot;&gt;JavaScript Session Establishment Protocol (JSEP)&lt;/a&gt;
was published in 2021, the set of relevant documents had grown to
tens of RFCs running to thousands of pages. Moreover, these
RFCs themselves depended on other previously published RFCs
defining stuff like audio and video codecs (for instance,
the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc6716.html&quot;&gt;specification&lt;/a&gt;
for the mandatory-to-implement Opus audio codec is 326 pages
long, though at least part of that is a reference implementation).&lt;/p&gt;
&lt;p&gt;Of course, these are specifications for general purpose systems
and so you could almost certainly build a single system that
had less complexity. For instance, a lot of the complexity in
WebRTC is around media negotiation: suppose that one side
wants to send two streams of video and one stream of audio,
but the other side only wants to receive one stream of video.
An interoperable system needs to specify what happens in this
case, but if you have a closed system you can just arrange
that your software never gets itself into that state. There
are quite a few other cases like this where you can get away
with a lot less in a closed environment, but even then there
will still be quite a bit of complexity.&lt;/p&gt;
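&lt;p&gt;As a toy illustration of the negotiation case above, here is a sketch in which
each side declares how many streams of each kind it wants to send or is willing to receive.
The &lt;code&gt;negotiate()&lt;/code&gt; function and its dictionary format are my own invention for
this example; real SDP offer/answer is vastly more involved.&lt;/p&gt;

```python
def negotiate(offer, receive_limits):
    """Return, per media kind, how many streams will actually flow.

    offer: streams the sender wants to send, e.g. {"video": 2, "audio": 1}
    receive_limits: streams the receiver is willing to accept per kind
    """
    agreed = {}
    for kind, wanted in offer.items():
        # A kind the receiver never mentioned is treated as "accept none".
        limit = receive_limits.get(kind, 0)
        agreed[kind] = min(wanted, limit)
    return agreed


offer = {"video": 2, "audio": 1}
limits = {"video": 1, "audio": 1}
agreed = negotiate(offer, limits)
print(agreed)
```

&lt;p&gt;Even this trivial rule is a decision that both sides have to agree on: an
interoperable specification also has to say &lt;em&gt;which&lt;/em&gt; of the two video streams
survives, whereas a closed system can simply never make that offer.&lt;/p&gt;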
&lt;p&gt;At the same time, it&#39;s increasingly possible for small teams
to quickly build quite functional voice and video calling
systems. This apparent contradiction is explained by realizing
that there are widely available software libraries (for instance,
the somewhat confusingly named &lt;a href=&quot;https://webrtc.org/&quot;&gt;WebRTC&lt;/a&gt;
library) that implement most of these specifications and
provide an API that hides most of the details. The result is
that as long as you&#39;re willing to take whatever that library
implements, it&#39;s possible to build a functional system, but
you&#39;re pulling in the transitive closure of all the specifications
it depends on. The same thing is true for other protocols
such as TLS, XMPP, Matrix, etc.&lt;/p&gt;
&lt;p&gt;The key point to take home here is that actually having
interoperability between the gatekeepers and competitive
products requires nailing down an enormous number of details,
even if those details are hidden behind software libraries.
To the extent to which this uses the existing interfaces
and protocols, then this is a somewhat more straightforward
problem, but if a gatekeeper has built a largely proprietary
system from the ground up, then the effort of specifying it
in enough detail that someone else can build their own interoperable
implementation—not to mention the effort of building
that implementation—is likely to be very considerable.&lt;/p&gt;
&lt;h2 id=&quot;it-has-to-be-implemented-on-the-client&quot;&gt;It has to be implemented on the client &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#it-has-to-be-implemented-on-the-client&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you want to have end-to-end encryption of the communications,
then this means that much of the complexity has to be implemented
on the client. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You need to have end-to-end key establishment so the key establishment
protocol needs to run on the client.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Text messages are encrypted and so they need to be constructed
on the sender&#39;s client software and decrypted on the receiver&#39;s.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Similarly, audio and video need to be encoded and decoded on
the client.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is different from (for instance) e-mail, which is typically not
end-to-end encrypted, or non-E2E videoconferencing systems, where
the server can centrally transcode the media. For instance, if Alice can only
send audio with the Opus codec and Bob can only receive with G.711,
then the central system can transcode it, but if the data is
end-to-end encrypted, that&#39;s not possible (the whole
point of end-to-end encryption is that the central system can&#39;t
modify the content). Instead, you need to
ensure that every client has the necessary capabilities.
It &lt;em&gt;is&lt;/em&gt; possible to implement some of the system on the server.
For instance, because the transport is generally
not end-to-end encrypted (though it should be encrypted
between client and server), you might be able to gateway
transports between systems, such as if you want to connect
an XMPP client to a system that isn&#39;t natively XMPP.&lt;/p&gt;
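&lt;p&gt;The codec example can be sketched as follows. The &lt;code&gt;pick_codec()&lt;/code&gt; function
and its return values are hypothetical and only capture the decision logic: a shared
codec always works, a server-side transcode only works without end-to-end encryption,
and otherwise the call fails.&lt;/p&gt;

```python
def pick_codec(sender_codecs, receiver_codecs, end_to_end_encrypted):
    """Decide how media gets from sender to receiver.

    Returns a (mode, detail) tuple where mode is "direct", "transcode",
    or "fail".
    """
    common = [c for c in sender_codecs if c in receiver_codecs]
    if common:
        # Both clients share a codec, so no help from the server is needed.
        return ("direct", common[0])
    if not end_to_end_encrypted:
        # The server can see the media, so it can convert between codecs.
        return ("transcode", sender_codecs[0] + " to " + receiver_codecs[0])
    # The server only sees ciphertext and cannot transcode.
    return ("fail", None)


print(pick_codec(["opus"], ["g711"], end_to_end_encrypted=False))
print(pick_codec(["opus"], ["g711"], end_to_end_encrypted=True))
print(pick_codec(["opus", "g711"], ["g711"], end_to_end_encrypted=True))
```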
&lt;h2 id=&quot;gatekeeper-libraries&quot;&gt;Gatekeeper Libraries &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#gatekeeper-libraries&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Technically, gatekeepers don&#39;t need to actually publish their
interfaces to achieve interoperability. Instead, they could
just build a software library
that implements their system. The idea here is that if I
build EKRMessage and I want it to talk to WhatsApp, I just
download their library and build it into EKRMessage. There
will be some set of functions that I need to call
(e.g., &lt;code&gt;sendMessageTo()&lt;/code&gt; to send a message) to implement
the interoperability.&lt;/p&gt;
&lt;p&gt;The obvious advantage of this design is that it hides&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
a lot of complexity from the new implementor: the gatekeeper
doesn&#39;t need to document the details of any of their
protocols in the kind of excruciating detail I mentioned
above; they just build that into their software. Of course
now they have to document the library, but that&#39;s usually
a lot easier (after all, this is why people use libraries).
This kind of library &lt;em&gt;can&lt;/em&gt; be a huge force multiplier:
as I mentioned above, the existence of Google&#39;s WebRTC library
has made it much easier for people to build powerful A/V apps.
However, if every gatekeeper has their own library, this
is a lot less attractive, as Alissa Cooper from Cisco
pointed out in her workshop presentation. To just briefly
sketch a few of the problems with this approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code bloat.&lt;/strong&gt; Each competitor application will need to build in a copy
of every gatekeeper&#39;s library, which is extremely inefficient.
As an example, the &lt;a href=&quot;http://webrtc.org/&quot;&gt;WebRTC.org&lt;/a&gt; &lt;a href=&quot;https://webrtc.googlesource.com/src&quot;&gt;library&lt;/a&gt;
is over 800K lines of code. Imagine that times 5.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code architecture.&lt;/strong&gt; Each library is going to have its own
particular API style, which is going to make architecting
your app difficult. As a concrete example, consider what
happens if one library is asynchronous and event-driven and another
is synchronous and uses threads. Building an app that uses
both cleanly is going to be architecturally difficult
(typically you end up trying to force fit one control
flow discipline into the other).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Portability.&lt;/strong&gt; If the gatekeeper provides their library
in binary (compiled) form, then it will only be usable on
the specific platforms the gatekeeper builds it for. Even
if they provide source code, that often will not work on
one platform or another without work (portable code is hard!).
Of course, someone could port the library, but if they
aren&#39;t able to upstream their changes to the gatekeeper&#39;s source,
then the porting work needs to be repeated whenever the
gatekeeper makes changes. In addition to questions of
platform, we also have to think about questions of
language: if the library is written in Java and I want
to write my application in Rust, I&#39;m going to have
a bad day.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dependency.&lt;/strong&gt; The competitor&#39;s application is dependent on the
gatekeeper&#39;s engineering team: the competitor has little ability
to fix defects (and &lt;em&gt;all&lt;/em&gt; software has defects) and
mostly has to wait for the gatekeeper to do it. This is true
even if the library is nominally open source, because it&#39;s
a huge amount of effort to find and fix problems in other people&#39;s
code. Additionally, whenever the gatekeeper changes their library,
you have no choice but to update, even when it&#39;s a big change.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Security.&lt;/strong&gt; The competitor is taking on the union of all the vulnerabilities
in each library they bring in. This creates problems whenever a defect
is found because every competitor needs to upgrade (this is always
a problem with vulnerabilities in libraries, of course, which is
one reason why people try to minimize rather than maximize
their dependencies). Effectively, the security of the competitor&#39;s
app becomes that of the weakest of the gatekeepers they interoperate
with, which is obviously bad.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these problems of course exist any time you take a dependency
on another project, which is why engineers are careful about doing
it. However, in most of those cases the provider of the dependency
wants you to use their code—you
may even be paying them—and so is motivated to help.
That&#39;s not really the case here.
The bottom line is that I think this is a pretty bad approach,
so I&#39;m not going to spend much more time on it in this post.&lt;/p&gt;
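&lt;p&gt;To make the code architecture problem above a bit more concrete, here is a sketch of
what force-fitting one control flow discipline into another looks like. Both
&lt;code&gt;blocking_send()&lt;/code&gt; and &lt;code&gt;async_send()&lt;/code&gt; are hypothetical stand-ins for
two gatekeeper libraries; the only real API used is Python&#39;s &lt;code&gt;asyncio&lt;/code&gt;
(&lt;code&gt;asyncio.to_thread()&lt;/code&gt; requires Python 3.9 or later).&lt;/p&gt;

```python
import asyncio
import time


def blocking_send(recipient, text):
    """Hypothetical synchronous library call: blocks until the send completes."""
    time.sleep(0.01)  # stands in for network I/O
    return "sync:" + recipient


async def async_send(recipient, text):
    """Hypothetical event-driven library call: yields to the event loop."""
    await asyncio.sleep(0.01)  # stands in for network I/O
    return "async:" + recipient


async def send_to_all(text):
    # The blocking library has to be pushed onto a worker thread so that it
    # does not stall the event loop that the asynchronous library relies on.
    return await asyncio.gather(
        asyncio.to_thread(blocking_send, "alice", text),
        async_send("charlie", text),
    )


results = asyncio.run(send_to_all("hello"))
print(results)
```

&lt;p&gt;This works for a single call, but every callback, cancellation, and error path in the
blocking library needs the same kind of wrapping, and that is for just two libraries.&lt;/p&gt;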
&lt;h2 id=&quot;common-versus-gatekeeper-specific-interfaces%2Fprotocols&quot;&gt;Common versus Gatekeeper-Specific Interfaces/Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#common-versus-gatekeeper-specific-interfaces%2Fprotocols&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The big question here is
whether gatekeepers will use (or be required to use) a common
set of protocols/interfaces or whether they would be able to just
dictate their own interfaces that anyone who wanted to interoperate
with them would have to use. Often this was phrased as whether
the gatekeepers would be required to implement &amp;quot;standards&amp;quot;, but
&amp;quot;standards&amp;quot; can mean anything from &amp;quot;this is what everyone does&amp;quot;
to &amp;quot;this is ratified by the International Telecommunication Union (ITU)&amp;quot;,
so I&#39;m not sure how helpful that term really is. The key question
is whether everyone is going to do more or less the same thing
or whether connecting to each gatekeeper will require doing something
new. As I said above, I think a common set of protocols has some obvious technical advantages.&lt;/p&gt;
&lt;h3 id=&quot;implementor-complexity&quot;&gt;Implementor Complexity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#implementor-complexity&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The biggest advantage of a common set of protocols/interfaces is that
it makes life much easier for access seekers/competitors. If you
need to build to entirely different interfaces for each gatekeeper,
it&#39;s going to be a lot of work to add a new gatekeeper, which obviously
doesn&#39;t promote interoperability on the broad scale, and is likely
to lead to lower service quality because you have to spread your
effort out across all the implementations.
Alissa Cooper had
a great slide showing what this looks like, where every competitor
has a little—or actually not so little—copy of every gatekeeper&#39;s
system in their app:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/cooper-bespoke.png&quot; alt=&quot;Bespoke versus consolidated solutions&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This model, where an app speaks a bunch of different protocols but
tries to present a unified user interface (what Matthew Hodgson was
calling a &amp;quot;polyglot app&amp;quot;) used to be reasonably common back in the
early days of instant messaging, where there were multiple open
(XMPP, IRC) or semi-open (AIM) messaging systems. This means we know
it&#39;s possible but we also know it&#39;s a lot of work. By contrast, with a
single set of protocols/interfaces each app only has to have a single
implementation that it can use everywhere and can focus on making it
really good.&lt;/p&gt;
&lt;h2 id=&quot;clearer-specifications&quot;&gt;Clearer Specifications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#clearer-specifications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As described &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#interoperability-is-hard&quot;&gt;above&lt;/a&gt;, one big barrier to
interoperability is lack of clear specifications. It&#39;s just
incredibly hard to write an unambiguous document, and it&#39;s even
harder when you&#39;re documenting some pre-existing piece of software
that never really had a written specification, as is most
likely the case for many of the systems in this space. It&#39;s
just way too easy to implicitly import assumptions about how
your system actually behaves without clearly documenting them.&lt;/p&gt;
&lt;p&gt;Standards development organizations like IETF and W3C have gotten
a lot better at this over the years and have developed a set
of practices that contribute to specification clarity. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Early implementation and interop testing.&lt;/li&gt;
&lt;li&gt;Automated test harnesses, both for interoperability and
for conformance (e.g., &lt;a href=&quot;https://web-platform-tests.org/&quot;&gt;WPT&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Widespread review using code collaboration tools (e.g., GitHub)
that make it easy for people to report small issues.&lt;/li&gt;
&lt;li&gt;Formal review and analysis for the most security critical pieces
(for instance, TLS 1.3 had at least 9 separate &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8446#appendix-E.1.6&quot;&gt;papers&lt;/a&gt;
published on its security before it was published).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These practices aren&#39;t a panacea, of course, and many IETF
and W3C specs are still impenetrable, but in my experience
the ones where the community really agreed they were important
(TLS, QUIC, etc.) are reasonably clear.
The main idea here is
about getting as many eyes on the problem and from
as many different perspectives as possible. This is only
possible in an environment of collaboration across many
organizations and it&#39;s hard to see how it will work when
gatekeepers just publish specifications and throw them over
the wall to competitors.&lt;/p&gt;
&lt;h2 id=&quot;developer-experience&quot;&gt;Developer Experience &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#developer-experience&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As discussed above, much of the experience of developing a
protocol implementation is trying to interpret the specifications
and figure out why your implementation is misbehaving. This
is obviously much easier if there are open source
implementations and a community of other implementors you
can work with. If instead we&#39;re going to have gatekeeper-published
interfaces, then the gatekeepers will need
to provide quite a bit of support to developers who
want to talk to their systems. At minimum, this looks
something like:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Detailed specifications&lt;/dt&gt;
&lt;dd&gt;that really are complete
enough to implement, ideally including example protocol
traces and &amp;quot;test vectors&amp;quot; for the cryptographic pieces.&lt;/dd&gt;
&lt;dt&gt;Public test servers&lt;/dt&gt;
&lt;dd&gt;that developers can talk to.
These need to be separate from production servers because
testing in production is too dangerous. Moreover, they need
to have a much higher level of visibility (at minimum
very detailed logs) so that developers can see what is
going wrong.&lt;/dd&gt;
&lt;dt&gt;Live support&lt;/dt&gt;
&lt;dd&gt;from engineers who understand how things
are implemented and can help with debugging when log
inspection fails.&lt;/dd&gt;
&lt;dt&gt;Stable interfaces&lt;/dt&gt;
&lt;dd&gt;that remain live once published, so
that developers aren&#39;t constantly having to update their
code.&lt;/dd&gt;
&lt;/dl&gt;
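&lt;p&gt;For readers who haven&#39;t seen them, test vectors are just published inputs paired
with the exact expected outputs, so an implementor can check their code without
talking to anyone. As a sketch, here is what checking an implementation against two
of the standard SHA-256 vectors looks like; a gatekeeper&#39;s vectors would cover their
own key derivation and encryption steps, ideally including intermediate values.
The &lt;code&gt;check_vectors()&lt;/code&gt; helper is my own naming.&lt;/p&gt;

```python
import hashlib

# Well-known SHA-256 test vectors: (input bytes, expected hex digest).
TEST_VECTORS = [
    (b"", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    (b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]


def check_vectors(vectors):
    """Return a list of (input, expected, actual) for every mismatch."""
    failures = []
    for message, expected in vectors:
        actual = hashlib.sha256(message).hexdigest()
        if actual != expected:
            failures.append((message, expected, actual))
    return failures


failures = check_vectors(TEST_VECTORS)
print("all vectors pass" if not failures else failures)
```

&lt;p&gt;The value here is that a mismatch tells you exactly which step is wrong, rather than
just &amp;quot;couldn&#39;t decrypt&amp;quot; at the very end.&lt;/p&gt;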
&lt;p&gt;Without this level of support it&#39;s going to be extremely
difficult for competitors to make their code work in
any reasonable period of time, let alone update it
as the gatekeeper makes changes.&lt;/p&gt;
&lt;h2 id=&quot;deployment-issues-with-multiple-gatekeepers&quot;&gt;Deployment Issues with Multiple Gatekeepers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#deployment-issues-with-multiple-gatekeepers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most of the scenarios people seem to be considering involve
one or more competitors interoperating with a single gatekeeper,
e.g., Wire and Matrix talking to WhatsApp. This is a good first
step and it&#39;s of course not straightforward, but it&#39;s really
playing on easy mode, because the competitors have a real incentive
to do whatever it takes to interoperate with any gatekeeper.
What happens when someone on gatekeeper A wants to talk to
gatekeeper B? If everyone just publishes their own protocols,
then one of the gatekeepers has to implement the other
side&#39;s version. It&#39;s not clear to me that this is required
by the DMA. And if it is required, who will have to do it?&lt;/p&gt;
&lt;p&gt;This issue is particularly acute in group messaging contexts.
As a number of panelists mentioned,
group messaging is now the norm and 1-1 is just a special
case of a small group. Once you have large groups, you
have the possibility of a group which involves more than
one gatekeeper. Consider the case shown in the diagram
below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/MessagingGroups.png&quot; alt=&quot;Three way messaging&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this example, Alice and Charlie are on iMessage and WhatsApp
respectively. Bob is on EKRMessage and is able to individually
communicate with them because that client implements those interfaces. As noted
above, this is inconvenient, but will work.&lt;/p&gt;
&lt;p&gt;Now what happens when Bob wants to create a chat
with Alice and Charlie? He can send and receive messages to
each of them individually, but if neither WhatsApp nor
iMessage implements compatible interfaces, then when
Alice or Charlie sends a message, the other side can&#39;t
receive it. Importantly, unlike the simple 1-1 case
between gatekeepers, this looks to Bob like a defect in
his messaging system, not like noncooperation by the
gatekeepers. There&#39;s not much that Bob&#39;s client can do
about it: it could presumably decrypt the messages and
reencrypt them, but this destroys end-to-end identity,
which is undesirable.&lt;/p&gt;
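&lt;p&gt;The failure mode is easy to see if you write down who speaks what. In this sketch
the &lt;code&gt;SPEAKS&lt;/code&gt; table and the helper functions are hypothetical; two clients can
exchange messages directly only if they share at least one protocol.&lt;/p&gt;

```python
# Which protocols each participant's client implements (illustrative only).
SPEAKS = {
    "Alice": {"imessage"},
    "Bob": {"imessage", "whatsapp"},
    "Charlie": {"whatsapp"},
}


def can_deliver(sender, receiver):
    """Two clients can talk directly only if they share a protocol."""
    return not SPEAKS[sender].isdisjoint(SPEAKS[receiver])


def broken_pairs(group):
    """List every ordered (sender, receiver) pair that cannot communicate."""
    return [
        (s, r)
        for s in group
        for r in group
        if s != r and not can_deliver(s, r)
    ]


group = ["Alice", "Bob", "Charlie"]
print(broken_pairs(group))
```

&lt;p&gt;Bob can reach everyone, but the group as a whole is broken because Alice and
Charlie have no protocol in common.&lt;/p&gt;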
&lt;p&gt;The point here is that in order to have interoperable messaging
work well in group contexts, basically everyone has to implement the
protocols of anyone who might be in the group.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
This will sort of work if there are a pile of protocols, but obviously it would be a lot easier
and cheaper if
instead everyone implemented something common.&lt;/p&gt;
&lt;p&gt;Having a common protocol is even more important in videoconferencing
situations: video tends to take up a lot of bandwidth and sending
an individual copy of the media to each receiver can easily
overrun consumer Internet links. Instead, large
conferences typically use what&#39;s called a &amp;quot;star&amp;quot; configuration
in which each endpoint sends one copy of the video to a central server (a &lt;em&gt;multipoint control unit (MCU)&lt;/em&gt;)
which then retransmits it to each receiver. But if a group
with N gatekeepers means that I need to send in N different
formats, then this will be dramatically less efficient.
This is true to some extent even for messaging: the new
IETF &lt;em&gt;Messaging Layer Security (MLS)&lt;/em&gt; protocol was designed to work well in large groups,
but won&#39;t work as well if you have to do pairwise associations.&lt;/p&gt;
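&lt;p&gt;A back-of-the-envelope sketch of the sender&#39;s upload cost shows why the star configuration matters and what multiple formats do to it. The bitrate and group size below are made-up round numbers, and the function names are hypothetical.&lt;/p&gt;

```python
def upload_kbps_full_mesh(receivers, stream_kbps):
    # Full mesh: the sender uploads a separate copy to every receiver.
    return receivers * stream_kbps


def upload_kbps_star(stream_kbps, formats=1):
    # Star: one copy goes to the central server, which fans it out.
    # If the group spans gatekeepers with incompatible formats, the
    # sender needs one copy per format instead.
    return formats * stream_kbps
```

&lt;p&gt;For a 2000 kbps stream and nine receivers, the full mesh costs the sender 18000 kbps of upload and the star costs 2000 kbps; with three incompatible formats the star cost rises to 6000 kbps.&lt;/p&gt;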
&lt;h2 id=&quot;identity&quot;&gt;Identity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#identity&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Identity presents a special problem for reasons I&#39;ve &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/&quot;&gt;discussed&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/&quot;&gt;previously&lt;/a&gt;.
Those posts have more detail, but briefly each endpoint needs to be
able to discover and verify the identity of every other endpoint.
As with everything else, this can be done either with a common
protocol or with pairwise implementations of gatekeeper protocols.
However, the situation is more complicated here because many
messaging systems use overlapping namespaces.&lt;/p&gt;
&lt;p&gt;In particular, it&#39;s
quite common to use phone numbers (E.164 numbers) as identifiers,
as (for instance) both iMessage and WhatsApp do. This raises
a number of questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When someone from iMessage sends a message to someone from
WhatsApp, how does their identity appear?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How do messaging apps know which service to use when given
a bare E.164 number?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What happens if someone has an account on two services that
both use phone numbers?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are several possible approaches to addressing these issues
(as discussed in the above-linked posts) but we&#39;re going to need
to have some kind of answer, and if each system is left to solve
it itself, there is likely to be a lot of confusion.&lt;/p&gt;
&lt;p&gt;I did want to call out one particular risk: it&#39;s natural—at
least to some—to want
this to be as seamless as possible, for instance by using phone
number identifiers and automatically identifying the right service
to use, but this increases the attack surface, because multiple
providers can then assert a given identity. There are potential ways
to mitigate this (see &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/&quot;&gt;previously&lt;/a&gt;),
but they would actually need to be specified and deployed.
This is also an area where it would be advantageous to have
a single solution everyone agreed on, both because it&#39;s hard
to get this right, and because it would make it easier to
address questions of who owned which identity.&lt;/p&gt;
&lt;h2 id=&quot;timeline&quot;&gt;Timeline &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#timeline&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the big concerns that I&#39;ve seen raised about having a system
based on common protocols is that the DMA sets a very ambitious timeline
and that standards can take a long time to develop. There certainly
is some truth in this, but the good news is that many of the pieces
we need already exist (indeed, we often have several alternatives):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Function&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Protocols&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;End-to-end key establishment&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;MLS, OTR, Signal (and variants)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Identity&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;X.509, Verifiable Credentials, OIDC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Messaging format&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;MIME, Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Message transport&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;XMPP, Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Media format negotiation&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;SDP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;NAT Traversal&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;ICE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Media Transport&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;SRTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Voice encoding&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;G.711, Opus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Video encoding&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;VP8, AV1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Some of these pieces are in
better shape than others—I&#39;d really prefer not to use SDP if I
can avoid it!—and they don&#39;t all fit together cleanly, so it&#39;s
not just a simple matter of mixing and matching, but it&#39;s also not
like we&#39;re starting from scratch either. Moreover, the pieces that are
earliest in the timeline are also the ones that are the best
understood.&lt;/p&gt;
&lt;p&gt;My sense is that the best way to proceed is to have what might be
called a hybrid approach: use standardized components where they exist
and temporarily fill in the gaps with proprietary interfaces specified
by the gatekeepers while working to develop standardized versions
of those functions. Once those versions exist, then we can gradually
replace the proprietary pieces. The highest priority here should be getting
to common formats for the key establishment and everything inside
the encryption envelope (messages, voice, video), because those
are the pieces where incompatibility causes the biggest deployment
problems, as discussed above; fortunately, these are also some
of the most baked pieces and, at least in the case of voice and
video, ones where I expect there is a lot of commonality just because
there are only a few good codecs.&lt;/p&gt;
&lt;p&gt;I do think it&#39;s true that it&#39;s probably easier to get to &lt;em&gt;some&lt;/em&gt;
level of interoperability—especially at the demo level—by just having gatekeepers publish
interfaces, but it&#39;s a long way from &lt;em&gt;something&lt;/em&gt; to real reliable
interoperability (we learned this the hard way with WebRTC),
and there&#39;s going to be a long period of refining those interfaces
and the corresponding documentation. That&#39;s time that could be
spent building out common protocols instead, with a much better
final result.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dma-interop/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Having multiple non-interoperable siloes is clearly
far from ideal and it&#39;s exciting to see efforts like the DMA to do
something about that. We know it&#39;s possible to build interoperable
messaging systems and we&#39;ve got multiple worked examples going
back as far as the public switched telephone network and e-mail.
Even WebRTC is partially interoperable in the sense that multiple
browsers can communicate on the same service but not on different
services. To a great extent our current situation is due to a
particular set of incentives for gatekeepers not to interoperate;
the way to get out of that hole is to give them the incentives to
build something truly interoperable.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Some current bridging systems actually rely on the
user having a copy of the gatekeeper&#39;s app on the
local system and remotely controlling that app. This
doesn&#39;t seem like a very good solution for reasons
which should be obvious. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For client/server protocols, people will often stand up
a cloud server endpoint. For extra credit, you can have
an endpoint which will publish connection logs so that
the other side can see your internal view on what happened. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I don&#39;t think that local gatewaying is a good technical
design because it requires terminating the encryption
from the gateway in the local server and then re-encrypting it
to the user&#39;s client, which destroys a lot of information,
such as end-to-end identity. This can be a useful prototyping
technique, but I don&#39;t think it&#39;s a great way to build
a production system. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though the IETF&#39;s practice of using 72-column
monospaced ASCII does make things longer. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the &amp;quot;information hiding&amp;quot; sense of avoiding the
consumer having to think about it, rather than keeping it
secret, though of course they might &lt;em&gt;also&lt;/em&gt; want to
keep the details secret. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically you can get away with less than a full mesh,
by having some kind of tiebreaker for each pair, but it&#39;s
going to be fairly close to a full mesh. &lt;a href=&quot;https://educatedguesswork.org/posts/dma-interop/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Network-based Web blocking techniques (and evading them)</title>
		<link href="https://educatedguesswork.org/posts/web-filtering/"/>
		<updated>2023-02-09T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-filtering/</id>
		<content type="html">&lt;p&gt;Via &lt;a href=&quot;https://josephhall.org/&quot;&gt;Joseph Lorenzo Hall&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/echo_pbreyer/status/1622201719026221057&quot;&gt;Patrick Breyer&lt;/a&gt;,
and &lt;a href=&quot;https://edri.org/our-work/member-states-want-internet-service-providers-to-do-the-impossible-in-the-fight-against-child-sexual-abuse/&quot;&gt;EDRI&lt;/a&gt;, I see that the EU&#39;s
&lt;a href=&quot;https://data.consilium.europa.eu/doc/document/ST-12354-2022-INIT/en/pdf&quot;&gt;Internet Filtering requirements&lt;/a&gt;
(sometimes called &amp;quot;chat control&amp;quot;) are continuing to move forward.
The legal language is a bit hard to wade through, but it appears to require &lt;em&gt;Internet
Service Providers (ISPs)&lt;/em&gt; to block specific content on Web sites, identified
by &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=URL&amp;amp;oldid=1136437341&quot;&gt;URL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Article 16 lays out the scope of a blocking order:&lt;/p&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The competent authority shall also have the power to issue a blocking order requiring
a provider of internet access services under the jurisdiction of that Member State to
take reasonable measures to prevent users from accessing known child sexual abuse
material indicated by all uniform resource locators on the list of uniform resource locators
included in the database of indicators, in accordance with Article 44(2), point (b) and
provided by the EU Centre&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then Article 18 lays out requirements for user notification
and redress:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Where a provider prevents users from accessing the uniform resource locators pursuant to
a blocking order issued in accordance with Article 17, it shall take reasonable measures to
inform the users of the following:&lt;/p&gt;
&lt;p&gt;(a) the fact that it does so pursuant to a blocking order;&lt;/p&gt;
&lt;p&gt;(b) the reasons for doing so, providing, upon request, a copy of the blocking order;&lt;/p&gt;
&lt;p&gt;(c) the users’ right of judicial redress referred to in paragraph 1, their rights to submit
complaints to the provider through the mechanism referred to in paragraph 3 and to
the Coordinating Authority in accordance with Article 34, as well as their right to
submit the requests referred to in paragraph 5&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, as EDRI observes, this kind of filtering is not really technically
practical in today&#39;s Web. In this post I talk about the technologies which are
used for Web filtering, as well as some of the privacy and security
technologies which make that sort of blocking harder.
This post is intended to be self-contained, but you might
find previous posts on &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/&quot;&gt;tracking and browser privacy features&lt;/a&gt; (tracking and blocking are closely related) and &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/&quot;&gt;IP concealment&lt;/a&gt; useful background.&lt;/p&gt;
&lt;h2 id=&quot;threat-model&quot;&gt;Threat Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#threat-model&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In many security situations there&#39;s pretty broad consensus on who is
the attacker (e.g., the person trying to steal your credit card
number), and who is the defender (the person who doesn&#39;t want their
credit card stolen), and traditionally in the design of security
protocols we think of the network as the attacker and the job of the
protocol to be to defend you against the network. However, in this
situation, the entities trying to block certain content usually think
of themselves as the defenders, either because they are trying to block
content which is illegal (such as Child Sexual Abuse Material (CSAM))
or because they want to control the use of their own network (e.g., to
protect it against malware-infected machines or to stop their
employees from exfiltrating company secrets in what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Data_loss_prevention_software&amp;amp;oldid=1131197178&quot;&gt;Data
Leak Prevention
(DLP)&lt;/a&gt;),
and the endpoint trying to evade filtering as the attacker.&lt;/p&gt;
&lt;p&gt;Debates in this area tend to quickly devolve into questions about the
legitimacy of various kinds of blocking and how sympathetic
participants are to them. In my experience such debates don&#39;t usually
get very far and I don&#39;t propose to engage with them here;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
the purpose
of this post is just to lay out the technical situation of what
is and is not possible given the current and anticipated future
state of the Web.&lt;/p&gt;
&lt;p&gt;Note that it&#39;s not always the case that the interests of the
user and the interests of the blocker are opposed. For instance,
consider the case where the network wants to block access to
sites which host frauds or malware: the user presumably doesn&#39;t
want to download malware, and so would potentially benefit
from the network preventing access.
However, these technologies are value-neutral:
the same mechanisms that might allow the network to block access
to CSAM or malware also allow it to block access to Facebook or
to Google search; the same goes for technologies for evading
blocking.&lt;/p&gt;
&lt;h2 id=&quot;endpoint-status&quot;&gt;Endpoint Status &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#endpoint-status&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The most common and familiar situation is when the endpoint
isn&#39;t really trying to evade blocking but also isn&#39;t actively
cooperating with it, as is the case with most consumer
devices. The software on the device usually implements
some set of default protections (e.g., HTTPS), as discussed
below, but they&#39;re ones that are suitable for full-time
use, rather than fancy ones that would be expensive,
inconvenient, or slow. They might also contain some filtering
mechanisms, though usually ones that the vendor has judged
users will want (e.g., &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy&quot;&gt;Safe Browsing&lt;/a&gt;) and in many cases these
can be disabled:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sb-disable.png&quot; alt=&quot;Disable Safe Browsing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Another quite common case is one in which the device is
&lt;em&gt;managed&lt;/em&gt;, for instance, one used
by employees of a company but which actually belongs
to the company and where the company controls the
software on the device (e.g., via
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mobile_device_management&amp;amp;oldid=1133206233&quot;&gt;Mobile Device Management (MDM)&lt;/a&gt;). For obvious reasons, it&#39;s much easier for
the network to control the behavior of managed devices.
Most consumer devices are of course unmanaged; this
wasn&#39;t always true for mobile devices,
where it was common for carriers to install various
kinds of software before selling them, but Apple&#39;s
direct sales, their insistence on a standard
software load, and the subsequent changes in industry
practice mean that in many if not most cases
smartphones are not meaningfully under control of the
carriers. Many work devices are managed, but not all;
of particular concern to many enterprises is what&#39;s
called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bring_your_own_device&amp;amp;oldid=1123679984&quot;&gt;bring your own device (BYOD)&lt;/a&gt;, in which people use their own devices for
work purposes; unsurprisingly, employees are often
unwilling to allow their employers to control the software
on these devices and so in many cases they will
be unmanaged.&lt;/p&gt;
&lt;p&gt;On the other side of the spectrum, we have endpoints
which are deliberately trying to avoid monitoring.
This could be something the user wants, for instance
because they are in a jurisdiction that restricts Internet
access and are using something like a VPN or Tor.
It could also be because there is malware on their
machines. In many cases, that malware will want to
talk to its &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Botnet&amp;amp;oldid=1126852606#Command_and_control&quot;&gt;command-and-control (CNC)&lt;/a&gt; servers.
However, this software
only needs to be able to talk to some prearranged
set of servers and thus doesn&#39;t need to speak
standard protocols—though it might
impersonate them!—and might share secret information
with those servers. This makes evasion easier.&lt;/p&gt;
&lt;h2 id=&quot;blocking-techniques&quot;&gt;Blocking Techniques &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#blocking-techniques&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The difficult part of blocking traffic isn&#39;t really the blocking
itself but rather knowing what traffic to block. It&#39;s fairly straightforward
to just disconnect the Internet, but that makes the network useless.
What you want is &lt;em&gt;selective&lt;/em&gt; blocking in which you block only
the traffic of interest and allow the rest of the traffic to pass
through (conversely, many anti-blocking techniques are designed to
degrade the visibility necessary for selective blocking, thus
forcing the network into a position of blocking all traffic or
none of it). There are a number of ways to learn
what content the endpoint is trying to access.&lt;/p&gt;
&lt;h3 id=&quot;dns-based-blocking&quot;&gt;DNS-Based Blocking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#dns-based-blocking&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One very common place to do blocking is at the DNS layer (see &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;my
series on DNS&lt;/a&gt; for
background here). DNS-based blocking is very technically
straightforward because the client directly asks the DNS server for
the contact information (IP address) of the Web server it&#39;s trying to
contact, so it&#39;s easy to add a filtering step. Moreover, there are a
number of DNS providers (e.g., Umbrella/OpenDNS or Cloudflare) which
offer filtered DNS servers. Umbrella will even let you configure which
sites you want blocked.  The DNS server has a number of options if a
blocked domain is requested, including returning an error to
the client or returning a bogus IP address which can then be
blocked; in either case, the client will not be able to contact
the ultimate server.&lt;/p&gt;
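&lt;p&gt;The filtering step itself can be sketched in a few lines. This is a toy, not how any particular provider does it: the blocklist, the sinkhole address, and the &lt;code&gt;upstream_lookup&lt;/code&gt; callback (standing in for real name resolution) are all assumptions of mine, and NXDOMAIN is just one of the errors a resolver might choose to return.&lt;/p&gt;

```python
BLOCKED_DOMAINS = {"blocked.example"}   # hypothetical blocklist
SINKHOLE_IP = "192.0.2.1"               # documentation-range address


def is_blocked(qname):
    # Normalize, then check the name and every parent domain, so that
    # www.blocked.example is caught by an entry for blocked.example.
    labels = qname.rstrip(".").lower().split(".")
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS
               for i in range(len(labels)))


def answer(qname, upstream_lookup, sinkhole=False):
    # upstream_lookup is a stand-in for real resolution: name to IP.
    if is_blocked(qname):
        # Either hand back a bogus address or report an error.
        if sinkhole:
            return ("NOERROR", SINKHOLE_IP)
        return ("NXDOMAIN", None)
    return ("NOERROR", upstream_lookup(qname))
```

&lt;p&gt;Note that the only input is the queried name; the resolver never sees anything more specific than that.&lt;/p&gt;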
&lt;p&gt;Network-imposed DNS-based filtering works because the network typically
provides the DNS server used by endpoints (notifying them about it via
DHCP). However, it&#39;s also possible for users to configure their
devices to use a different server or for endpoint software to do its
own resolution via a non-network resolver.
For instance, it&#39;s quite common for people to configure their
devices to use Google Public DNS (8.8.8.8) or Cloudflare
DNS (1.1.1.1),&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
and Firefox is increasingly using
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/&quot;&gt;DNS over HTTPS&lt;/a&gt; in a mode
which bypasses the
local resolver in favor of a &amp;quot;trusted recursive resolver&amp;quot; that
has agreed to comply with Mozilla&#39;s &lt;a href=&quot;https://wiki.mozilla.org/Security/DOH-resolver-policy&quot;&gt;policy requirements&lt;/a&gt;
around user security and privacy. Obviously, malware can do the same.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Historically, if the user just pointed their device at a public
resolver, the network could still do DNS filtering by intercepting
the communication to the resolver. However, if DNS
traffic is encrypted to the server, that prevents this
kind of filtering. Ultimately, if networks
want to enforce DNS-based filtering in these circumstances,
they need to prevent connections to the public DNS resolvers,
which, given that they run DNS over HTTPS, brings us back
to the same problem of blocking Web traffic, at least
for unmanaged endpoints; for managed endpoints, it&#39;s
generally possible to just disable encrypted DNS;
in fact Firefox does this automatically if it thinks the
endpoint is managed.&lt;/p&gt;
&lt;p&gt;Even where DNS-based blocking is effective, it&#39;s a fairly limited
mechanism. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It can &lt;em&gt;only&lt;/em&gt; block on domain name and not URI.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
For instance,
if you want to block &lt;code&gt;https://example.com/contraband&lt;/code&gt; and not
&lt;code&gt;https://example.com/totally-cool&lt;/code&gt;, that&#39;s not possible
because the browser just asks for the address of
&lt;code&gt;example.com&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It usually can&#39;t provide any notification to the user of what happened;
the server can just make it look like the name doesn&#39;t
exist or the server is offline. It&#39;s of course possible
to provide the address of a server controlled by the network,
but if the client is trying to connect via HTTPS, then
this will result in a connection failure (more on this later),
not a comprehensible message to the user.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
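&lt;p&gt;The first limitation is easy to see by looking at what the client actually hands to the resolver. A minimal illustration using Python&#39;s standard URL parser and the two example URLs from the list:&lt;/p&gt;

```python
from urllib.parse import urlsplit

urls = [
    "https://example.com/contraband",
    "https://example.com/totally-cool",
]

# The resolver is only ever asked about the hostname, never the path.
names_seen_by_dns = {urlsplit(u).hostname for u in urls}
```

&lt;p&gt;Both URLs collapse to the single name &lt;code&gt;example.com&lt;/code&gt;, so a DNS-level filter has to block both or neither.&lt;/p&gt;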
&lt;p&gt;On this second point, I&#39;ve seen proposals for allowing the server to send back a
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-wing-dnsop-structured-dns-error-page-05&quot;&gt;more detailed error message&lt;/a&gt;
telling the endpoint that a site was blocked, for instance:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;c&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;tel:+358-555-1234567&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;sips:bob@bobphone.example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string&quot;&gt;&quot;https://ticket.example.com?d=example.org&amp;amp;t=1650560748&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;j&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;malware present for 23 days&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;s&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;o&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;example.net Filtering Service&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is in theory possible, but there are several obstacles
that prevent unilateral deployment by ISPs or
enterprise networks. First, no existing Web client supports this new message, so
at present they will just show a failure as described above.
Second, if the browser uses the operating system resolver
(as, for instance, Firefox does when it&#39;s not using
DoH; Chromium uses its own resolver), then it will only
be able to get this message once the operating system is
updated to support it, which is likely to take a very
long time. Finally, the browser would need to figure out
some way to present the information so that it&#39;s clear
what&#39;s happening and that it can&#39;t be used to fool the
user into accepting the error message as coming from the
valid site (&amp;quot;please enter your social security number here!&amp;quot;);
this problem is presumably soluble if there is enough
interest.&lt;/p&gt;
&lt;h3 id=&quot;ip-filtering&quot;&gt;IP Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#ip-filtering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you&#39;re not doing DNS-based filtering, your next opportunity to
filter is at the IP layer. IP-layer filtering is exactly what
you think it is: the network blocks connections to certain
IP addresses. There are a number of possible alternatives
here (drop the packets, send a TCP RST, BGP poisoning)
but they all amount to the same basic idea, which is to render
certain IP addresses inaccessible.
Unlike DNS-based filtering, it&#39;s not straightforward for
clients to just opt out of IP-based filtering: the network
has to be able to see the server&#39;s IP address to deliver
the packets, so if you want to bypass it, you need to get
a new network.&lt;/p&gt;
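&lt;p&gt;As a rough sketch of the decision a filtering router makes for each packet, assuming a blocklist expressed as address ranges (the ranges here are documentation prefixes I picked for illustration, and &lt;code&gt;should_drop&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```python
import ipaddress

# Hypothetical blocklist, using documentation address ranges.
BLOCKED_NETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8:bad::/48"),
]


def should_drop(dst_ip):
    # Drop the packet if its destination falls in any blocked range.
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in BLOCKED_NETS)
```

&lt;p&gt;The destination address is all this check needs, and it is the one field the client cannot hide from the network it is attached to.&lt;/p&gt;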
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;ignoring-the-great-firewall&quot;&gt;Ignoring the Great Firewall &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#ignoring-the-great-firewall&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In general, once the network has identified your connection
for blocking, that&#39;s it, but in at least one case this
was easy to avoid. China famously uses a blocking system often called
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Great_Firewall&amp;amp;oldid=1136231031&quot;&gt;Great Firewall&lt;/a&gt;,
which operated in part by sending TCP RSTs when it detected
things it didn&#39;t like. This is cheaper technically
than blocking all the packets.
Some time back, &lt;a href=&quot;https://www.cl.cam.ac.uk/~rnc1/ignoring.pdf&quot;&gt;Clayton, Murdoch, and Watson&lt;/a&gt;
discovered that clients could just ignore the TCP RSTs,
in which case the traffic would continue to flow.
I don&#39;t know if this is still true.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On the other hand, IP-based filtering is even less precise
than DNS-based filtering. Obviously, it can&#39;t see the specific
resource you are connecting to, but it can&#39;t even always tell
which Web site is being accessed: it&#39;s very common for multiple
Web sites to share the same IP address (for instance,
every GitHub Pages site seems to have the same IP,
as does every Substack that has an address ending
in &lt;code&gt;substack.com&lt;/code&gt;), and so you can&#39;t IP block one site without
blocking others. Even in situations where there isn&#39;t
IP sharing, but where many sites share the same hosting
provider, the hosting provider can readily change which
IP addresses correspond to which sites, making it
hard for the blocker to keep up. Thus, IP blocking is
good for blocking access to big sites which don&#39;t share
infrastructure, such as Google or Facebook, but not so
good for smaller sites.&lt;/p&gt;
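&lt;p&gt;To make the collateral damage concrete, here is a rough sketch.
The hostnames and addresses are made up for illustration (the addresses
come from the ranges reserved for documentation), and &lt;code&gt;collateral()&lt;/code&gt;
is a hypothetical helper, not part of any real blocking system:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical hosting table: hostname to IP address.
const HOSTING = new Map([
  ['blog-a.example', '192.0.2.10'],
  ['blog-b.example', '192.0.2.10'],
  ['shop.example', '192.0.2.10'],
  ['bigsite.example', '203.0.113.5'],
]);

// Every other hostname that becomes unreachable when the IP
// address of `target` is blocked.
function collateral(target, hosting = HOSTING) {
  const ip = hosting.get(target);
  return [...hosting]
    .filter(([host, addr]) => addr === ip && host !== target)
    .map(([host]) => host);
}

console.log(collateral('blog-a.example'));
console.log(collateral('bigsite.example'));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this toy table, blocking &lt;code&gt;blog-a.example&lt;/code&gt; by address also takes out
the two other sites on the shared address, while the site with its own
address can be blocked with no collateral at all.&lt;/p&gt;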
&lt;p&gt;Like DNS-based filtering, IP-based filtering isn&#39;t able
to provide any feedback to the user about what went
wrong: it just looks like a network failure. Unlike
DNS-based filtering, I haven&#39;t even seen credible
proposals for how to add such a function and most
of the obvious avenues seem fairly unattractive to
browser makers.&lt;/p&gt;
&lt;h3 id=&quot;content-analysis&quot;&gt;Content Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#content-analysis&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next major approach is to inspect the application layer
traffic (e.g., HTTP or TLS), and filter based on that.
This is a very powerful technique when applied to HTTP
because it allows you to see all of the data being exchanged,
including the URL being requested and all of the content
being returned, so you can do some fairly fancy filtering.
For instance, you could not only check the URI but scan
the returned content for malware or CSAM.&lt;/p&gt;
&lt;p&gt;However, this sort of filtering is increasingly impractical
because the vast majority of Web traffic is now encrypted,
as shown in the figure below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/lets-encrypt-HTTPS-stats.png&quot; alt=&quot;HTTPS pageload fraction&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://letsencrypt.org/stats/#percent-pageloads&quot;&gt;Let&#39;s Encrypt&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;When the traffic is encrypted, the network can&#39;t see the content
of the HTTP connection, which means it can&#39;t see either the
URL or the response—this is the point of encryption!—so
the amount of filtering possible is quite limited.&lt;/p&gt;
&lt;p&gt;The main piece of information that the network can see is the
hostname of the Web server. This is carried in two places
in the TLS handshake:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Server_Name_Indication&amp;amp;oldid=1134138596&quot;&gt;Server Name Indication (SNI)&lt;/a&gt; field of the client&#39;s first message (the ClientHello).
(This is the field that allows you to have multiple servers on the same IP).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the server&#39;s Certificate message, although this may
not be unique, as a server may have a certificate that covers
multiple sites. For instance, there is a single &amp;quot;wildcard&amp;quot;
certificate for &lt;code&gt;*.github.io&lt;/code&gt; that works for any site
ending in &lt;code&gt;.github.io&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
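&lt;p&gt;The first of these is easy for the network to read. As a minimal
sketch, assuming the whole ClientHello arrives in a single unencrypted TLS
record (real ones can be split across records and packets), a filter could
pull out the name like this. &lt;code&gt;extractSni()&lt;/code&gt; and
&lt;code&gt;buildClientHello()&lt;/code&gt; are names I made up for illustration; the
second one only builds a stripped-down test message, not something a real
server would accept:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Return the SNI hostname from one TLS record holding a ClientHello,
// or null if the bytes are something else or carry no SNI.
function extractSni(buf) {
  try {
    if (buf.readUInt8(0) !== 0x16) return null;    // record type: handshake
    let off = 5;                                   // skip the record header
    if (buf.readUInt8(off) !== 0x01) return null;  // handshake type: ClientHello
    off += 4;                                      // type plus 3-byte length
    off += 2 + 32;                                 // legacy_version plus random
    off += 1 + buf.readUInt8(off);                 // session ID
    off += 2 + buf.readUInt16BE(off);              // cipher suites
    off += 1 + buf.readUInt8(off);                 // compression methods
    const end = off + 2 + buf.readUInt16BE(off);   // end of the extensions block
    off += 2;
    while (end >= off + 4) {
      const type = buf.readUInt16BE(off);
      const len = buf.readUInt16BE(off + 2);
      off += 4;
      if (type === 0) {                            // server_name extension
        // Layout: list length (2), name type (1), name length (2), name.
        if (buf.readUInt8(off + 2) !== 0) return null;  // 0 means host_name
        const nameLen = buf.readUInt16BE(off + 3);
        const start = off + 5;
        if (start + nameLen > buf.length) return null;
        return buf.toString('ascii', start, start + nameLen);
      }
      off += len;
    }
    return null;
  } catch (e) {
    return null;  // truncated input: Buffer reads throw a RangeError
  }
}

// Build a stripped-down ClientHello carrying only an SNI extension (test data).
function buildClientHello(hostname) {
  const name = Buffer.from(hostname, 'ascii');
  const u16 = (n) => { const b = Buffer.alloc(2); b.writeUInt16BE(n); return b; };
  const sni = Buffer.concat([u16(0), u16(name.length + 5), u16(name.length + 3),
                             Buffer.from([0]), u16(name.length), name]);
  const body = Buffer.concat([
    Buffer.from([0x03, 0x03]), Buffer.alloc(32),  // version, all-zero random
    Buffer.from([0]),                             // empty session ID
    u16(2), Buffer.from([0x13, 0x01]),            // a single cipher suite
    Buffer.from([1, 0]),                          // null compression
    u16(sni.length), sni,
  ]);
  const hs = Buffer.concat([Buffer.from([0x01, 0]), u16(body.length), body]);
  return Buffer.concat([Buffer.from([0x16, 0x03, 0x01]), u16(hs.length), hs]);
}

console.log(extractSni(buildClientHello('example.com')));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that nothing here involves any cryptography: the name is just sitting
in the clear at a predictable place in the first message.&lt;/p&gt;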
&lt;p&gt;In &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt;, the
server&#39;s Certificate message is encrypted, which means that
the only information about the server&#39;s identity available
to the network is in the SNI in the ClientHello. You shouldn&#39;t
be surprised to hear that there is now work underway to encrypt
the ClientHello message to conceal the SNI, using a technology
called (surprise!) &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-15&quot;&gt;Encrypted Client Hello (ECH)&lt;/a&gt;.
ECH hasn&#39;t been widely deployed yet, but it&#39;s under active
development by browser vendors and some server operators,
such as Cloudflare. If ECH is in use, then the network
will not be able to use TLS to distinguish between any of the servers
on the same IP address, reducing the filtering granularity
to that of IP blocking.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;browsers-and-mitm-proxies&quot;&gt;Browsers and MITM Proxies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#browsers-and-mitm-proxies&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;MITM proxies are a difficult problem for browsers: generally
you want to allow users to add their own trust anchors to permit so-called
&amp;quot;Enterprise CAs&amp;quot; in which the enterprise has its own private names
that it issues certificates for but it doesn&#39;t want to have
publicly accessible. This is still somewhat common, though
arguably less necessary in the era of free certificates.
However, it would be possible for browsers to detect and
prevent the use of these enterprise CAs for any site which
also had a public certificate, thus more or less preventing
MITM proxies from working. The consequence of this, though, would
be to break the browser on any network which had such a proxy,
which is obviously not a desirable outcome. The result is
that we&#39;re in a not-great equilibrium that is hard to get
out of without causing a lot of breakage.&lt;/p&gt;
&lt;/div&gt;
&lt;h4 id=&quot;mitm%2Fintercepting-proxies&quot;&gt;MITM/Intercepting Proxies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#mitm%2Fintercepting-proxies&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Many enterprise networks use what&#39;s called a &amp;quot;man-in-the-middle&amp;quot; or
&amp;quot;intercepting&amp;quot; proxy. This is a network device which sits in between
the client and the server, impersonating the server to the client and
the client to the server.  It decrypts the traffic between client and
server, inspects it, and then re-encrypts it. &amp;quot;But wait&amp;quot; I can hear
you say. &amp;quot;Isn&#39;t the whole point of TLS to prevent this kind of
attack?!&amp;quot; Ordinarily yes, but the organizations who deploy these
proxies also install their own
&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#background%3A-https-and-the-webpki&quot;&gt;trust anchors&lt;/a&gt;
on the client, which allow the proxy to issue certificates which
are acceptable to the client.&lt;/p&gt;
&lt;p&gt;Obviously, this doesn&#39;t work in consumer settings where the
network doesn&#39;t control the client. Of course, one could imagine
a nation requiring users to adopt a new trust anchor,
enabling them to intercept any connection, but this obviously
has extraordinary risks in terms of surveillance. In the one
case where a country went as far as trying it (&lt;a href=&quot;https://www.bbc.com/news/technology-49421729&quot;&gt;Kazakhstan&lt;/a&gt;),
browsers responded by explicitly blocking the trust anchor, so you
couldn&#39;t install it.&lt;/p&gt;
&lt;p&gt;Even in an enterprise,
MITM proxies aren&#39;t really a great system: they&#39;re expensive to
operate and because they have access to the plaintext of the
connection, present a security and privacy risk to users of
the system. There is also &lt;a href=&quot;https://jhalderm.com/pub/papers/interception-ndss17.pdf&quot;&gt;evidence&lt;/a&gt;
that the implementation quality of these proxies is less good
than that of browsers, which creates additional risks.&lt;/p&gt;
&lt;p&gt;In order to address some of these issues, enterprises will
sometimes (often?) configure their proxy to &lt;em&gt;selectively&lt;/em&gt;
decrypt traffic. The idea here is that the proxy looks at
the SNI field and only decrypts traffic to some destinations
(e.g., Facebook but not your bank). These enterprises
(and the vendors who sell these devices) are worried about
ECH because it has the potential to make this sort of selective
decryption impossible. I don&#39;t believe that this is likely
to be a problem in practice, however: if you are able to
install your own trust anchor, you should also be able to
configure the browser to disable ECH. Moreover, ECH information
is delivered over DNS, so as long as you can control DNS
(or, in the case of Firefox, disable DoH, which happens
automatically when it detects a new trust anchor) you can
just suppress the use of ECH.&lt;/p&gt;
&lt;p&gt;I did want to flag one point here about this kind of selective
decryption, which is that it only works if you are dealing
with an endpoint which is standards compliant and sends the
correct SNI value. If you are dealing with malware, it can
put whatever it wants (e.g., &lt;code&gt;www.bankofamerica.com&lt;/code&gt; in the SNI) but then connect to
its own CNC server. Selective decryption based on SNI only
works with clients which aren&#39;t themselves malicious, like
Web browsers. This is true whether or not the client
is using ECH. Note that this form of evasion only works
because of prearrangement between the malware and the CNC
server, so it&#39;s not deployable as a general mechanism for
Web browsers.&lt;/p&gt;
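&lt;p&gt;The selective decryption decision can be sketched in a few lines, which
also makes the weakness easy to see. Everything here is hypothetical: the
bypass list, the &lt;code&gt;proxyAction()&lt;/code&gt; name, and in particular the
choice to decrypt when no SNI is visible, which is just one policy a proxy
might adopt:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical bypass list: destinations passed through without decryption.
const BYPASS_SUFFIXES = ['bank.example', 'health.example'];

// Decide what to do with a connection based only on the claimed SNI.
// `sni` is null when the ClientHello has no readable server name.
function proxyAction(sni, bypassSuffixes = BYPASS_SUFFIXES) {
  if (sni === null) return 'decrypt';  // assumed default, not a standard
  const host = sni.toLowerCase();
  const bypass = bypassSuffixes.some(
    (suffix) => host === suffix || host.endsWith('.' + suffix));
  return bypass ? 'passthrough' : 'decrypt';
}

console.log(proxyAction('www.bank.example'));
console.log(proxyAction('social.example'));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function only ever sees the name the client chose to send, so a client
that claims &lt;code&gt;www.bank.example&lt;/code&gt; is passed through no matter where
it is really connecting.&lt;/p&gt;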
&lt;h4 id=&quot;traffic-analysis&quot;&gt;Traffic Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#traffic-analysis&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In principle it&#39;s possible to
learn about the content of encrypted
traffic by looking at packet size, timing, etc. For instance,
the traffic pattern associated with watching video (a lot of big packets
sent continuously to the client) looks very different from that
associated with using Webmail (small, relatively intermittent,
chunks back and forth).
This approach is
often called &amp;quot;traffic analysis&amp;quot;.
&lt;a href=&quot;https://datatracker.ietf.org/doc/draft-irtf-pearg-website-fingerprinting/&quot;&gt;Goldberg, Wang, and Wood&lt;/a&gt;
provide a good overview of the situation for website identification,
and Cisco actually &lt;a href=&quot;https://www.cisco.com/c/en/us/solutions/collateral/enterprise-networks/enterprise-network-security/nb-09-encrytd-traf-anlytcs-wp-cte-en.html&quot;&gt;sells technology&lt;/a&gt;
for doing this (academic paper by Anderson and McGrew &lt;a href=&quot;http://library.usc.edu.ph/ACM/SIGSAC%202017/aisec/p35.pdf&quot;&gt;here&lt;/a&gt;) that tries to identify malware.&lt;/p&gt;
&lt;p&gt;My understanding of the current state of traffic
analysis is that it is somewhat powerful as an attack on privacy (you
certainly can learn more about people&#39;s browsing behavior
than they might want you to learn) and is useful
as part of an enterprise threat response system that attempts
to detect malicious behavior, but is less useful at distinguishing
precise behavior (e.g., which exact images did someone view
on a specific site). It&#39;s also comparatively expensive to
operate technically, doesn&#39;t scale that well,
and requires seeing more behavior over
a longer period than other techniques (e.g., SNI), which
can make a decision very early in the connection. Thus,
my sense is that it&#39;s less useful for making large scale
content-based decisions for things like CSAM detection
or DLP.&lt;/p&gt;
&lt;h3 id=&quot;client-side-agents&quot;&gt;Client-Side Agents &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#client-side-agents&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s also possible to install a piece of software (an &amp;quot;agent&amp;quot;) on the
endpoint itself that monitors the behavior of the device.  These
agents fit into a variety of different and somewhat overlapping
categories (anti-virus, DLP, &lt;em&gt;endpoint detection and response (EDR)&lt;/em&gt;,
etc.) but basically they all do the same kind of thing, which is to
say spy on other programs and report back or otherwise act on behavior
they think is suspicious.  These agents typically have elevated
privileges and so can in some cases observe the internal details of
other programs, for instance, by actually injecting their own code
(this is a persistent problem for browser vendors because it can
negatively impact the stability of the product). For instance,
this would allow the agent to see the plaintext associated with
encrypted traffic, including the URI, the content, etc.&lt;/p&gt;
&lt;p&gt;In some cases, this software is something that users install
themselves (e.g., antivirus), but in others it&#39;s something that
is required by their employers, schools, etc. In the latter case,
it may be deployed in parallel with network monitoring techniques
to provide multiple views of the same activity. For instance,
you might have a client-side agent but also do MITM interception.
This approach provides defense in depth: if you have
such an agent on your work computer, the natural way to avoid
monitoring is to use an unmonitored personal device. Network-level
monitoring can help detect this, even if it can&#39;t see precisely
what&#39;s happening, though it&#39;s obviously far less
powerful in an age of ubiquitous fast mobile Internet:
people can just turn off the WiFi and bypass your monitoring.&lt;/p&gt;
&lt;p&gt;In general, if you have a third party monitoring agent installed
on your computer, it&#39;s safest to assume it can do anything at all
on that device (Microsoft&#39;s &lt;a href=&quot;https://web.archive.org/web/20160311224620/https://technet.microsoft.com/en-us/library/hh278941.aspx&quot;&gt;&amp;quot;Immutable Law of Security #1&amp;quot;&lt;/a&gt;). In particular,
if you have an agent on your computer that is operated by someone
else, the safest assumption is that they have complete control
of your computer. In some cases, these organizations will
have policies about (for instance), what data they look at,
but that doesn&#39;t mean that there is any technical enforcement
mechanism that prevents them from violating those policies.
The few times I&#39;ve actually looked at this I came to the conclusion
that there weren&#39;t any meaningful technical controls;
it&#39;s possible someone has built something safer in this
space, but given the current state of computer security
it&#39;s a very difficult problem.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Client-side agents are a popular technique in enterprise settings,
but actually requiring their installation on everyone&#39;s non-work
devices seems like it would be a major policy change, and I don&#39;t
think it&#39;s likely that the EU would require it (at least I hope
not!).&lt;/p&gt;
&lt;h2 id=&quot;vpns%2C-proxies%2C-etc.&quot;&gt;VPNs, Proxies, etc. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#vpns%2C-proxies%2C-etc.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be clear from the above, in the absence of cooperation
from the endpoint, the network only has fairly limited abilities
to selectively block traffic. More or less all it can do is
to block specific sites, but not control what content people
access on those sites. As technologies like encrypted DNS and ECH become
more common, even that level of blocking will start to become
more difficult. It will still be possible to block large
sites which have their own IP space (e.g., Facebook or Google),
but it will be harder to block just one site hosted by a given
service, such as one GitHub Pages account or a single site
hosted by a CDN.&lt;/p&gt;
&lt;p&gt;Encrypted DNS and ECH are designed to be &amp;quot;always on&amp;quot; technologies
which people can just use for their regular browsing; this means
that the protection they can offer is limited. However, it
is also possible to provide a higher level of protection at
greater cost by proxying traffic to another network which
is not subject to blocking/filtering. This is what technologies
like VPNs, Tor, and iCloud Private Relay do (see
&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/&quot;&gt;here&lt;/a&gt; for an overview of these
techniques). The only really feasible way to prevent people
from bypassing blocking using these mechanisms is to block
access to the proxy/relay/VPN service entirely, which you would
typically do by the same kind of mechanisms I&#39;ve been discussing
above. I&#39;ve also seen some research designs for making
that kind of blocking more difficult (e.g., &lt;a href=&quot;https://telex.cc/people.html&quot;&gt;Telex&lt;/a&gt;),
but that&#39;s out of scope for this post.&lt;/p&gt;
&lt;h2 id=&quot;what-is-technically-feasible&quot;&gt;What is technically feasible &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#what-is-technically-feasible&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With this technical background, we can now look at the EU proposal.
Assuming I am reading it correctly (and EDRI reads it the same
way), it seems to have two requirements that are technically
problematic.&lt;/p&gt;
&lt;p&gt;First, as noted above, it&#39;s not really possible to block
based on a list of specific Uniform Resource Locators,
but only on sites. It&#39;s not clear to me how useful this
really is: if there are specific sites which are just
acting as hosts for CSAM, then there are a number of potential
avenues for having them shut down directly, rather than
filtering at the customer level (this happens fairly
often with sites which engage in various kinds of
copyright and trademark abuse). The primary reason why
URL blocking is useful is that it allows you to selectively
block part of a site—though here too it&#39;s not quite
clear to me why the authorities can&#39;t have that content
taken down once they are aware of it—but as noted
above, that kind of selective blocking is simply not practical to do at the network
level once traffic is encrypted.&lt;/p&gt;
&lt;p&gt;For similar reasons, it&#39;s also not really possible to provide notice to users
as required in Article 18 because there&#39;s no channel for
the provider to do so. In most cases the client will
be trying to establish an encrypted channel to the
server. The network can instead reroute that connection
to its own servers, but those servers cannot properly
authenticate as the server, so all they can manage to do
is cause the browser to show the user an error; they can&#39;t
control which error. Depending on exactly what the provider
does, it might look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fx-bad-cert.png&quot; alt=&quot;Firefox Certificate Warning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;or like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/couldnt-connect.png&quot; alt=&quot;Could not connect warning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But what it definitely will not have is some message from
the provider about why the site is being blocked; there&#39;s
simply no mechanism to communicate that. It&#39;s presumably possible
to invent something here, but it&#39;s not something that
the providers can do unilaterally.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-filtering/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the broader point, which is that
the network providers are simply the wrong place to situate
this kind of blocking. A basic assumption of communications
security is that the network is under control of the attacker
and 30+ years of work has gone into protecting Internet
traffic from potentially hostile networks. This work
isn&#39;t done, but there&#39;s been a huge amount of progress and
at this point it&#39;s really not practical to do effective
fine-grained blocking of traffic without the cooperation or
coercion of one of the endpoints.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is also why I am using the term &amp;quot;blocking&amp;quot; instead
of the common term &amp;quot;censorship&amp;quot;, which, while
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-irtf-pearg-censorship-09&quot;&gt;technically accurate&lt;/a&gt;
in my opinion, tends to just get us into
&lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/pearg/Pb-xB83lCN5a6fVmR_g3ZL_JaY0/&quot;&gt;debates&lt;/a&gt;
about the definition of &amp;quot;censorship&amp;quot; from those who think that
certain forms of blocking are good and that the term
&amp;quot;censorship&amp;quot; has negative connotations. &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
However, data from Huston and Damas &lt;a href=&quot;https://www.icann.org/en/system/files/files/presentation-day1b-resolver-centrality-huston-25may21-en.pdf&quot;&gt;indicates&lt;/a&gt;
that most of the use of the big public resolvers is
due to ISPs pointing their users to them, rather than
users configuring it themselves. &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The public debate about the use of DoH and DoT
has sort of conflated use by browsers with use
by malware. The problem with malware use of encrypted
DNS exists because there are public DNS servers which
offer encrypted service, independently of whether browsers use it.
To the extent to which browsers make the problem worse
it&#39;s because their use of those servers makes it
less attractive to just block them entirely. &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I once explored a design where the DNS server would send
the client a list of blocklisted URIs on the requested
domain, but this of course requires the client to cooperate,
so it&#39;s more like Safe Browsing than like a unilateral blocking mechanism. &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The basic problem here is that if you don&#39;t trust the
system you are monitoring to behave correctly, then
you need access to its internals to be sure that it&#39;s
not lying to you about its behavior. But that
access is inherently abusable. &lt;a href=&quot;https://educatedguesswork.org/posts/web-filtering/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Internet Transport Protocols, Part I: Reliable Transports</title>
		<link href="https://educatedguesswork.org/posts/transport-protocols-intro/"/>
		<updated>2023-01-18T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/transport-protocols-intro/</id>
		<content type="html">&lt;p&gt;Most people who use the Internet just have some vague idea that
it carries data from point A to point B (famously, through
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Series_of_tubes&amp;amp;oldid=1132967108&quot;&gt;series of tubes&lt;/a&gt;).
Even people who regularly work on Internet systems tend
to work with it through many layers of abstraction,
without a clear understanding of the infrastructure components
that make it work.
This post is the first of a series about one such piece of infrastructure: the transport protocols
such as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1132201507&quot;&gt;TCP&lt;/a&gt;
that are used to transmit data between nodes on the Internet.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-network-programming&quot;&gt;Background: Network Programming &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#background%3A-network-programming&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you&#39;ve done any programming of networked systems, you&#39;ve
probably written code that looks something like this:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;socket &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8080&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Hello&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;       &lt;br /&gt;response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even if you don&#39;t have much experience with networking, this
code should be fairly self-explanatory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The first line forms a &amp;quot;connection&amp;quot; to the server named
&amp;quot;&lt;a href=&quot;http://example.com/&quot;&gt;example.com&lt;/a&gt;&amp;quot; (see my series on &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;DNS&lt;/a&gt;
for how these names work). &lt;code&gt;8080&lt;/code&gt; is what&#39;s called a &amp;quot;port number&amp;quot;
and we can ignore it for now. This function returns
an object called a &amp;quot;socket&amp;quot; which represents that connection.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Conceptually, this is like dialing the phone and calling
&amp;quot;&lt;a href=&quot;http://example.com/&quot;&gt;example.com&lt;/a&gt;&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The next line writes the string &amp;quot;Hello&amp;quot; to the server. Note
that because we already are connected to the server, we can
just pass in the socket, rather than the address of
the server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The next two lines read the response from the socket and
then print it out. As before, we don&#39;t need to specify
the server&#39;s address because that&#39;s encapsulated in the
socket.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As another example, here&#39;s a simple server that works with
this client. This server just takes whatever the client
writes to it and sends it back in upper case. The main
difference here is that instead of using &lt;code&gt;connect()&lt;/code&gt;, the
server uses &lt;code&gt;accept()&lt;/code&gt; which tells the computer to wait
for a client to connect to it on port 8080.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;socket &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;accept&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;8080&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;loop &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;toUpperCase&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;    &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we run this client/server pair, we would expect the server to print:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Hello
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the client to print:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;HELLO
&lt;/code&gt;&lt;/pre&gt;
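&lt;p&gt;The code above is pseudocode. If you want something you can actually run,
here is a rough equivalent using the &lt;code&gt;net&lt;/code&gt; module from Node.js.
The function names &lt;code&gt;startServer()&lt;/code&gt; and &lt;code&gt;sendOnce()&lt;/code&gt;
are mine, both ends run in one process on the loopback address, and the
sketch uses port 0 (meaning &amp;quot;any free port&amp;quot;) rather than 8080 so that
it doesn&#39;t collide with anything already running:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;const net = require('net');

// Server side: upper-case whatever arrives and send it back.
function startServer(port, onReady) {
  const server = net.createServer((socket) => {
    socket.on('data', (data) => {
      const text = data.toString();
      console.log(text);
      socket.write(text.toUpperCase());
    });
  });
  server.listen(port, '127.0.0.1', () => onReady(server));
}

// Client side: connect, write one message, pass the first reply to onResponse.
// This assumes the short reply arrives in a single read.
function sendOnce(port, message, onResponse) {
  const socket = net.createConnection({ port, host: '127.0.0.1' }, () => {
    socket.write(message);
  });
  socket.once('data', (data) => {
    socket.end();
    onResponse(data.toString());
  });
}

// Port 0 asks the operating system for any free port.
startServer(0, (server) => {
  sendOnce(server.address().port, 'Hello', (response) => {
    console.log(response);
    server.close();
  });
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The shape is the same as the pseudocode, except that Node.js delivers
incoming data through callbacks instead of a blocking &lt;code&gt;read()&lt;/code&gt;.&lt;/p&gt;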
&lt;p&gt;Of course, the client could write multiple messages, like so:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;socket &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;10.0.0.1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8080&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;At midnight all the agents&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;And the superhuman crew&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Come out and round up everyone&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;socket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;That knows more than they do&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we would expect all of the messages to be delivered and
that they will be delivered &lt;em&gt;in order&lt;/em&gt;, so that the server prints:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;At midnight all the agents
And the superhuman crew
Come out and round up everyone
That knows more than they do
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Rather than&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;And the superhuman crew
That knows more than they do
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;And the superhuman crew
Come out and round up everyone
At midnight all the agents
That knows more than they do
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, just like a phone call.&lt;/p&gt;
&lt;p&gt;What we&#39;re seeing here is just the programming interface, though, which
is to say it&#39;s a set of abstractions that the operating system and
the programming language provide to you to write your programs.
They don&#39;t tell us anything about what&#39;s actually happening on
the network. That&#39;s the subject of this post.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-a-packet-switching-network&quot;&gt;Background: A Packet Switching Network &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#background%3A-a-packet-switching-network&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Internet is what is known as a packet switching network. What
this means is that the basic unit of the Internet is a self-contained
object called an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Protocol&amp;amp;oldid=1115350518&quot;&gt;Internet Protocol (IP)&lt;/a&gt;
&lt;em&gt;packet&lt;/em&gt; or &lt;em&gt;datagram&lt;/em&gt;. An IP packet is like a letter in that it has a source address and a
destination address. This means that when you send an IP packet on
the network, the Internet can automatically route the packet to the
destination address by looking at the packet with no other state
about either computer. A simplified IP packet looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/IP-packet.png&quot; alt=&quot;IP Packet&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The main thing in the packet is the actual &lt;em&gt;data&lt;/em&gt; to be delivered
from the source to the destination, also called the &lt;em&gt;payload&lt;/em&gt;.
The payload is variable length with a maximum typically
around 1500 bytes. Using IP is very simple: your computer transmits an IP
packet and the Internet uses the destination
address to figure out where to route it. When someone
wants to transmit to you, they do the same thing. Importantly,
for reasons we&#39;ll see shortly, packet switching is unreliable: when you send a packet to
the other end it might or might not get there
(&amp;quot;packet loss&amp;quot;). Moreover, packets
don&#39;t always arrive in the order they were sent (&amp;quot;reordering&amp;quot;).&lt;/p&gt;
&lt;h2 id=&quot;circuit-switching&quot;&gt;Circuit Switching &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#circuit-switching&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The alternative to packet switching is what&#39;s called &amp;quot;circuit switching&amp;quot;.
In a circuit switched network, the basic unit of operation is a
connection between two endpoints called a &quot;circuit&quot;. In a circuit switched
system, you set up the circuit and then just start sending and everything
goes to the entity on the other end of the circuit, like in a telephone
call (more on phones later).&lt;/p&gt;
&lt;p&gt;In the original telephone network, this was actually a literal electrical circuit:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
phone service came into your house on a pair of copper wires and when
Alice wanted to call Bob, the central office would connect Alice&#39;s wires
to Bob&#39;s wires (there&#39;s some more electronics here, but you can ignore
this). Originally this was done by having an actual person at a switchboard,
which is just a board with a bunch of jacks corresponding to each outgoing
circuit from the central office. When Alice wanted to call Bob,
she would ring up the operator and tell them who she wanted to call.
The operator would plug a patch cable from Alice&#39;s jack into Bob&#39;s jack,
like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/telephone-switchboard.jpg&quot; alt=&quot;A telephone switchboard&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When the first automatic switches were invented (an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Strowger_switch&amp;amp;oldid=1117312336&quot;&gt;amazing story&lt;/a&gt;),
they worked much the same way: you&#39;d pick up your phone and dial and the
equipment at the exchange would connect your wires to the wires
of the person you were trying to call. From then on, signals
just went from your microphone to their speaker and then their ears and vice versa.
Circuit switching is conceptually convenient but has a number
of inconvenient properties. In the simplest version, it doesn&#39;t
allow you to talk to more than one person at once (the second
caller gets a busy signal!) and even if you arrange to
connect more than one person as in a conference call, you
have no way of distinguishing who is who (ever had to
ask who was talking?). But of course in a modern computer network
your computer is constantly talking to multiple computers at
once (&amp;quot;multiplexing&amp;quot;). This works badly with circuit switching
but just fine with packet switching because each packet is
self-identifying.&lt;/p&gt;
&lt;h2 id=&quot;the-problem-with-packets&quot;&gt;The problem with packets &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#the-problem-with-packets&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Packet switching has a number of nice properties, but small
self-contained packets are very limiting for the obvious reason that
most things that people want to send are more than 1500 bytes, whether
they be videos, phone calls, or large files; even Web pages are almost
always more than 1500 bytes. Moreover, because packet switching is
unreliable and packets might get lost or delivered in a different order
from the one they were transmitted in, if you just break up
your file into a set of packets and send them over the network,
the other side may not receive exactly what you sent.&lt;/p&gt;
&lt;p&gt;In other words, what we really want is circuit switching,
but what we have is packet switching.
If you know any computer people, you&#39;ve probably guessed what
I&#39;m going to say next because it&#39;s the standard thing to do:
we&#39;re going to &lt;em&gt;emulate&lt;/em&gt; circuits on top of packet switching
to build what&#39;s often called a &lt;em&gt;reliable transport protocol&lt;/em&gt;.
What a reliable transport protocol does is provide a service
that looks like a circuit (usually called a &amp;quot;connection&amp;quot;)
but built on top of the unreliable substrate of packet switching.
Designing these protocols so they work well turns out to be a very substantial
undertaking and we&#39;ve been basically evolving them for the past 40+
years, starting with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1132201507&quot;&gt;TCP&lt;/a&gt;,
which has been used from the early days of the Internet,
and more recently
with a newer protocol called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1131908653&quot;&gt;QUIC&lt;/a&gt;,
which is built on similar but more modern lines.&lt;/p&gt;
&lt;h2 id=&quot;the-world&#39;s-simplest-reliable-transport&quot;&gt;The world&#39;s simplest reliable transport &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#the-world&#39;s-simplest-reliable-transport&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The obvious thing to do here would just be to break up whatever
data you want to send into a series of packets and send them
to the other side. However,
as should be clear from the above, this won&#39;t work
reliably, because the packets might be lost or reordered,
preventing the receiver from reconstructing the data.
Thus the minimal set of problems we need to solve is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Allowing the receiver to reconstruct the order that
the data was sent in, even if the network reorders it.&lt;/li&gt;
&lt;li&gt;Ensuring that data is eventually delivered from the sender
to the receiver.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We&#39;ll take these one at a time, reordering first.&lt;/p&gt;
&lt;h3 id=&quot;reordering&quot;&gt;Reordering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#reordering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The reordering problem is fairly easy to solve: we just
add a field to each packet which contains its number.
The receiver just sorts the packets as they are received,
and delivers them to the application once it has all
previous packets. So, for instance,
if the receiver receives packets in order
&lt;code&gt;1 3 2 4&lt;/code&gt; then it will behave like so:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Packet&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Deliver 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Store 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Deliver 2, Deliver 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Deliver 4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is actually not how TCP works, however. Instead, it
numbers not packets but &lt;em&gt;bytes&lt;/em&gt;. Specifically, each
TCP segment (i.e., packet) comes with a sequence number
which indicates the first byte in the packet, and
a length, which together with the sequence number determines the last byte of the packet.
This allows the sender to re-frame data when it retransmits
it. For instance, suppose that a TCP connection is
carrying typed characters: you want it to send each
character as soon as it is typed, so each will be in
its own packet, but if you need to retransmit a set
of consecutive characters, it&#39;s more efficient to
put them together in a single packet. This only works if the
packets contain an indication of which byte is which. For the purposes of this
post, however, we&#39;ll think of packets as being fixed size
and assume there&#39;s no reframing.&lt;/p&gt;
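&lt;p&gt;To make the simplified packet-numbered scheme concrete, here is a minimal sketch of the receiver&#39;s reordering logic. It follows the fixed-size, packet-numbered model used in this post rather than TCP&#39;s byte numbering, and the &lt;code&gt;ReorderBuffer&lt;/code&gt; class and its method names are made up for illustration:&lt;/p&gt;

```javascript
// Hypothetical receiver-side reorder buffer: packets carry a number,
// and payloads are handed to the application strictly in order.
class ReorderBuffer {
  constructor() {
    this.nextExpected = 1;    // number of the next packet to deliver
    this.stored = new Map();  // out-of-order packets, keyed by number
  }

  // Accept one packet; return the payloads that can now be delivered.
  receive(number, payload) {
    const delivered = [];
    // Ignore duplicates of packets already delivered or already stored.
    if (this.nextExpected > number || this.stored.has(number)) {
      return delivered;
    }
    this.stored.set(number, payload);
    // Drain every consecutive packet starting at nextExpected.
    while (this.stored.has(this.nextExpected)) {
      delivered.push(this.stored.get(this.nextExpected));
      this.stored.delete(this.nextExpected);
      this.nextExpected += 1;
    }
    return delivered;
  }
}

const reorder = new ReorderBuffer();
for (const n of [1, 3, 2, 4]) {
  console.log("packet", n, "delivers", reorder.receive(n, "p" + n));
}
```

&lt;p&gt;Feeding it packets in the order &lt;code&gt;1 3 2 4&lt;/code&gt; reproduces the table above: packet 3 is stored until packet 2 arrives, at which point both are delivered.&lt;/p&gt;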
&lt;h3 id=&quot;packet-loss&quot;&gt;Packet Loss &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#packet-loss&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are a number of reasons that packets can be lost in
transmission. For instance, some network element could malfunction and
drop them or damage them, or, as we&#39;ll see later, an element could
explicitly drop them because they exceed available capacity. In either
case, if a packet is dropped, the only thing for the sender to do is
retransmit it, but how does it know whether to do so? In other words,
how do we detect packet loss?
One potential approach would be for the malfunctioning element to send
some kind of signal indicating that it dropped or damaged the
packet. But of course that signal itself might be dropped or damaged
by the network. Additionally, the problem might be in a passive
element such as a piece of wire which isn&#39;t able to send its own
messages. Finally, if the problem is a malfunctioning element, then
it might malfunction in such a way that it doesn&#39;t correctly send a
message. In any case, no mechanism where the sender receives a message
informing it of a lost packet will work reliably.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-end-to-end-principle&quot;&gt;The end-to-end principle &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#the-end-to-end-principle&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This is a case of what&#39;s called the &lt;a href=&quot;http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf&quot;&gt;end-to-end
principle&lt;/a&gt;.
The basic observation is that there are a lot of
places for things to go wrong between point A and
point B, and so if you want to ensure that a piece
of data arrives at point B, then trusting
intermediate elements isn&#39;t enough; you need A and B
to work together. This doesn&#39;t mean that you can&#39;t
have reliability mechanisms between intermediate
elements, but merely that they&#39;re not sufficient
to guarantee delivery all the way to the other end.
Rather, they act as an optimization that allows you
to detect failures more quickly than they would have
been detected by an end-to-end mechanism.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Rather than having a signal that a packet was &lt;em&gt;dropped&lt;/em&gt;, we&#39;re
going to have a signal that the packet was &lt;em&gt;received&lt;/em&gt;,
called an &lt;em&gt;acknowledgment&lt;/em&gt; (often abbreviated ACK). When
the receiver receives a packet, it sends an acknowledgment
of receipt. This tells the sender that the packet got all
the way to the receiver. Of course, acknowledgments have a number of
obvious drawbacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;They only tell you when a packet was received, not that
it was lost, so the only way you know a packet was lost
is by waiting until you expected to see an acknowledgment
and then not getting one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The acknowledgment can get lost in transit, so the
packet might have been delivered, but this still looks
like packet loss.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The reason to use acknowledgments is that they are robust:
no matter what is going on in the middle of the network,
if the acknowledgment is received, you know the packet
got through. If you just keep sending until you get an
acknowledgment, eventually the packet should get through
(unless of course, the network is totally broken).&lt;/p&gt;
&lt;p&gt;The way this works is that after the sender sends a packet it waits
for a period of time (see &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#retransmit-timers&quot;&gt;below&lt;/a&gt; for how long)
for the corresponding acknowledgment. If the
timer expires before the sender receives the acknowledgment, then it
retransmits the packet, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/reliable-transport-rt.png&quot; alt=&quot;Timeout and retransmission&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this diagram, the sender sends the first packet, which
arrives successfully, and is acknowledged. However, the
second packet gets lost in transmission. Eventually, the
sender&#39;s timer expires and so it retransmits packet 2.
This time it gets through and so does the acknowledgment,
so everything is good.&lt;/p&gt;
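&lt;p&gt;The timeout-and-retransmit loop can be sketched as a small simulation. This is only an illustration under simplifying assumptions: there is no real clock or network, and the hypothetical &lt;code&gt;lossyAttempts&lt;/code&gt; parameter lists which transmission attempts the simulated network drops (the sender can&#39;t tell whether it was the packet or the ACK that got lost):&lt;/p&gt;

```javascript
// Hypothetical simulation of "send, wait for the ACK, retransmit on timeout".
// lossyAttempts lists the transmission attempts (1-based, counted across
// the whole transfer) for which no ACK ever comes back.
function sendReliably(packets, lossyAttempts) {
  const dropped = new Set(lossyAttempts);
  const log = [];
  let attempt = 0;
  for (const packet of packets) {
    let acked = false;
    while (!acked) {
      attempt += 1;
      if (dropped.has(attempt)) {
        // No ACK arrives, so the retransmit timer expires.
        log.push("send " + packet + ": timeout, retransmit");
      } else {
        log.push("send " + packet + ": ACK received");
        acked = true;
      }
    }
  }
  return log;
}

// Packet 1 gets through; the first copy of packet 2 is lost.
console.log(sendReliably([1, 2], [2]).join("\n"));
```

&lt;p&gt;With the second attempt dropped, the log shows packet 1 acknowledged, packet 2 timing out, and then packet 2 acknowledged on the retransmission, matching the diagram.&lt;/p&gt;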
&lt;p&gt;In the simplest version of this protocol, the sender sends
one packet at a time. Once that packet is acknowledged
(potentially after one or more retransmissions), then
the sender sends the next packet. This is what is called
a &lt;em&gt;stop-and-wait&lt;/em&gt; protocol, because the sender doesn&#39;t
do anything until it hears from the receiver. The basic
problem with this design is that it&#39;s slow. The reason
for this is round-trip latency: the diagram above shows
packets as being sent and received at the same time,
but in practice they take some time to get from point
A to point B: even on a very fast Internet connection,
it can take a few milliseconds for a packet to get delivered,
and if the server is around the world, latency can
be on the order of 100 milliseconds. If the sender is
waiting for the receiver&#39;s acknowledgment, then it&#39;s
just idle during this period, as you can see in the diagram below,
where the sender has to wait for a full round trip
before it can send the next packet.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/reliable-transport-stop-and-wait.png&quot; alt=&quot;Stop and wait&quot; /&gt;&lt;/p&gt;
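&lt;p&gt;A rough back-of-the-envelope calculation shows how much stop-and-wait costs. Assuming one 1500-byte packet per round trip and ignoring the time it takes to actually put the packet on the wire (both simplifications), the best possible throughput is just the packet size divided by the round-trip time. The function name here is made up for illustration:&lt;/p&gt;

```javascript
// Upper bound on stop-and-wait throughput: at most one packet per round trip.
function stopAndWaitBytesPerSecond(packetBytes, rttMs) {
  return (packetBytes * 1000) / rttMs;
}

const bytesPerSecond = stopAndWaitBytesPerSecond(1500, 100);
console.log(bytesPerSecond, "bytes/s, or", (bytesPerSecond * 8) / 1000, "kbit/s");
```

&lt;p&gt;With a 100 millisecond round trip this comes out to 15000 bytes per second (120 kbit/s), no matter how fast the underlying links are.&lt;/p&gt;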
&lt;p&gt;The obvious thing for the sender to do is just to send
data as soon as it&#39;s available but this has two big
problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The sender may be able to transmit the data faster
than the receiver wants to consume it. Think about the
case of streaming video: the sender could send the
whole video to the receiver but this would be really
inefficient because the receiver would have to store
it all until it was ready to play, and the viewer
might decide to only watch part of it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Even in cases where the user wants to receive the whole
file, their device might not be able to process the
incoming data as fast as the sender can transmit it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The network might not be able to handle the data
at the rate the sender can send it. This happens
frequently in cases where the sending device is
attached to a very fast local network but the
end-to-end connection to the receiver is slower.
As we saw before, this eventually will overwhelm the
slowest network link in between the two endpoints.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We&#39;ll deal with the first problem in this post
and the second problem in the next post.&lt;/p&gt;
&lt;h3 id=&quot;flow-control&quot;&gt;Flow Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#flow-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to prevent the sender from over-running the
receiver, we need a &lt;em&gt;flow control&lt;/em&gt; mechanism.
The standard approach
is for the receiver to &lt;em&gt;advertise&lt;/em&gt; the total
amount of data it is willing to receive at once
(see &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#buffering&quot;&gt;buffering&lt;/a&gt;). The technical
term here is the &amp;quot;receive window&amp;quot;. The sender can send
as many packets as it wants as long as they fit within
the window, as shown in the diagram below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/reliable-transport-window.png&quot; alt=&quot;Sliding windows&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this diagram, the sender starts out by assuming the
receiver&#39;s window is 1, so it sends a single packet.
The receiver acknowledges this packet with the message
&lt;code&gt;ACK (1, window=4)&lt;/code&gt;, which means &amp;quot;I have received
all packets up to 1 and you can send up to packet 4&quot;
(this is called a &amp;quot;cumulative ACK&amp;quot;). The sender
responds by sending packets 2 through 4, and then waits
for the receiver&#39;s ACK. However, in the time that
packet 3 is in flight, the receiver has received
packet 2 and so it sends an ACK acknowledging it and
advancing the window to packet 5. This isn&#39;t received
until after the sender has sent packet 4, but it
is received shortly thereafter, allowing the sender
to send packet 5.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This mechanism is usually called &amp;quot;sliding windows&amp;quot;,
with the idea being that the window of data the sender
can send is continuously sliding forwards as ACKs
are received.
In this example, the sender still has to wait briefly
before it can send packet 5, but if the window
had been slightly larger, then it might have been
able to send continuously, with the ACK advancing
the window being received before the sender was ready
to send its next packet. This is especially true
if the sender isn&#39;t sending as fast as its network
will support, for instance if it&#39;s sending data
that depends on user input.&lt;/p&gt;
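&lt;p&gt;Here is a minimal sketch of the sender&#39;s side of this bookkeeping. It assumes, as in the walkthrough above, that each ACK carries the highest packet received in order plus the highest packet number the sender is currently allowed to send. The &lt;code&gt;WindowedSender&lt;/code&gt; class is hypothetical, and it leaves out retransmission entirely:&lt;/p&gt;

```javascript
// Hypothetical sender-side bookkeeping for a sliding window. An ACK carries
// the highest packet received in order (cumulative) and the highest packet
// number the sender may send.
class WindowedSender {
  constructor(totalPackets) {
    this.totalPackets = totalPackets;
    this.nextToSend = 1;    // next packet number not yet sent
    this.highestAcked = 0;  // everything up to here is acknowledged
    this.limit = 1;         // initial assumption: a window of one packet
  }

  // Return the packet numbers that may be sent right now, and mark them sent.
  sendable() {
    const sent = [];
    const last = Math.min(this.limit, this.totalPackets);
    while (last >= this.nextToSend) {
      sent.push(this.nextToSend);
      this.nextToSend += 1;
    }
    return sent;
  }

  // Process a cumulative ACK and slide the window forward (never backward).
  onAck(acked, limit) {
    this.highestAcked = Math.max(this.highestAcked, acked);
    this.limit = Math.max(this.limit, limit);
  }
}

const sender = new WindowedSender(5);
console.log(sender.sendable());  // only the first packet
sender.onAck(1, 4);
console.log(sender.sendable());  // packets 2 through 4
sender.onAck(2, 5);
console.log(sender.sendable());  // packet 5
```

&lt;p&gt;The sender never blocks on an individual packet; it only stops when it has used up the window, and every ACK that advances the limit lets it send more.&lt;/p&gt;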
&lt;h3 id=&quot;buffering&quot;&gt;Buffering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#buffering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At this point, you may have noticed that there&#39;s a
lot of waiting here. For instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The sender can&#39;t transmit until it has room in the
window.&lt;/li&gt;
&lt;li&gt;Once the sender transmits, it has to wait until it receives an
acknowledgment, because the packet might have been lost
or damaged.&lt;/li&gt;
&lt;li&gt;If the receiver receives packets out of order
it has to wait to deliver the packets to the
application until it has
received the ones before them.&lt;/li&gt;
&lt;li&gt;If the sender transmits packets faster than the
receiving application can consume them, the receiving operating
system needs to store the packets until the application
is ready.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;During these waiting periods, it&#39;s necessary to store
(technical term: &lt;em&gt;buffer&lt;/em&gt;) a copy of the packet.
For instance, when the program asks the operating
system to write something, but there&#39;s no available
window, the operating system just buffers the packet
until window is available.
Moreover, during this
period the system may be trying to send more packets,
which it may or may not be able to send immediately. For instance,
if an application tries to upload a file, it may send
10 or more packets at once, which the sending system
needs to slowly meter out as window becomes available.
The sender needs a significantly-sized
buffer to store these packets.
Similarly, when a packet
has been received out of order, it needs to be buffered
until the earlier packets are available.&lt;/p&gt;
&lt;p&gt;This isn&#39;t the only place that buffering happens:
not all links on the Internet are the same speed, so
it&#39;s common to have a situation in which network A
wants to send faster than network B can send. In
this case, the computer connecting those networks
(a &lt;em&gt;router&lt;/em&gt;) has to buffer the packets until space
becomes available (often this is called a &lt;em&gt;queue&lt;/em&gt;).
In addition, the user&#39;s devices need
&lt;em&gt;input buffers&lt;/em&gt; where they store packets that have
come in but the operating system or application has
not yet had time to handle.&lt;/p&gt;
&lt;p&gt;In general, all devices on the Internet
have some level of buffering to deal with mismatches
between the rate at which they receive packets
and the rate at which they can handle them, whether
that means processing them locally or forwarding them
to some other device. Buffering allows these
devices to deal with situations where the incoming
rate &lt;em&gt;temporarily&lt;/em&gt; exceeds the processing rate
(which happens all the time) but the longer it
goes on, the more packets have to be stored. Most
devices maintain a maximum buffer size—if nothing
else, limited by the total amount of memory on the
device, but typically far less than that—and
when that size is reached, then they have to drop
packets; either by discarding some of the packets
already buffered or by discarding the new packets
(or both).&lt;/p&gt;
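&lt;p&gt;As a sketch of the simplest of these policies, discarding new packets when the buffer is full (often called &quot;tail drop&quot;), consider the following. The &lt;code&gt;BoundedQueue&lt;/code&gt; class and its tiny capacity are made up for illustration; real devices use a variety of more sophisticated schemes:&lt;/p&gt;

```javascript
// Hypothetical fixed-capacity packet queue that drops new arrivals when full.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.packets = [];
    this.dropped = 0;
  }

  // Returns true if the packet was buffered, false if it was dropped.
  enqueue(packet) {
    if (this.packets.length >= this.capacity) {
      this.dropped += 1;
      return false;
    }
    this.packets.push(packet);
    return true;
  }

  // Remove and return the oldest packet, or undefined if the queue is empty.
  dequeue() {
    return this.packets.shift();
  }
}

// Five packets arrive in a burst, but there is only room for three.
const queue = new BoundedQueue(3);
for (const p of [1, 2, 3, 4, 5]) {
  queue.enqueue(p);
}
console.log("buffered:", queue.packets, "dropped:", queue.dropped);
```

&lt;p&gt;In this burst, packets 1 through 3 are buffered and packets 4 and 5 are dropped; from the sender&#39;s point of view those two simply never get acknowledged.&lt;/p&gt;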
&lt;h3 id=&quot;retransmit-timers&quot;&gt;Retransmit Timers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#retransmit-timers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve been handwaving a bunch about how the sender sets a timer and
waits for the acknowledgment, but that doesn&#39;t tell us how long the
timer should be. In general, we want the timer to be based on
the &lt;em&gt;round-trip time (RTT)&lt;/em&gt; between the sender and receiver,
which is to say the time it takes a packet to go from sender
to receiver, the receiver to respond, and the response to make
it back. If we set the timer shorter than the RTT, then the
ACK won&#39;t make it in time and the sender will retransmit even
if packets aren&#39;t lost; if we set it much longer, then we&#39;re waiting
too long to declare packets lost, which slows down the
connection. In practice, you want the retransmit timer
to be somewhat longer than the RTT because there&#39;s some variation
in network speeds, etc., but not too much longer. There&#39;s
a long literature on how to set the retransmit timer, which
I won&#39;t go into here.&lt;/p&gt;
&lt;p&gt;There&#39;s just one problem: we don&#39;t know the round trip time,
because it&#39;s not a property that the sender can see directly.
Instead it&#39;s a function of the speed of all the network links
in between the sender and receiver. Even if I have a fast network, I might
be connecting to someone with a slow network. It also depends
on how heavily
loaded they are at any given moment, because I&#39;m competing
for network capacity with other users, which means that it can
change over time.
Worse yet, RTTs can vary dramatically:
the RTT from my house in Palo Alto to the nearest Cloudflare
server is about 10ms. The best-case RTT from Australia to the US
is around 150ms (300,000 km/s is not just a good idea, it&#39;s the law).
If you pick a single value for your retransmit
timer, you&#39;re going to have seriously suboptimal performance
on many networks.&lt;/p&gt;
&lt;p&gt;The way that transport protocols handle this is to measure
the round trip time during the connection by looking at how long
it takes the other side to send an ACK. For instance, if you
sent packet 10 at T=2000ms and you get an ACK for it at T=2050ms,
then the estimated RTT is 50ms. Each time you get an ACK, you
update the RTT estimate. The typical approach is to maintain
a &lt;em&gt;smoothed&lt;/em&gt; estimate (effectively a weighted moving average)
of the recent measurements to average out the noise in
each individual measurement while also favoring more recent
measurements. Of course, you don&#39;t have any measurements at
the time you start transmitting, so the typical approach is
to use a somewhat conservative starting point (QUIC uses
333 ms), but obviously if the path between you and the
other side has a low RTT, you want to update that as soon as possible.&lt;/p&gt;
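&lt;p&gt;One common way to do the smoothing is an exponentially weighted moving average in the style of RFC 6298. The sketch below uses that RFC&#39;s constants (1/8 for the RTT, 1/4 for the variation, and a timeout of the smoothed RTT plus four times the variation), but it omits the clock-granularity and minimum-timeout rules, and the &lt;code&gt;RttEstimator&lt;/code&gt; class is made up for illustration:&lt;/p&gt;

```javascript
// Sketch of a smoothed RTT estimator using the constants from RFC 6298
// (alpha = 1/8, beta = 1/4). Class and method names are hypothetical.
class RttEstimator {
  constructor(initialRttMs) {
    this.srtt = initialRttMs;        // smoothed RTT
    this.rttvar = initialRttMs / 2;  // smoothed variation in the RTT
    this.hasSample = false;
  }

  // Fold one new RTT measurement into the estimate.
  onSample(rttMs) {
    if (!this.hasSample) {
      // The first real measurement replaces the conservative starting value.
      this.srtt = rttMs;
      this.rttvar = rttMs / 2;
      this.hasSample = true;
      return;
    }
    this.rttvar = 0.75 * this.rttvar + 0.25 * Math.abs(this.srtt - rttMs);
    this.srtt = 0.875 * this.srtt + 0.125 * rttMs;
  }

  // Timer value: somewhat longer than the RTT, scaled by how noisy it is.
  retransmitTimeoutMs() {
    return this.srtt + 4 * this.rttvar;
  }
}

const estimator = new RttEstimator(333);  // conservative starting point
estimator.onSample(50);                   // e.g., sent at T=2000ms, ACK at T=2050ms
estimator.onSample(60);
console.log("srtt:", estimator.srtt, "timeout:", estimator.retransmitTimeoutMs());
```

&lt;p&gt;Note how the first real measurement immediately replaces the conservative starting value, after which each new sample only nudges the estimate.&lt;/p&gt;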
&lt;h2 id=&quot;set-up-and-tear-down&quot;&gt;Set-Up And Tear-down &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#set-up-and-tear-down&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far I&#39;ve just covered the steady-state case where the sender
is already magically communicating with the receiver, but in
practice, how do we get into this state, and how do we
stop?&lt;/p&gt;
&lt;h2 id=&quot;set-up&quot;&gt;Set-up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#set-up&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In most transport protocols, there&#39;s some kind of initial setup
&lt;em&gt;handshake&lt;/em&gt; before data is transmitted. For instance, here&#39;s what
TCP&#39;s handshake looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tcp-3way.png&quot; alt=&quot;TCP 3-way handshake&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As I said above, TCP doesn&#39;t number packets, but instead labels
each byte with a sequence number. So, what&#39;s going on here is
that the client sends an empty SYN (for synchronize) packet
with sequence number 1234. The server acknowledges it with its
own SYN packet (with sequence number 8765), and if it wants
can send data to the client at this point (though this
isn&#39;t the usual thing). Upon receiving the server&#39;s SYN,
the client can also send traffic, starting with sequence number
1235. In the same packet, it acknowledges the server&#39;s SYN.
Why is this necessary, though? Why not just start sending?
And why not just start the sequence number at 1 (or, as
C programmers would expect, at 0)?&lt;/p&gt;
&lt;p&gt;The problem here is that it&#39;s possible
for there to be two separate connections between client and
server. Suppose, for instance, that a client initiates
a connection and sends some data over it, and then ends
it and starts a new connection, as in the diagram below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tcp-seq-ambiguity.png&quot; alt=&quot;TCP sequence number ambiguity&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If a packet from connection 1 is delayed on the network
for a long period of time, it may be received by the
server after connection 2 starts and accepted as
part of that connection. Disaster!
The way TCP handles this is by having the new connection
start with a sequence number which is intended not to
overlap with valid sequence numbers from the previous connection.
Sequence number selection is actually
a somewhat complicated topic that I won&#39;t get into here.
The 3-way handshake is needed to ensure that
the client and the server agree on the initial sequence
numbers for the connection. Otherwise, you could have
a situation where the server was acting based on a delayed
SYN from a previous connection, leading to problems
(I&#39;ll spare you the details of the pathological cases).&lt;/p&gt;
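&lt;p&gt;To make the numbers in the handshake concrete, here is a small sketch that lists the three messages for a given pair of initial sequence numbers. It relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other&#39;s initial value plus one. The function and field names are made up for illustration, and sequence number wraparound is ignored:&lt;/p&gt;

```javascript
// Hypothetical trace of the three handshake messages. A SYN consumes one
// sequence number, so each side acknowledges the peer's initial sequence
// number plus one.
function threeWayHandshake(clientIsn, serverIsn) {
  return [
    { from: "client", flags: ["SYN"], seq: clientIsn },
    { from: "server", flags: ["SYN", "ACK"], seq: serverIsn, ack: clientIsn + 1 },
    { from: "client", flags: ["ACK"], seq: clientIsn + 1, ack: serverIsn + 1 },
  ];
}

for (const message of threeWayHandshake(1234, 8765)) {
  console.log(JSON.stringify(message));
}
```

&lt;p&gt;With the values from the example above, the client&#39;s first data byte goes out with sequence number 1235, and both sides end the handshake knowing which sequence numbers belong to this connection.&lt;/p&gt;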
&lt;p&gt;The problem with the 3-way handshake in
TCP&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
is that the client has to absorb a full round trip before it
can send anything, which is a real performance cost. This wasn&#39;t
really seen as a big deal when TCP was first designed, but
as Internet speeds have increased generally and latency has
become a big deal, it&#39;s become much more important.
It&#39;s possible to send data on the first packet as long as you can
guarantee that the server can distinguish this connection from
others. For instance, QUIC does this by having the client choose a
long (minimum 64 bits) random connection ID value, which distinguishes
this connection from all other connections (I&#39;m simplifying here, as
the QUIC connection ID logic is also quite complicated), thus
allowing the server to know that this is a new connection and
not a replay. There&#39;s still a setup handshake, but the client
and server are able to send during it, which saves round trips.
I plan to cover the complexities of getting this right later, but
wanted to mention it here for context.&lt;/p&gt;
&lt;h2 id=&quot;tear-down&quot;&gt;Tear-Down &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#tear-down&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What happens when the endpoints are finished communicating?
In principle, they can just stop sending, but then what?
The problem here is that both sides have to keep state:
in order to be able to process Alice&#39;s packets, Bob needs
to remember the last packet he processed so that he knows
whether a received packet is a replay (to be discarded)
or new data (to be processed). This takes up memory and eventually
Bob is going to want to clean up. But how does Bob know
when Alice is really done and so it&#39;s safe to clean up
versus Alice just went quiet for a while?&lt;/p&gt;
&lt;p&gt;The obvious thing to do here is to have an &lt;em&gt;in-band&lt;/em&gt;
signal that says that the connection is closing. This
signal would itself be acknowledged, so it would
be delivered reliably. This is what TCP does,
but experience with newer protocols such as QUIC has
shown that this is not always the best approach. This is
another topic I plan to cover in a future post.&lt;/p&gt;
&lt;h2 id=&quot;common-themes&quot;&gt;Common Themes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#common-themes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Aside from the technical details, you should be noticing
a few high-level themes.&lt;/p&gt;
&lt;p&gt;First, the design of these protocols doesn&#39;t really depend on any
information about the internals of the network. It could be built out
of copper wire, optical fiber, microwave links, two tin cans and a
string, or all of the above. Similarly, you don&#39;t need to know how
fast any individual link is, how big the buffers are in the routers
along the path, etc. From the perspective of the transport protocol,
the Internet is just this opaque system where you put packets in one
side and they come out the other end. So, when we measure the RTT, for
instance, we&#39;re just measuring the aggregate RTT of the system as a
whole.&lt;/p&gt;
&lt;p&gt;Second, none of this needs any cooperation from the elements in the
middle. This means that (1) it&#39;s robust against any technical changes
in the network and (2) you can make changes to the transport protocol
without first having to change those elements.  These are key
properties for deployability: the Internet takes a really long time to
evolve, and if we needed to change every element between point A and
point B before we could use a new transport protocol on that path,
we&#39;d be waiting a very long time.&lt;/p&gt;
&lt;p&gt;Finally, we constantly have to think about what happens if the
network misbehaves in some way, for instance by dropping our
packets or delivering them way out of order. A properly designed
transport protocol has to be robust to all reasonable kinds
of network misbehavior—the bar for what this means has
gone up over the years to include active attack—and operate
properly, or at least fail safely. This is just the price of
trying to build a reliable system out
of unreliable components.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-congestion-management&quot;&gt;Next Up: Congestion Management &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#next-up%3A-congestion-management&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What we have so far is basically a simplified version of
what TCP was like in 1986, when the Internet link
between Lawrence Berkeley Labs (LBL) and UC Berkeley
(about 400 yards apart) abruptly suffered what&#39;s come
to be known as &amp;quot;congestion collapse&amp;quot;, in which flaws
in the TCP retransmission algorithms caused the
effective throughput of the link to drop by a factor of
about 1000. In the next post, I&#39;ll be talking about
congestion collapse and how to avoid it.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The term sockets goes back to the original
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Berkeley_sockets&amp;amp;oldid=1120857869&quot;&gt;BSD sockets&lt;/a&gt;
programming interface which was commonly used on early
Internet systems and is now nearly universal. &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ironically, in the modern phone network, it&#39;s fairly likely that
we&#39;re carrying the data over some packet-based transport,
very often IP. &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know that in practice it&#39;s common to
actually download smaller chunks of the video. &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that it&#39;s usual not to acknowledge ACKs;
otherwise you get into a situation where the
sides are just ping-ponging ACKs at each other.
 &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;TCP does have a new mode called &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7413&quot;&gt;TCP Fast Open&lt;/a&gt;
which allows sending immediately, but this is comparatively
modern and there are a number of deployment challenges. &lt;a href=&quot;https://educatedguesswork.org/posts/transport-protocols-intro/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Surprise, blockchains won&#39;t fix Internet voting</title>
		<link href="https://educatedguesswork.org/posts/voting-blockchain/"/>
		<updated>2023-01-09T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-blockchain/</id>
		<content type="html">&lt;p&gt;You&#39;ll notice that in my &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto&quot;&gt;post&lt;/a&gt; on end-to-end
voting I never mentioned the word &amp;quot;blockchain&amp;quot;. However, there&#39;s been quite
a bit of interest in the &amp;quot;crypto&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
community around somehow using
the blockchain to &amp;quot;fix&amp;quot; voting. For instance, here&#39;s Binance CEO
Changpeng Zhao arguing back in 2020 that it will lead to more secure
elections with faster results:&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;If there is a blockchain based mobile voting App (with proper KYC of course), we won&amp;#39;t have to wait for results, or have any questions on its validity. Privacy can be protected using a number of encryption mechanisms.&lt;/p&gt;&amp;mdash; CZ 🔶 Binance (@cz_binance) &lt;a href=&quot;https://twitter.com/cz_binance/status/1324170287009554432?ref_src=twsrc%5Etfw&quot;&gt;November 5, 2020&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;p&gt;And here&#39;s Ethereum founder Vitalik Buterin endorsing the idea:&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;The technical challenges with making a secure cryptographic voting system are significant (and often underestimated), but IMO this is directionally 100% correct. &lt;a href=&quot;https://t.co/J0qHiN2bbk&quot;&gt;https://t.co/J0qHiN2bbk&lt;/a&gt;&lt;/p&gt;&amp;mdash; vitalik.eth (@VitalikButerin) &lt;a href=&quot;https://twitter.com/VitalikButerin/status/1324179944558059522?ref_src=twsrc%5Etfw&quot;&gt;November 5, 2020&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;p&gt;See also Buterin&#39;s more extensive defense of this position
&lt;a href=&quot;https://vitalik.ca/general/2021/05/25/voting2.html&quot;&gt;here&lt;/a&gt;, which
argues for the blockchain-as-bulletin board design. I address
some but not all of his points below.&lt;/p&gt;
&lt;p&gt;Spoiler alert: I think this is wrong, in two separate ways.&lt;/p&gt;
&lt;p&gt;First, blockchains are not really a useful element in Internet
voting: they don&#39;t solve the basic security problems in
the system, and are worse than the existing technologies
they would replace.&lt;/p&gt;
&lt;p&gt;Second, the basic premise that we need Internet voting in
order to fix our existing voting systems is largely misguided:
it&#39;s true that we see a lot of problems with those systems
in practice, but it&#39;s also quite possible to use
paper-based systems to run an election that produces quick
results which can be independently verified. To a great
extent, the operational problems that have gotten so much
press are the result of conscious decisions made by
policymakers. Moreover, at our current level of technology
Internet voting has serious vulnerabilities that we
just have no real idea how to overcome.&lt;/p&gt;
&lt;h2 id=&quot;blockchains-are-not-the-solution-to-internet-voting&quot;&gt;Blockchains are not the solution to Internet voting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#blockchains-are-not-the-solution-to-internet-voting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s dispose of the obvious point first: the big problems in
the security of Internet voting stem from the need to secure
software (and keying material) on voters&#39; devices. A blockchain
doesn&#39;t really do anything to address this. Moreover, the fact
that we fairly routinely see &lt;a href=&quot;https://web3isgoinggreat.com/?id=raydium-exploit&quot;&gt;successful&lt;/a&gt;
&lt;a href=&quot;https://web3isgoinggreat.com/?id=raydium-exploit&quot;&gt;attacks&lt;/a&gt;
on &lt;a href=&quot;https://web3isgoinggreat.com/?id=oracle-attack-on-helio-enabled-by-a-separate-hack-on-ankr-allows-attackers-to-steal-15-million&quot;&gt;crypto infrastructure&lt;/a&gt;
as well as theft of cryptocurrency, including from
&lt;a href=&quot;https://web3isgoinggreat.com/?id=early-crypto-investor-loses-42-million-in-wallet-compromise&quot;&gt;crypto investors&lt;/a&gt; (and maybe even
&lt;a href=&quot;https://twitter.com/LukeDashjr/status/1609613748364509184&quot;&gt;core Bitcoin developers&lt;/a&gt;???)—who you would expect to be sophisticated—does
not exactly suggest that the cryptocurrency community has
discovered the secrets to key management and
to building secure cryptographic software.
And of course, even if they had, that software has to run on
commodity platforms, which have their own security
problems; if end-user devices are compromised, then you can&#39;t
trust the cryptographic voting software on top of them even if
that software is perfect.&lt;/p&gt;
&lt;p&gt;The difficulty of getting ordinary people to use cryptography
correctly isn&#39;t some surprising piece of news. There are decades
of papers on how hard cryptographic software is to use
(see &lt;a href=&quot;https://people.eecs.berkeley.edu/~tygar/papers/Why_Johnny_Cant_Encrypt/OReilly.pdf&quot;&gt;here&lt;/a&gt; and then &lt;a href=&quot;https://www.researchgate.net/profile/Kent-Seamons/publication/283334711_Why_Johnny_Still_Still_Can&#39;t_Encrypt_Evaluating_the_Usability_of_a_Modern_PGP_Client/links/59512b3ea6fdcc218d24bac9/Why-Johnny-Still-Still-Cant-Encrypt-Evaluating-the-Usability-of-a-Modern-PGP-Client.pdf&quot;&gt;here&lt;/a&gt;).
In fact, here&#39;s
Zhao just last month &lt;a href=&quot;https://cointelegraph.com/news/only-1-of-people-can-handle-crypto-self-custody-right-now-binance-ceo&quot;&gt;saying&lt;/a&gt; that 99% of people can&#39;t adequately
manage their own keying material for their
crypto:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“For most people, for 99% of people today, asking them to hold crypto on their own, they will end up losing it.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;and:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Most people are not able to back up their security keys; they will lose the device [...] They will not have the proper encryption for their backup; they will write it on a piece of paper, someone else will see it, and they will steal those funds,” he explained.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But this is precisely what we are asking people to do in order
to do any kind of Internet voting (with or without a blockchain).
The security of these systems depends critically on the security
of the keying material used to authenticate each user. If
people can&#39;t safely do that for the keys to manage their money,
then why should we expect them to do so for a key they only
have to use twice a year?&lt;/p&gt;
&lt;h2 id=&quot;they-aren&#39;t-even-a-useful-element&quot;&gt;They aren&#39;t even a useful element &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#they-aren&#39;t-even-a-useful-element&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;OK, so blockchains don&#39;t solve the basic security problem with Internet
voting, but maybe they are a useful component? Again, I think the answer
is &amp;quot;no&amp;quot;.
The obvious place you
might want to use a blockchain is as the &amp;quot;bulletin board&amp;quot; for an
E2E system. The bulletin board needs to (1) be publicly accessible and
(2) have public consensus on its contents. Given that the point of
a blockchain is to provide consensus about which coins have been
spent, this seems like a natural fit.
The idea here would be that you would submit your ballot as a
record on the blockchain (just as you would a record of a spending
transaction). Any records which had been included as of the
date of the election (or some other deadline, presumably) would
then be treated as &amp;quot;on the bulletin board&amp;quot; for the purposes of
the rest of the protocol. You&#39;d of course need all the rest
of the apparatus of end-to-end verifiable voting like
the provable mix, etc., but maybe the blockchain would be
useful as the bulletin board.&lt;/p&gt;
&lt;p&gt;While possible in theory, this doesn&#39;t really get you much
in practice. First, the verifiability properties of a blockchain
do not map well onto what you need for an election. Second,
using a blockchain in this context has a number
of practical problems, as discussed in a quite thorough
&lt;a href=&quot;https://people.csail.mit.edu/rivest/pubs/PSNR20.pdf&quot;&gt;report&lt;/a&gt;
by MIT researchers Park, Specter, Narula, and
pioneering cryptographer (and
co-inventor of the RSA public key algorithm) Ron Rivest.&lt;/p&gt;
&lt;h3 id=&quot;verification&quot;&gt;Verification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#verification&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The distinguishing feature of blockchain type systems is that
they are designed to be &amp;quot;zero-trust&amp;quot;, in the sense that you don&#39;t need
to trust a central authority to maintain the integrity of the
log. The specific property that the blockchain guarantees
is that everyone has consensus on:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Which transactions are in the log&lt;/li&gt;
&lt;li&gt;What order they occurred in&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The details of how it accomplishes this are out of scope for this
post (I&#39;ve been working on a post about this, but I&#39;m not happy with it yet),
but the key insight to have is that the reason you &lt;em&gt;need&lt;/em&gt; this
kind of system is that the transactions in the log do not themselves
provide all the information you need to verify them. Specifically,
while they are typically digitally signed and so you can verify they
are authentic, you need the blockchain to tell you what order
they occurred in and to ensure that people don&#39;t conceal transactions.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;E2E voting is similar in that you don&#39;t trust the voting authority
but different in that all of the information it publishes
is self-authenticating, so you don&#39;t need some separate mechanism to
ensure it was correctly recorded. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You can verify that all the input votes are valid
by checking their signatures (this is true of cryptocurrency systems
too).&lt;/li&gt;
&lt;li&gt;You can verify that the mixing was conducted correctly by checking
the proofs of shuffling.&lt;/li&gt;
&lt;li&gt;You can verify that the votes were decrypted correctly by checking
their proofs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The only thing you can&#39;t directly verify from this information
is that votes weren&#39;t incorrectly excluded from the original
input set, but a blockchain doesn&#39;t really assist you here,
because it&#39;s just a record of what people claimed happened.
Instead, what you need is for the authority to publish the
input set in some way that everyone can see and that allows
people to &lt;em&gt;challenge&lt;/em&gt; the input set.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Specifically, the authority
publishes the set of signed encrypted ballots to the bulletin board and
then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Voters who believe that their votes were improperly excluded
can challenge that exclusion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Observers who believe that a vote was improperly included
(e.g., the signature is invalid, or the voter is ineligible)
can challenge that vote.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This &lt;em&gt;does&lt;/em&gt; require that everyone agree on the contents of
each bulletin board, but you don&#39;t need the blockchain to provide
it because the election officials can just post it on their
Web site. Well, mostly.&lt;/p&gt;
&lt;h4 id=&quot;partitioning-attacks&quot;&gt;Partitioning Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#partitioning-attacks&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The reason for the &amp;quot;mostly&amp;quot; is that you can&#39;t check whether all the
votes that are supposed to be present actually are, because you don&#39;t
know who voted. Rather, you are counting on other people having
checked that their votes appear on the bulletin board (or people
checking for them). If that bulletin board is just a Web site then
it&#39;s theoretically possible to mount what&#39;s called a partition
attack.&lt;/p&gt;
&lt;p&gt;Suppose the election officials want to suppress Alice&#39;s vote.
If they just exclude it from the bulletin board, then Alice might
catch them. Instead, they &lt;em&gt;selectively&lt;/em&gt; exclude it, by creating
two copies of the bulletin board:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The main one they use for the actual count that excludes Alice.&lt;/li&gt;
&lt;li&gt;A bogus bulletin board that includes Alice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When Alice goes to check her vote, the election officials send
Alice the bogus version, and so her checks succeed. However,
when anyone else checks the bulletin board, they send the real
copy.&lt;/p&gt;
&lt;p&gt;This is actually a very hard attack to mount in practice because
any number of things can go wrong. First,
if Alice checks the final totals, she&#39;ll see that they don&#39;t
match. Even if she&#39;s lazy, the attack depends on being able to perfectly
detect when Alice is checking as opposed to someone else;
as there is no reason to authenticate this transaction, that&#39;s
difficult. You could use the IP address, but what if Alice
votes from her phone and checks from her laptop?&lt;/p&gt;
&lt;p&gt;Moreover, this attack is easy to defeat as long as you have
any consensus mechanism at all. You certainly don&#39;t need
anything as fancy as a blockchain, though, because we already have numerous mechanisms for election
officials to communicate authoritatively with the public in
ways that ensure that everyone gets the same information (e.g.,
by having that information broadcast on television or published
in the newspaper). All they need to do is publish the hash of
the bulletin board via one of these mechanisms and then everyone
can verify that they have the same bulletin board contents.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
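&lt;p&gt;As a rough sketch of that check, assume the bulletin board is just a collection of opaque ballot records and that SHA-256 is the hash the officials publish. Both are assumptions made for illustration, and the &lt;code&gt;board_digest&lt;/code&gt; function and the stand-in ballots below are hypothetical:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib


def board_digest(ballots):
    # Hash every ballot, then hash the sorted per-ballot hashes, so the
    # digest depends only on which ballots are present, not on their order.
    outer = hashlib.sha256()
    for leaf in sorted(hashlib.sha256(b).digest() for b in ballots):
        outer.update(leaf)
    return outer.hexdigest()


# Stand-in records; real ones would be signed, encrypted ballots.
alice, bob, carol = bytes([1]), bytes([2]), bytes([3])

# Officials count (and publish the digest of) a board that omits Alice.
published = board_digest([bob, carol])

# Everyone else downloads the real board; Alice is shown the bogus one.
public_view = board_digest([carol, bob])
alice_view = board_digest([alice, bob, carol])

print(public_view == published)  # True: same contents as the counted board
print(alice_view == published)   # False: Alice can see that her copy differs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sorting the per-ballot hashes is a design choice in this sketch so that two honest copies downloaded in different orders still compare equal; a real system could instead fix a canonical ordering for the board.&lt;/p&gt;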
&lt;p&gt;The point is that this is not a situation which needs &lt;em&gt;distributed&lt;/em&gt;
consensus; it just needs regular consensus.
The whole system has to be centrally operated
anyway, and that central authority is a natural mechanism for
establishing consensus.&lt;/p&gt;
&lt;h3 id=&quot;practical-problems&quot;&gt;Practical Problems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#practical-problems&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The details of how blockchains work are outside of the
scope of this post, but briefly, a blockchain is a public
list of transactions, with every transaction appearing in (or
at least attested to by) the blockchain. It is
maintained by a set of servers that are responsible for checking the
validity of transactions and appending them to the
public log. In what&#39;s called a &amp;quot;permissionless&amp;quot; blockchain,
these servers are just operated by ordinary people
(at least in theory; in practice, of course, it takes
a lot of resources to be relevant) and there
aren&#39;t any special trust relationships with those servers.
At a very high level the process looks something like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user (voter) generates a candidate record that they want
incorporated into the blockchain.&lt;/li&gt;
&lt;li&gt;The user&#39;s software then sends the record to some set of
other network nodes.&lt;/li&gt;
&lt;li&gt;Those nodes propagate that record to other nodes until
all—or at least most—of the other nodes in
the network have a copy.&lt;/li&gt;
&lt;li&gt;One or more network elements select a set of outstanding
records and incorporate them into the blockchain. Note
that I&#39;ve totally omitted how this happens. For our
purposes, it&#39;s magic.&lt;/li&gt;
&lt;li&gt;The extended blockchain is propagated to the rest of the
network.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The result is that everyone knows by looking at the blockchain
which records are in the consensus and which are not
(this part is magic too).&lt;/p&gt;
&lt;p&gt;As Park et al. observe, there are a number of things which
can go wrong here. For instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The nodes that the user submits their record to could
decide not to propagate it to other nodes, thus preventing
a given user from voting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The nodes responsible for selecting the set of outstanding
records could omit a specific record, either unintentionally
(because it gets lost) or maliciously (to suppress a given
user&#39;s vote).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An attacker could attempt to mount a denial-of-service attack on
the network to prevent it from coming to consensus.
Park et al. suggest a specific attack scenario which exploits
the fact that in some networks the user has to &lt;em&gt;pay&lt;/em&gt; to
have their transactions included in the blockchain, and
the nodes have discretion about which transactions to
include (and can favor the higher bidding ones) at times when
the incoming transaction rate exceeds the throughput of the network.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
If the network is shared with other applications like financial
transactions, an attacker could potentially flood the system with transactions
in an attempt to starve out legitimate votes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An attacker might be able to exploit defects in system elements
or the associated protocols to globally or selectively mount
denial-of-service attacks on an election.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The bigger picture here is that blockchains don&#39;t provide a guaranteed
level of service and that the actual delivered level of service
depends on network elements which are untrustworthy and
&lt;em&gt;potentially malicious.&lt;/em&gt; This opens up a lot of opportunities for
attackers to interfere with election outcomes even if they aren&#39;t able to
actually forge votes. They don&#39;t need to be completely successful, either;
they just need to have a big enough impact to swing a close
election. Of course, some of these attacks are possible
with centrally operated systems, but at least in those systems you
know who to blame for outages (and remember, I&#39;m not saying that
Internet voting is good, even with centralized systems!).&lt;/p&gt;
&lt;p&gt;I could go on here, but if you&#39;re really interested,
you should read the
&lt;a href=&quot;https://people.csail.mit.edu/rivest/pubs/PSNR20.pdf&quot;&gt;MIT report&lt;/a&gt;.
The authors
do a valiant job of trying to design a blockchain-based voting system
using coins as votes, but honestly it&#39;s just a mess, with all the
problems I&#39;ve described here and more (this isn&#39;t a critique of
the authors; their point is that it&#39;s a bad idea, so it&#39;s proof
by contradiction). The bottom line is that
blockchain technology just isn&#39;t a good fit for this application.&lt;/p&gt;
&lt;h2 id=&quot;solving-the-wrong-problem&quot;&gt;Solving the wrong problem &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#solving-the-wrong-problem&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, the whole argument here kind of
rests on a misdiagnosis of the
situation, namely that the problem with conventional voting systems is
that they are inherently (1) slow to get results and (2) open to questions of validity,
and hence that we need Internet voting to solve these problems.&lt;/p&gt;
&lt;h3 id=&quot;speed&quot;&gt;Speed &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#speed&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s entirely possible for conventional voting systems to
produce rapid results (though in all fairness, not as fast
as an Internet-only system). It&#39;s true that there have been a number of recent elections where
it took several days to determine the winner, as more votes
trickled in. In some cases, candidate A looked like a winner early
but was the eventual loser when all the votes were in, which
has caused a lot of suspicion among people who didn&#39;t understand
what was happening. However, many jurisdictions actually are
able to resolve elections quickly. For instance, Florida
mostly got &lt;a href=&quot;https://www.fox4now.com/news/local-news/investigates/how-florida-counts-votes-so-fast-compared-to-other-states&quot;&gt;same-day results&lt;/a&gt;
in 2022.&lt;/p&gt;
&lt;p&gt;To understand what causes delay, it helps to understand the logistics
of voting.
The consensus best choice in the voting security community is
&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/&quot;&gt;optically scanned (opscan) paper ballots&lt;/a&gt;.
These can be counted in one of two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Precinct count:&lt;/em&gt; The ballots are fed into a machine in the
precinct which counts them immediately and then can report
the results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Central count:&lt;/em&gt; The ballots are sent back to election central
where they are scanned.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Precinct count systems can deliver results immediately upon poll
closure, with some potential risk to voter privacy (you have
to trust the machine not to record the order of ballots and their
contents). With systems like this, you can get a count on election
night (pending verification, as below).
Central count machines obviously take longer to
report values, but modern central count scanners can count
&lt;a href=&quot;https://www.dominionvoting.com/download/imagecast-central/?wpdmdl=67331&amp;amp;masterkey=5f10715444428&quot;&gt;hundreds of ballots per minute&lt;/a&gt;,
so it&#39;s not implausible that you could get an election night count
with an acceptable cost, as Florida already does.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There are a number of reasons why elections can be slow to
resolve, but one of the main ones is absentee/mail-in ballots.
For instance, in California, ballots can be postmarked on
election day, so you need to wait days for all of the ballots
that were mailed to be delivered. In some jurisdictions,
you can&#39;t even &lt;a href=&quot;https://www.ncsl.org/research/elections-and-campaigns/vopp-table-16-when-absentee-mail-ballot-processing-and-counting-can-begin.aspx&quot;&gt;start counting&lt;/a&gt; absentee ballots until election day, which
means you need to count a lot of ballots right away.
A number of jurisdictions have both of these problems: in
Mississippi ballots can be processed up to 5 days after
election day if they are postmarked on election day &lt;em&gt;and&lt;/em&gt;
you&#39;re not even allowed to start checking the signatures
on them until election day! As noted above, if you have
the right policies you can get answers reasonably quickly.&lt;/p&gt;
&lt;p&gt;It&#39;s certainly true that ballots received over the Internet
could be tallied instantly, so in that respect we would expect
Internet voting to be faster, but this only works if we
require everyone to vote over the Internet, which has the
potential to really disenfranchise a lot of people
(people who can&#39;t afford modern devices, those who aren&#39;t
comfortable with new technologies, etc.). If a significant
number of people still vote mail-in with paper ballots,
then you still have the problem. The bottom line here
is that if we want to prioritize rapid election results
at the cost of making it harder to vote remotely (and while
for many people an app would be easier, for some it would
be harder), then
we know how to do it; it&#39;s a choice to have slow election
results.&lt;/p&gt;
&lt;p&gt;It&#39;s also important to note that this is all about preliminary
results. Full verification takes time, both for paper-based
systems and for end-to-end verifiable systems. For paper-based systems,
this is because the risk-limiting audit or hand count is
manual. In end-to-end verifiable systems, the cryptographic
pieces can be checked immediately, but you need to give
time for people to challenge the initial vote input set
(and specifically to object that their vote was not included).
Until that&#39;s happened, you have no way of knowing that
the voting system didn&#39;t just exclude a lot of voters.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;disputes-about-validity&quot;&gt;Disputes about validity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#disputes-about-validity&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;From a technical perspective, election validity comes
down to the ability to demonstrate to a third party—ideally
to any third party, but in practice to some set of
third parties that are collectively trusted&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
by the electorate—that each phase of
the election was correctly conducted, or at least that
the inevitable errors were insufficiently large to
affect the final result.&lt;/p&gt;
&lt;p&gt;For ordinary elections,
verifiability is provided by
a combination of observability—at least in principle—for
the manual processes and double-checking for the
inherently unverifiable electronic processes (if any).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
This second feature is typically described using
the concept of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Software_independence&amp;amp;oldid=1068243261&quot;&gt;software independence (SI)&lt;/a&gt;,
defined by &lt;a href=&quot;http://people.csail.mit.edu/rivest/RivestWack-OnTheNotionOfSoftwareIndependenceInVotingSystems.pdf&quot;&gt;Rivest and Wack&lt;/a&gt; as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A voting system is software-independent if an undetected change or
error in its software cannot cause an undetectable change or error
in an election outcome.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The intuitive reason for SI is that we know computers to be very
insecure—and multiple reviews of electronic voting systems
have found serious vulnerabilities—and that their operations are opaque, so
any voting system shouldn&#39;t depend on trusting them.&lt;/p&gt;
&lt;p&gt;With a hand-marked paper ballot system, you have some set of
processes to ensure that only registered voters vote, but
you still need to verify that the tabulation is performed
correctly. If you count the ballots by hand, we&#39;re back
to observability, but if you count them by machine, then
you need a double check. This can be provided by using
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Risk-limiting_audit&amp;amp;oldid=1087546062&quot;&gt;risk-limiting
audit&lt;/a&gt;,
in which a sample of the ballots is publicly counted.
Of course, if there is real doubt or the margin is very
close then you can do a full hand count, but in either
case the entire counting process can be made verifiable
(though in practice, RLAs are nothing like universal).
The key point here is that if you follow the right practices,
then even a complete compromise of the scanner will not
lead to the wrong result.
If you use ballot marking devices instead of hand-marking
the ballots, then this does not completely provide SI:
if the BMD is compromised then the attacker
can have it record the wrong result; some voters will check
and catch the error, but others won&#39;t and for those voters
the attack will succeed. The counting process is still verifiable,
of course.&lt;/p&gt;
&lt;p&gt;Similarly, end-to-end verifiable systems provide SI for tabulation by
making it possible—at least in theory—for someone to write
their own system from scratch that will verify the election.
However, if users are voting on their own devices, then any
compromise of those devices can completely compromise the votes cast on them,
and there&#39;s no plausible way to detect or recover from this
form of attack, which is even worse than with BMDs. Imagine what
happens in an election where it&#39;s discovered that even a small
number of user devices had been compromised; how would you
have confidence in the result? As noted above,
using a blockchain doesn&#39;t help with this at all.&lt;/p&gt;
&lt;p&gt;Even if we confine our attention to the parts of the system that
are independently verifiable, actually convincing yourself that
the election was correctly conducted can be a pretty challenging
proposition. A full hand count is directly verifiable if you
watch the whole thing, and while the idea behind a risk-limiting audit
is simple, knowing how many ballots to count involves
some reasonably complicated math. The situation with any end-to-end
verifiable system is dramatically worse in that not only is the
math very complicated, even the logic takes &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto&quot;&gt;thousands of words&lt;/a&gt;
to explain. It&#39;s pretty hard to see how explaining that votes are
correct because they are digitally signed and then
mixed in a way you can check by verifying a zero-knowledge
proof is going to put to rest any questions of validity.&lt;/p&gt;
&lt;p&gt;You&#39;ll note that above I said that from a &lt;em&gt;technical&lt;/em&gt; perspective
validity disputes come down to third-party verifiability. The bigger
problem here is that many election disputes don&#39;t come down to
technical questions at all, because most people aren&#39;t going to research
the details of how elections are run—how many people still
think that there was tabulation fraud in Georgia, even after a
&lt;a href=&quot;https://web.archive.org/web/20220207003435/https://sos.ga.gov/index.php/elections/historic_first_statewide_audit_of_paper_ballots_upholds_result_of_presidential_race&quot;&gt;full hand count&lt;/a&gt;?—and
end up making decisions on other grounds, using
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Motivated_reasoning&amp;amp;oldid=1111874311&quot;&gt;motivated reasoning&lt;/a&gt;
or based on who they trust more. It&#39;s hard to see how any set of
technical mechanisms will really convince everyone, though
I&#39;m especially skeptical that arguments based on fancy cryptography
will do the job.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said in my original &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto&quot;&gt;post&lt;/a&gt; on end-to-end
verifiable voting, voting isn&#39;t just a technical problem: it&#39;s
embedded in a system of social practices and it&#39;s those social
practices which make the problem complicated (again, I
encourage anyone interested in voting to actually go
serve as an election worker). It&#39;s of course
possible to improve voting technology, but most proposals for
how we could radically improve everything using new technology
&lt;strong&gt;X&lt;/strong&gt; fall down when you realize that &lt;strong&gt;X&lt;/strong&gt; doesn&#39;t take into
account those existing operational realities. This is largely
the case with Internet voting.
The problem with using blockchains for Internet voting is simpler, though: it doesn&#39;t solve any problem
that can&#39;t be solved with other, simpler technology. Of course,
that could also be said of a number of other proposed
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/&quot;&gt;applications&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/&quot;&gt;of&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/&quot;&gt;blockchains&lt;/a&gt;, which, to
quote Mark Nottingham, &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-nottingham-avoiding-internet-centralization-03#name-blockchains-are-not-magical&quot;&gt;are not magical&lt;/a&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The scare quotes are here because there is of course a pre-existing
use of the term &amp;quot;crypto&amp;quot; to mean &amp;quot;cryptography&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The reason this is important is that you need to prevent
&amp;quot;double-spending&amp;quot; attacks where people use the same cryptographic
token to pay two people. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The analogous
check in a blockchain-based cryptocurrency system
is that the payee verifies that a transaction is
recorded on the blockchain before they believe
they have been paid. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is actually how pre-Bitcoin timestamping systems
were designed. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The Bitcoin maximum transaction rate is
&lt;a href=&quot;https://en.bitcoin.it/wiki/Maximum_transaction_rate&quot;&gt;famously low&lt;/a&gt;,
though other networks do better. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Handwaving alert: The Interscan Hipro can scan 300 pages
per minute and costs &lt;a href=&quot;https://www.gsaadvantage.gov/ref_text/GS35F0062N/0WY7WJ.3SOKV8_GS-35F-0062N_SCHED70MOD131.PDF&quot;&gt;under $200,000&lt;/a&gt;.
Los Angeles is probably the biggest county in the US with almost 6 million registered voters:
if you had about 40 scanners you could do all these counts in less than 10 hours
at a capital cost of less than $10 million (of course, there
are lots of other costs to consider). &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Remember that many registered voters don&#39;t actually
vote, so you need some way of distinguishing the case
where people didn&#39;t vote from the case where their
votes were discarded. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
By which I mean that for the vast majority of voters,
there is at least one verifier they trust, even if
not all voters trust the same verifier. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Outside the US, hand counting is common, but in the US,
it&#39;s pretty much necessary to use machine counting
for &lt;a href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#scalability&quot;&gt;logistical reasons&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-blockchain/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How to securely vote for (or against) Elon Musk</title>
		<link href="https://educatedguesswork.org/posts/voting-crypto/"/>
		<updated>2022-12-24T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-crypto/</id>
		<content type="html">&lt;style&gt;
.img-wrap {
  display: inline-block;
}
.img-wrap img {
  width: 80%;
}&lt;/style&gt;
&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: this post contains a bunch of LaTeX math notation rendered
in MathJax, but it doesn&#39;t show up right in the newsletter
version. You may want to instead read the version on the &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto&quot;&gt;site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Earlier this week Elon Musk ran a &lt;a href=&quot;https://twitter.com/elonmusk/status/1604617643973124097&quot;&gt;poll&lt;/a&gt;
for whether he should step down as head of Twitter.
As of this writing, the poll stood overwhelmingly (57.5 to 42.5) against Musk.&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;Should I step down as head of Twitter? I will abide by the results of this poll.&lt;/p&gt;&amp;mdash; Elon Musk (@elonmusk) &lt;a href=&quot;https://twitter.com/elonmusk/status/1604617643973124097?ref_src=twsrc%5Etfw&quot;&gt;December 18, 2022&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;p&gt;Unsurprisingly, there have been &lt;a href=&quot;https://www.salon.com/2022/12/19/right-wingers-cry-fraud-as-twitter-users-overwhelmingly-vote-for-elon-musk-to-resign-in-his-own-poll/&quot;&gt;claims&lt;/a&gt; of voter
fraud (via &amp;quot;bots&amp;quot;) as well as concerns that Musk would &lt;a href=&quot;https://t.co/vj80PIfbzk&quot;&gt;retaliate&lt;/a&gt;
against people who voted that he should step down.
Twitter polls are just some code on a Web site, and
so are trivially insecure in any number of ways,
including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;There&#39;s no way to validate who voted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There&#39;s no way to externally verify that the votes were accurately
counted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s trivial for anyone in control of Twitter servers to
see who voted which way.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These weaknesses follow directly from the way that Web sites are
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/&quot;&gt;built&lt;/a&gt;:
your browser is just running a program that is provided by the server,
and you vote by sending your vote to the server, so &lt;em&gt;of course&lt;/em&gt; the
server can see your vote and lie about it. The typical way to
address this is with &lt;em&gt;physical&lt;/em&gt; countermeasures like having
people vote with &lt;a href=&quot;https://educatedguesswork.org/posts/voting-hcpb/&quot;&gt;paper ballots&lt;/a&gt;
or having some kind of &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#voter-verifiable-paper-audit-trails-(vvpat)&quot;&gt;paper trail&lt;/a&gt;
of how people voted.
However, over the past 25 years or so it&#39;s become possible to build a voting system
that is entirely remote (i.e., that doesn&#39;t involve you voting in
person or sending any physical object anywhere) and yet provides
strong privacy and security guarantees [terms and conditions apply].&lt;/p&gt;
&lt;p&gt;These technologies are typically called
&lt;em&gt;cryptographic&lt;/em&gt; or more recently &lt;em&gt;end-to-end (E2E)&lt;/em&gt; voting systems.
There has been an enormous amount of work in this area; in this
post, I&#39;ll be describing a simplified version of
a pair of papers by two of the pioneers of
the field, &lt;a href=&quot;https://www.usenix.net/legacy/events/evt06/tech/full_papers/benaloh/benaloh.pdf&quot;&gt;Josh Benaloh&lt;/a&gt; and &lt;a href=&quot;http://www.usenix.org/events/sec08/tech/full_papers/adida/adida.pdf&quot;&gt;Ben Adida&lt;/a&gt;
(&lt;a href=&quot;https://vote.heliosvoting.org/&quot;&gt;Helios&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#background&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we get into end-to-end systems, it&#39;s useful to review how a
typical paper ballot system works. You can find a more detailed
description in &lt;a href=&quot;https://educatedguesswork.org/posts/voting-hcpb/&quot;&gt;previous&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/&quot;&gt;posts&lt;/a&gt; and a description of the requirements
&lt;a href=&quot;https://educatedguesswork.org/posts/voting1/&quot;&gt;here&lt;/a&gt;.
Typically you have preprinted &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Secret_ballot&amp;amp;oldid=1125749760&quot;&gt;paper ballots&lt;/a&gt;
which list the choices for each contest. For instance:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sample-ballot.png&quot; alt=&quot;Part of California sample ballot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The voter marks the ballot with their selection and then submits
it for tabulation. In many (though not all) systems the ballot
is then mixed with other ballots and &lt;em&gt;shuffled&lt;/em&gt; (or maybe at least
shaken around a bit) so that the
order in which people voted is not preserved. For instance,
they might be put in a cardboard box, shuffled, and then
carried to some central place for tabulation.
The ballots
are then tabulated and the totals reported.&lt;/p&gt;
&lt;p&gt;This system has a fairly straightforward verifiability story
if you can observe the process: you can observe who voted
and that each voter only got a single ballot. As long as
the chain of custody for ballots is secure (i.e., the ones
that go into the box are the ones that are counted) and
you can observe the counting process, then you can verify
that the totals are right, and can have confidence in the
whole election. The privacy story is similarly straightforward: the ballots
are shuffled and as long as people don&#39;t mark them in
a distinguishing fashion—a big assumption!—then
it&#39;s not possible to associate a ballot with a given voter
(what&#39;s called &amp;quot;k-anonymity&amp;quot;).&lt;/p&gt;
&lt;h2 id=&quot;how-not-to-build-a-cryptographic-voting-system&quot;&gt;How Not to Build a Cryptographic Voting System &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#how-not-to-build-a-cryptographic-voting-system&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Building an end-to-end voting system is more or less a matter of
replicating these properties without the paper. This is harder
than it sounds. The basic problem is that there is a tension
between two properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Ensuring the &lt;strong&gt;integrity&lt;/strong&gt; of the ballot through the entire
process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Protecting the &lt;strong&gt;anonymity&lt;/strong&gt; of individual votes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&#39;re willing to give up either of these properties, then
the problem becomes fairly straightforward.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;secure-but-non-anonymous-ballots&quot;&gt;Secure but Non-Anonymous Ballots &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#secure-but-non-anonymous-ballots&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you&#39;re willing to sacrifice anonymity, then you can just
have signed ballots. The way that this works is that each
user has a cryptographic key pair (this of course imports
all the usual problems with &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/&quot;&gt;cryptographic identities&lt;/a&gt;,
but let&#39;s assume that those are solved). In order
to vote you sign your ballot and submit it to the
election administrators.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;knowing-who-voted&quot;&gt;Knowing Who Voted &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#knowing-who-voted&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It may come as a surprise to people that we would publish
who voted, but this is actually a fairly common feature
of real-world systems. For instance, when I worked the
polls in Santa Clara County, you would cross people off the
voter sheet when they voted and then periodically post a
copy of the sheet; this is a useful transparency measure
but also allows campaign workers to know where they
should focus their get out the vote measures.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The administrators post all the signed ballots on some public bulletin
board. This allows anyone to verify who voted and file challenges&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; in
case of irregularities, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Their vote wasn&#39;t included&lt;/li&gt;
&lt;li&gt;Someone voted who shouldn&#39;t have&lt;/li&gt;
&lt;li&gt;Someone voted multiple times&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the challenges are complete, everyone can then verify the tabulation
for themselves. Of course, this also lets anyone know exactly how
everyone else voted, which is bad.&lt;/p&gt;
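&lt;p&gt;To make the bulletin board checks concrete, here is a minimal sketch.
Everything in it is an assumption for illustration: the toy textbook RSA
keys (far too small to be secure), the &lt;code&gt;audit_and_tally&lt;/code&gt; helper, and
the voter names. Note that the &quot;my vote wasn&#39;t included&quot; challenge can
only be raised by the voter, so it isn&#39;t modeled here.&lt;/p&gt;

```python
import hashlib
from collections import Counter

def make_key(p, q, e=17):
    """Toy textbook RSA key from two small primes (illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return {"n": n, "e": e, "d": pow(e, -1, phi)}

def digest(choice, n):
    return int.from_bytes(hashlib.sha256(choice.encode()).digest(), "big") % n

def sign(key, choice):
    return pow(digest(choice, key["n"]), key["d"], key["n"])

def verify(key, choice, sig):
    return pow(sig, key["e"], key["n"]) == digest(choice, key["n"])

def audit_and_tally(roll, board):
    """roll maps voter -> key; board is a list of (voter, choice, signature)."""
    problems, seen, tally = [], set(), Counter()
    for voter, choice, sig in board:
        if voter not in roll:
            problems.append((voter, "not on the voter roll"))
        elif voter in seen:
            problems.append((voter, "voted more than once"))
        elif not verify(roll[voter], choice, sig):
            problems.append((voter, "bad signature"))
        else:
            seen.add(voter)
            tally[choice] += 1
    return tally, problems

# A real roll would hold only public keys; the private parts stay with voters.
roll = {"alice": make_key(61, 53), "bob": make_key(89, 97)}
board = [
    ("alice", "yes", sign(roll["alice"], "yes")),
    ("bob", "no", sign(roll["bob"], "no")),
    ("bob", "yes", sign(roll["bob"], "yes")),
    ("mallory", "yes", 1234),
]
tally, problems = audit_and_tally(roll, board)
```

&lt;p&gt;Anyone can run this over the published board, which is exactly why it
also tells anyone how each person voted.&lt;/p&gt;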
&lt;h3 id=&quot;anonymous-but-insecure-ballots&quot;&gt;Anonymous but Insecure Ballots &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#anonymous-but-insecure-ballots&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;On the other hand, if you don&#39;t care about the integrity of the
election, you can use standard techniques to mix the ballots.
For instance, you can have a series of proxies arranged in what&#39;s
called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mix_network&amp;amp;oldid=1098643215&quot;&gt;mix network (mixnet)&lt;/a&gt;,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shuffle-votes.png&quot; alt=&quot;Mix network&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The idea here is that you have a series of independently operated proxies.
Each voter recursively encrypts their ballot, first to the tabulator,
then to proxy 1, and then to proxy 2. They then send their ballots to
proxy 1, which decrypts them, shuffles them, and forwards them to proxy 2.
Proxy 2 does the same and forwards them to the tabulator, which finally
decrypts them. The end result is that the tabulator gets a list of ballots but is unable to
determine which ballots correspond to which voter or what order they
were cast in: it receives them in random order and because encryption
was removed at each layer, there is no way to match the contents of
a given ballot to the encrypted version that was cast.
This property holds as long as at least one of the proxies
is honest, and you can of course have an arbitrary number of proxies.&lt;/p&gt;
&lt;p&gt;Unfortunately, any one of the proxies or the tabulator can tamper
with the election results. It&#39;s obvious that the tabulator can do this
because they have the final ballots, but the proxies can do the same
by replacing the genuine ballots with fake ones.
You could of course have the voters sign the ballots, but then
this obviates the point of shuffling them because the voter&#39;s
identity will be available to the tabulator. Even if you
sign them before they are sent to the first proxy, that proxy
has to strip the signatures.&lt;/p&gt;
&lt;h2 id=&quot;building-a-real-design&quot;&gt;Building a Real Design &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#building-a-real-design&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The underlying problem with the mixnet scheme is that it doesn&#39;t
do anything to ensure that the ballots that come into the mixnet
are the ballots that come out of it. In an ordinary paper-based
system, this is provided by physical properties: you verify
that the box is empty at the start of the election and you
can have confidence that paper ballots won&#39;t change in transit
or create new ballots via spontaneous generation. However,
the proxies are much more complicated than cardboard boxes and
they can readily create, modify, or delete ballots.&lt;/p&gt;
&lt;p&gt;What we need is some way to verify the integrity of the mixing system, or more
precisely, a way for a mixer to &lt;em&gt;prove&lt;/em&gt; that it has executed
the mixing correctly, which is to say that there is a one-to-one
relationship between the ballots that were put into the system
and those that came out. I describe how to build such a mixer
below.&lt;/p&gt;
&lt;h3 id=&quot;re-encryption&quot;&gt;Re-Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#re-encryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to do this we first need a new primitive, which is a way to
&lt;em&gt;reencrypt&lt;/em&gt; a value encrypted to &lt;em&gt;Alice&lt;/em&gt; so that the ciphertext (the
encrypted value) is different but the plaintext (what you get when
you decrypt) is the same. Importantly, you need to be able to do
this without knowing the decryption key or the plaintext (it&#39;s
trivial to do otherwise by just decrypting and reencrypting).
In the simple proxy design above we just solved this problem by
using nested encryption, but for reasons that will shortly
become apparent, that doesn&#39;t work here, so we need a new primitive.&lt;/p&gt;
&lt;p&gt;You can find details of how to implement
reencryption &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#ipa-technical-details&quot;&gt;here&lt;/a&gt;
but we can just assume that we have some function $R$ that performs this operation. Specifically,
given a ciphertext $C$ and a randomizer $r$, we can compute:&lt;/p&gt;
&lt;p&gt;$$
R(r, C) &#92;rightarrow C&#39;
$$&lt;/p&gt;
&lt;p&gt;Such that:&lt;/p&gt;
&lt;p&gt;$$
Decrypt(C) = Decrypt(C&#39;)
$$&lt;/p&gt;
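&lt;p&gt;As a rough sketch of what $R$ could look like, here is re-encryption
for textbook ElGamal, which is one common scheme with this property. The
group parameters are toy values chosen for illustration and the function
names are my own; note that re-encryption needs the recipient&#39;s public
key but not the private key or the plaintext.&lt;/p&gt;

```python
import secrets

# Toy parameters, illustration only: a real system needs a large,
# carefully chosen group.
P = 2579  # small prime modulus
G = 2     # generator

def keygen():
    x = secrets.randbelow(P - 2) + 1   # private key in [1, P-2]
    return x, pow(G, x, P)             # (private, public)

def encrypt(pub, m, r):
    """ElGamal encryption of group element m with randomizer r."""
    return (pow(G, r, P), m * pow(pub, r, P) % P)

def reencrypt(pub, r, c):
    """R(r, C): multiply in a fresh encryption of 1, changing only the randomness."""
    a, b = c
    return (a * pow(G, r, P) % P, b * pow(pub, r, P) % P)

def decrypt(priv, c):
    a, b = c
    return b * pow(pow(a, priv, P), -1, P) % P

priv, pub = keygen()
c = encrypt(pub, 42, r=7)
c2 = reencrypt(pub, 99, c)   # different ciphertext, same plaintext
```

&lt;p&gt;Here &lt;code&gt;c2&lt;/code&gt; differs from &lt;code&gt;c&lt;/code&gt; but both decrypt to the same value.&lt;/p&gt;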
&lt;p&gt;We can create a mixer by using re-encryption instead of removing
one layer of encryption, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/reencrypt-mix.png&quot; alt=&quot;A mixer using re-encryption&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Without knowing the $r_i$ randomization values, it&#39;s not possible
to associate the output ciphertexts with their corresponding
inputs.&lt;/p&gt;
&lt;p&gt;Unlike nested encryption, re-encryption has the nice property that you can re-encrypt
the same ciphertext multiple times without any help from the sender.
So, for instance, you can just add another mixer stage without
having to add another layer of nesting. More importantly,
you can create an arbitrary number of equivalent ciphertexts
from the same initial ciphertext. We use this fact below.&lt;/p&gt;
&lt;h3 id=&quot;provable-mixing&quot;&gt;Provable Mixing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#provable-mixing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Re-encryption-based mixing is a more flexible design than nested encryption,
but this still leaves us trusting the mixer. However, once we base
our mix on re-encryption we can prove that the mix was
performed correctly.&lt;/p&gt;
&lt;p&gt;It&#39;s of course trivial to prove that the mix
was performed correctly if you&#39;re willing to reveal the mapping
itself: you just publish which inputs correspond to which outputs
as well as the reencryption factors $r_i$ and anyone can verify
for themselves that the ostensible inputs result in the right outputs.
What we want to do however is prove that the mix was performed correctly
&lt;em&gt;without&lt;/em&gt; revealing the mapping between inputs and outputs, which is
obviously harder. Fortunately, there is a clever trick we can use
here, due to &lt;a href=&quot;https://git.gnunet.org/bibliography.git/plain/docs/SK.pdf&quot;&gt;Sako and Kilian&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The basic idea is that instead of mixing the ballots once, the
mixer instead does so &lt;em&gt;twice&lt;/em&gt;, creating two alternative mixes.
It publishes both of them, identifying (arbitrarily) one as the
output and the other as what&#39;s called the &amp;quot;shadow&amp;quot; mix. The diagram
below shows the situation, with dashed arrows to indicate that observers
are unable to see the mapping:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sk-mix.png&quot; alt=&quot;A regular mix with a shadow&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now consider the case where the mixer cheated and replaced
value $V_1$ in the output with a new value $V_a$.
Assuming everything else was correct, the shadow mix can
be in one of two states. First, the shadow mix can be correct, which is
to say that it contains $V_1$ rather than $V_a$. In this
case, there is a 1-1 mapping between the input values
and the shadow mix, but no 1-1 mapping between the shadow
mix and the outputs.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shadow-mix-correct-shadow.png&quot; alt=&quot;Shadow mix is correct&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Alternatively, the shadow mix can be incorrect and contain
$V_a$ rather than $V_1$. In this case, there is a 1-1 mapping
between the shadow mix and the outputs but not between
the shadow mix and the inputs.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shadow-mix-incorrect-shadow.png&quot; alt=&quot;Shadow mix is incorrect&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Either way, if the mixer cheated, there will &lt;em&gt;either&lt;/em&gt; not be
a mapping from the input to the shadow &lt;em&gt;or&lt;/em&gt; from the shadow
to the output (of course, it&#39;s possible for both to be wrong).
If the verifier then randomly &lt;em&gt;challenges&lt;/em&gt; the mixer to
reveal either of the mappings, there is a $1/2$ chance that
the mixer will be unable to do so (obviously, if
the mixer discloses both, then this is the same
as disclosing the full mapping to the output, but
because the shadow mix is shuffled
with respect to the input-output mappings, neither
of the mappings to the shadow mix tells you anything
about the input-output mappings).
On the other hand, if the mixer has behaved honestly, it can reveal
either mapping when challenged.&lt;/p&gt;
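&lt;p&gt;Here is a minimal sketch of one round of this challenge, assuming
toy ElGamal re-encryption as the underlying primitive. The group
parameters and all of the helper names are assumptions for illustration,
not taken from the Sako and Kilian paper.&lt;/p&gt;

```python
import random
import secrets

P, G = 2579, 2  # toy group for illustration only; far too small to be secure

def rand_exp():
    return secrets.randbelow(P - 2) + 1

def encrypt(pub, m):
    r = rand_exp()
    return (pow(G, r, P), m * pow(pub, r, P) % P)

def reencrypt(pub, r, c):
    return (c[0] * pow(G, r, P) % P, c[1] * pow(pub, r, P) % P)

def mix(pub, src):
    """Shuffle and re-encrypt: dst[i] is src[perm[i]] re-encrypted with rs[i]."""
    perm = random.sample(range(len(src)), len(src))
    rs = [rand_exp() for _ in src]
    return [reencrypt(pub, r, src[j]) for j, r in zip(perm, rs)], perm, rs

def check_mapping(pub, src, dst, perm, rs):
    """Verifier side: is dst a one-to-one re-encryption of src under (perm, rs)?"""
    if len(dst) != len(src) or sorted(perm) != list(range(len(src))):
        return False
    return all(reencrypt(pub, r, src[j]) == d for j, r, d in zip(perm, rs, dst))

def shadow_to_output(perm_s, rs_s, perm_o, rs_o):
    """Mixer side: derive the shadow-to-output mapping from the two secret mixes."""
    where = {j: k for k, j in enumerate(perm_s)}  # input index -> shadow index
    perm = [where[j] for j in perm_o]
    rs = [(ro - rs_s[k]) % (P - 1) for ro, k in zip(rs_o, perm)]
    return perm, rs

pub = pow(G, rand_exp(), P)  # tabulator public key
inputs = [encrypt(pub, m) for m in (11, 22, 33)]
output, perm_o, rs_o = mix(pub, inputs)
shadow, perm_s, rs_s = mix(pub, inputs)
right = shadow_to_output(perm_s, rs_s, perm_o, rs_o)

# An honest mixer can answer either challenge.
honest_left = check_mapping(pub, inputs, shadow, perm_s, rs_s)
honest_right = check_mapping(pub, shadow, output, *right)

# A cheating mixer that swaps in its own ballot but keeps an honest shadow
# passes the input-to-shadow challenge and fails the shadow-to-output one.
forged = [encrypt(pub, 99)] + output[1:]
cheat_left = check_mapping(pub, inputs, shadow, perm_s, rs_s)
cheat_right = check_mapping(pub, shadow, forged, *right)
```

&lt;p&gt;The verifier only ever sees one of the two mappings, so which side the
cheater gets asked about is a coin flip.&lt;/p&gt;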
&lt;p&gt;Given this design, if the mixer cheats they have a $1/2$
chance of getting caught, or, to look at it another way,
a $1/2$ chance of getting away with it. It&#39;s straightforward,
however, to make that chance arbitrarily small, just
by having the mixer create more than one shadow mix. The
verifier then asks them to reveal one half of the mapping
for each of the shadows—but never both halves
for a given shadow. Each of these challenges has a $1/2$
chance of detection, so if you have $n$ challenges, the
chance of successful cheating is $2^{-n}$, which quickly
gets very small; somewhere between 80 and 100 shadows
is easily sufficient.&lt;/p&gt;
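&lt;p&gt;To put rough numbers on this (my own arithmetic, just evaluating
$2^{-n}$ at the two ends of that range):&lt;/p&gt;
&lt;p&gt;$$
2^{-80} &#92;approx 8.3 &#92;times 10^{-25}, &#92;qquad 2^{-100} &#92;approx 7.9 &#92;times 10^{-31}
$$&lt;/p&gt;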
&lt;p&gt;Note that this does not prove that the ballots were actually
randomly&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
shuffled, merely that they map 1:1 between input
and output. The standard way to ensure shuffling is to
have multiple mixers: as long as any one is honest and
actually shuffles, the result will be random.&lt;/p&gt;
&lt;h3 id=&quot;a-non-interactive-proof&quot;&gt;A non-interactive proof &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#a-non-interactive-proof&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The obvious problem here is that this proof of correctness
is &lt;em&gt;interactive&lt;/em&gt; which is to say that it requires someone
to actually generate the challenges. If you&#39;re not that person
you just have to trust them. This is still better than nothing
because the verifier could be separate from the mixer, but it&#39;s
possible to do better still, creating a &lt;em&gt;non-interactive&lt;/em&gt; proof
that the mix was done correctly.&lt;/p&gt;
&lt;p&gt;The intuition to have here is that the interactive proof works by
forcing the mixer to commit to the outputs before they get to
learn the challenge. This prevents the mixer from
creating a specific dishonest mapping that will pass
a specific known challenge, which is quite easy (just make the
challenged side correct). However, we can achieve the same effect by making it impossible
for the dishonest mixer to control the challenge for a given
set of mappings. We do this by computing the challenge
using a hash of the outputs. I.e., the mixer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Computes the output and $n$ shadow mixes $S_1, S_2, ... S_n$.&lt;/li&gt;
&lt;li&gt;Hashes those to produce a string of at least $n$ bits ($H_i$).&lt;/li&gt;
&lt;li&gt;Publishes the output and shadow mixes and also, for each bit of the hash $H_i$, publishes either the mapping from the input to the shadow $S_i$ (if the bit is 0) or from the shadow $S_i$ to the output (if the bit is 1).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The verifier then recomputes the hash over the outputs and
checks that the mixer has provided valid mappings for the
indicated side.&lt;/p&gt;
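&lt;p&gt;The challenge derivation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction used by any particular voting system: the function names are hypothetical, the output and shadow mixes are assumed to be already serialized to byte strings, SHAKE-256 is an arbitrary choice of hash that conveniently yields as many bits as needed, and bit 1 is taken to mean the shadow-to-output side.&lt;/p&gt;

```python
import hashlib


def challenge_bits(output, shadows):
    """Derive one challenge bit per shadow mix from a hash of the published values.

    Assumption: `output` and each entry of `shadows` are byte strings holding
    some canonical serialization of the corresponding ballot list.
    """
    h = hashlib.shake_256()
    for blob in [output, *shadows]:
        # Length-prefix every blob so distinct inputs cannot serialize identically.
        h.update(len(blob).to_bytes(8, "big"))
        h.update(blob)
    n = len(shadows)
    nbytes = (n + 7) // 8  # enough whole bytes to cover n bits
    digest = h.digest(nbytes)
    bits = format(int.from_bytes(digest, "big"), "0{}b".format(nbytes * 8))
    return [int(b) for b in bits[:n]]


def sides_to_reveal(output, shadows):
    """Map each bit to a side (assumed convention: 0 = input side, 1 = output side)."""
    return [
        "input-to-shadow" if bit == 0 else "shadow-to-output"
        for bit in challenge_bits(output, shadows)
    ]
```

&lt;p&gt;A verifier would run the same computation on the published values and check that the mapping revealed for each shadow is the side that the corresponding bit selects.&lt;/p&gt;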
&lt;p&gt;It&#39;s natural to wonder whether the mixer could compute a mapping
such that the hash has the right set of challenges. In principle
yes, but because the hash is an unpredictable function of the
mappings, they have to first compute the mapping and then check
whether it works. Each attempt has a $2^{-n}$ chance of producing a matching
mapping, so they just have to keep trying; on average it takes about $2^{n}$ attempts
to find an appropriate input, which is prohibitive if $n$ is large
enough.&lt;/p&gt;
&lt;p&gt;The problem of turning an interactive proof into a non-interactive
one occurs all over cryptography, and this hashing technique,
called the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Fiat%E2%80%93Shamir_heuristic&amp;amp;oldid=1099414139&quot;&gt;Fiat-Shamir Heuristic&lt;/a&gt;, is the standard solution.&lt;/p&gt;
&lt;h3 id=&quot;tabulation&quot;&gt;Tabulation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#tabulation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Once we have a shuffled set of encrypted ballots, we need to count them.
At a high level, this is simple: the election officials decrypt them
to reveal the original ballots. They publish those ballots and anyone
can then tabulate them themselves. However, there are two subtleties
we need to consider here.&lt;/p&gt;
&lt;h4 id=&quot;verifiable-decryption&quot;&gt;Verifiable Decryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#verifiable-decryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The first problem we have is that the election officials could simply
lie about the contents of the ballots. I.e., they could say that a
vote for Alice was actually a vote for Bob. This actually turns out
to have an easy answer: it&#39;s possible for them to create a proof
that they correctly decrypted the ballot. The details are a bit
complicated but don&#39;t really matter: the bottom line is that
it&#39;s possible to publish the following triplet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The encrypted ballot $E(V_i)$&lt;/li&gt;
&lt;li&gt;The decrypted ballot $V_i$&lt;/li&gt;
&lt;li&gt;A proof that $E(V_i)$ decrypts to $V_i$.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once verifiers have checked that the proofs are correct, they can
then tabulate the decrypted ballots.&lt;/p&gt;
&lt;h4 id=&quot;multiple-decryption-keys&quot;&gt;Multiple Decryption Keys &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#multiple-decryption-keys&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The second problem is that election officials might decrypt the
encrypted ballots when they are initially posted on the bulletin
board, thus learning how everyone voted. As noted above, these ballots are signed and so are
easy to attribute.&lt;/p&gt;
&lt;p&gt;The standard approach to mitigating this threat is to encrypt
each vote with multiple keys, so that you need multiple election
officials—or even some trusted third party—to do
the decryption. This means that they all need to collude in order
to violate user privacy by decrypting ballots before they
are shuffled. This is cryptographically straightforward
(see &lt;a href=&quot;http://localhost:8080/posts/ipa-overview/#ipa-technical-details&quot;&gt;here&lt;/a&gt; for
one way to do it with ElGamal Encryption).
Note that even if election officials &lt;em&gt;do&lt;/em&gt; all collude, this
still doesn&#39;t threaten the integrity of the election.&lt;/p&gt;
&lt;h2 id=&quot;putting-it-all-together&quot;&gt;Putting it all Together &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#putting-it-all-together&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We now have the makings of a complete system, shown in the
figure below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/End-to-end-voting.png&quot; alt=&quot;A complete E2E system&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The election process looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;To cast their ballot, each voter encrypts it with
the public key(s) of the election officials and then
signs it with their own private key. They post it to
the bulletin board.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the ballots are all cast, the mixer strips
the digital signatures (thus anonymizing them),
shuffles the ballots, and posts them along with the
proof of correct shuffling to the bulletin board.
This can be the same bulletin board or a separate one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The election officials take the shuffled
ballots, decrypt them, and post the decrypted
ballots along with their proofs of correct decryption.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In order to verify the election, you take the following
steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check the signatures on the ballots. This ensures that
the right set of voters cast their votes and that they
attest to the contents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check the proof of shuffling. This ensures that the
shuffled ballots correspond to the ballots you
verified the signatures on (though you can&#39;t tell which
ones are which).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check the proof of decryption. This ensures that the
plaintext ballots correctly match the encrypted ballots.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Any observer can take these steps without any help from the
voting officials.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;different-tabulation-methods&quot;&gt;Different Tabulation Methods &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#different-tabulation-methods&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In principle you can use any tabulation method you want,
but things get a little complicated if you want to do
anything fancy, because you want to prevent voters
from being able to prove how they voted (a property
called &amp;quot;receipt freeness&amp;quot;). The reason is that they
might then be able to sell their vote (or be coerced
into voting a certain way). If the ballot is complicated,
the voter can encode their identity by voting a certain
way in &amp;quot;down-ticket&amp;quot; (less important) parts of the ballot
(an attack called &amp;quot;pattern voting&amp;quot;). It&#39;s easy to address
this for less important contests (e.g., you are paid to
vote in the Presidential election but encode your identity
in your votes on local judges), by just having each
contest voted separately, but for voting systems
where you have multiple votes in each contest that
need to be considered together such
as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Single_transferable_vote&amp;amp;oldid=1128476013&quot;&gt;single transferable vote (STV)&lt;/a&gt; the situation becomes more complicated&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.
There are &lt;a href=&quot;https://www.usenix.org/legacy/events/evt08/tech/full_papers/teague/teague_html/&quot;&gt;E2E designs&lt;/a&gt; that work for STV, but they are more complicated.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If all the steps complete successfully, you have then
verified that the output decrypted ballots have a 1:1
relationship with the signed ballots that you verified
and hence with the ballots you expected to be cast, and
therefore that the election was conducted correctly.
You can then tabulate the ballots in the usual way and
verify that the totals match what you expected, thus
verifying the entire election.&lt;/p&gt;
&lt;h2 id=&quot;against-internet-voting&quot;&gt;Against Internet voting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#against-internet-voting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;E2E voting is an amazing technical achievement, but despite that,
the &lt;a href=&quot;https://www.aaas.org/epi-center/internet-online-voting&quot;&gt;broad consensus&lt;/a&gt;
of people working in voting is that Internet
voting is a bad idea, even using E2E systems. Why the incongruity?
The reason is that voting isn&#39;t just a technology, but rather
is embedded in a whole election system, and it&#39;s in that context
that E2E voting falls short.&lt;/p&gt;
&lt;h3 id=&quot;voting-device-security&quot;&gt;Voting Device Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#voting-device-security&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first problem is that unlike (say) hand-marked paper ballots,
cryptographic voting systems require users to vote on some sort
of computer (you weren&#39;t planning to do elliptic curve math in
your head, right?). There are two main ways for this to work:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You can use the same types of electronic polling place devices that
people use for voting now.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can vote on your own device (e.g., your phone).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But this means that you&#39;re trusting that computer to actually
correctly cast your votes, and
computers are incredibly hard to secure.
We&#39;ve seen this repeatedly in the elections context, where third
party audits have repeatedly shown that even temporary access to
polling place voting devices is sufficient to subvert them (see, for
instance the reports from the California &lt;a href=&quot;https://web.archive.org/web/20150617013058/https://www.sos.ca.gov/elections/voting-systems/oversight/top-bottom-review&quot;&gt;Top-to-Bottom
Review&lt;/a&gt;
back in 2007). The situation isn&#39;t much better with personal devices,
which need regular updating to address a constant stream of discovered
vulnerabilities (just as an example, here&#39;s the list of &lt;a href=&quot;https://support.apple.com/en-gb/HT213530&quot;&gt;security
issues&lt;/a&gt; in the latest iOS
release; any major system has a similar list).&lt;/p&gt;
&lt;p&gt;There has been some good work on allowing users to verify that their
votes were correctly cast (see, for instance, Section 4 of
&lt;a href=&quot;https://www.usenix.net/legacy/events/evt06/tech/full_papers/benaloh/benaloh.pdf&quot;&gt;Benaloh06&lt;/a&gt;).
This is a more complicated problem than it seems because it&#39;s
important to avoid providing the voter with a receipt that could be
used to prove &lt;em&gt;how&lt;/em&gt; they voted (as opposed to &lt;em&gt;that&lt;/em&gt; they voted).
Otherwise, this receipt can potentially be used to enable vote buying.
Of course, these approaches ultimately require the voter to use
some other computer to verify whatever proof the voting device
spits out.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-uses-of-statistical-evidence&quot;&gt;The Uses of Statistical Evidence &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#the-uses-of-statistical-evidence&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If you have a voting system that introduces
biased errors, it might in principle be detectable.
Suppose that the system changes 1% of votes from
Smith to Jones but leaves the Jones voters alone.
If some fraction of voters check their votes, then
we&#39;ll see more corrections of Jones votes than
Smith votes, though we&#39;ll still see some Smith
corrections, because some voters will accidentally
vote Smith when they meant to vote Jones. You could
imagine running some kind of hypothesis test to
determine whether you saw an unexpectedly high
rate of Smith → Jones errors, but even if that
came up significant, it&#39;s not clear what you&#39;d
do with this information, because voter errors
aren&#39;t unbiased either. To take a famous example,
there&#39;s &lt;a href=&quot;https://www.gsb.stanford.edu/faculty-research/publications/butterfly-did-it-aberrant-vote-buchanan-palm-beach-county-florida&quot;&gt;fairly strong evidence&lt;/a&gt;
that the design of the 2000 Palm Beach Florida presidential
ballot led to systematic erroneous votes for Buchanan
when the voters meant to vote for Gore. So, even if
you had evidence that there was an unexpected
rate of errors, it&#39;s not clear what you would do
about it (in the case of Florida, nothing).&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Even with these systems, you are left with roughly the same situation
as with &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#ballot-marking-devices&quot;&gt;Ballot Marking Devices (BMDs)&lt;/a&gt;:
users can in principle verify their votes but often do not.
Specifically, suppose a machine is programmed
to change 1/100 votes from Smith to a vote for Jones by acting
as if the user had pressed the wrong button (ever make a typo
on your phone?). If a user does not
verify their vote, the attack succeeds, and if
the user does check, the machine allows them to correct it. Because
users can and do accidentally vote for the wrong person, this
type of attack is very hard to distinguish from voter error.
&lt;a href=&quot;https://jhalderm.com/pub/papers/bmd-verifiability-sp20.pdf&quot;&gt;Studies of&lt;/a&gt;
&lt;em&gt;Ballot Marking Devices&lt;/em&gt; (BMDs) by Bernhard et al.
found that if left to themselves around 6.5% of voters
(in a simulated but realistic setting) will
detect ballots being changed. There is some good news here, which
is that with appropriate warnings by the &amp;quot;poll workers&amp;quot; the
researchers were able to raise the detection rate to 85.7%, though
it&#39;s not clear how feasible it is to get poll workers to give those
warnings. Given that checking a paper ballot is much easier than
checking a cryptographic ballot, we should expect a fairly
low rate of checking.&lt;/p&gt;
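&lt;p&gt;To make the scale concrete, here is a back-of-the-envelope calculation using the figures above, under the purely illustrative assumptions that 100,000 voters intend to vote for Smith and that the study&#39;s detection rates carry over unchanged. The machine alters $100000 \times 1/100 = 1000$ votes. At a 6.5% detection rate, about $1000 \times 0.065 = 65$ of those are noticed and corrected, leaving roughly $935$ altered votes in the count; at 85.7%, about $857$ are noticed and roughly $143$ remain.&lt;/p&gt;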
&lt;p&gt;I do want to note that we are starting to see some interest
in adding E2E to paper-based
election systems, as in &lt;a href=&quot;https://www.usenix.org/conference/evtwote13/workshop-program/presentation/bell&quot;&gt;STAR-Vote&lt;/a&gt;
or with Microsoft&#39;s &lt;a href=&quot;https://blogs.microsoft.com/on-the-issues/2019/05/06/protecting-democratic-elections-through-secure-verifiable-voting/&quot;&gt;ElectionGuard&lt;/a&gt;. This seems like a potentially good idea
in that it &lt;em&gt;augments&lt;/em&gt; the security of the paper-based system. However,
in systems with no paper trail, such as voting from users&#39;
phones, we&#39;re left just depending on the security of the
device itself. The threat to be most concerned about here is
an attack that compromises a large number of voters&#39; devices,
and through them the integrity of the election. Even if you
subsequently managed to gather evidence that this had happened on a large
scale, figuring out what to do after would be a political nightmare.&lt;/p&gt;
&lt;h3 id=&quot;operational-challenges&quot;&gt;Operational Challenges &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#operational-challenges&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even if we assume that the voting devices are uncompromised, the
actual logistics of building an operational E2E system are extremely
challenging. Elections themselves are complicated to run—if you
want to get a real sense of this, I recommend serving as a poll
worker—and there are a lot of things that can go wrong;
adding a bunch of complicated critical-path technology creates
a lot of new opportunities for failure.&lt;/p&gt;
&lt;h4 id=&quot;server-infrastructure&quot;&gt;Server Infrastructure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#server-infrastructure&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For example, if you want to have an Internet voting system you
need some servers which accept the votes. What happens if those
servers go down on election night or—worse yet—are
attacked? These protocols are designed to be resistant to misbehavior
by the voting servers in the sense that they can&#39;t tamper
with the results, but this doesn&#39;t address attacks designed
to prevent users from voting. Typical paper-based elections have mechanisms for addressing
this kind of failure: if the voting machines fail, you can
fall back to paper; if the electronic poll books fail, you may
have paper records; if those paper records are unavailable, people
can file provisional ballots and you can sort it out later.
However, these mechanisms all depend on people already being in
the polling place; if they&#39;re at home and things fail, the election
can completely fail.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible to &lt;em&gt;selectively&lt;/em&gt;
mount attacks, for instance by having a compromised server reject
only certain people&#39;s votes or by mounting a denial-of-service
attack on certain precincts; this is a particularly powerful
form of attack in the United States, where voting is managed
locally, and so you could attack the infrastructure of a county
that leans to one political party but ignore the infrastructure
of a county that leans the other way.&lt;/p&gt;
&lt;h4 id=&quot;client-infrastructure&quot;&gt;Client Infrastructure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#client-infrastructure&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As discussed above, in order to successfully vote on the Internet, the
voting software needs to run on a device controlled by the voter. Even
if we ignore attacks, this is a prime opportunity for things to go
wrong: here we have a piece of software which needs to be developed at
low cost, run on more or less every device that anyone might have,
operate essentially perfectly, and only gets used once or
twice a year. This is a tall order for any software shop.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;real-world-voter-authentication&quot;&gt;Real-World Voter Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#real-world-voter-authentication&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the real world, voter authentication is actually quite
lax. In many jurisdictions you can vote just by giving your
name. While some jurisdictions require showing photographic
ID, detecting fake IDs is not really that straightforward,
especially for people who (again) have to do it on one
day a year. In vote-by-mail systems, authentication is
performed by mailing you a ballot (thus trusting the
USPS) and then (hopefully) checking your signature on the
ballot. Despite all this, the rate of voter fraud is &lt;a href=&quot;https://www.brennancenter.org/sites/default/files/analysis/Briefing_Memo_Debunking_Voter_Fraud_Myth.pdf&quot;&gt;very low&lt;/a&gt;.
Probably a lot of the reason here is that it&#39;s hard to
conduct this kind of in-person fraud at scale. But of
course, this is not the case for Internet-based attacks.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To make matters worse, we have the problem of voter authentication. For
obvious reasons, we need each voter to prove that they are authorized
to vote, which means giving them some kind of credential. There are
a lot of options here (give them a digital certificate, mail them
a code, etc.) but whatever you do, they are all subject to voters
losing their credentials.
In ordinary Website authentication, we usually allow users to reset their
passwords via e-mail or SMS, but for obvious reasons that&#39;s not OK
here (allowing Gmail and T-Mobile to have the ability to
impersonate a huge fraction of voters really undercuts the value of
E2E voting). Here too, we&#39;re stuck with a situation where the
failure happens at the worst possible time, and recovery entails
actually going somewhere.&lt;/p&gt;
&lt;h4 id=&quot;implementation-complexity&quot;&gt;Implementation Complexity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#implementation-complexity&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Next, we have to contend with the problem of implementation
complexity. Even the best E2E voting systems are fairly complex,
and the systems they need to be embedded in are even more complex.
This means that even if we have a system design which is secure,
we still have to worry about implementation errors, both of the
protocol itself and of the rest of the infrastructure. So far,
there hasn&#39;t been that much Internet voting, but
serious errors have been found in several early systems.
See for instance, the analysis of
the &lt;a href=&quot;https://www.scytl.com/&quot;&gt;Scytl&lt;/a&gt;
system by &lt;a href=&quot;https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9152765&quot;&gt;Haines, Lewis, Pereira, and Teague&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
and of the &lt;a href=&quot;https://voatz.com/&quot;&gt;Voatz&lt;/a&gt; mobile voting system
by &lt;a href=&quot;https://raw.githubusercontent.com/trailofbits/publications/master/reviews/voatz-securityreview.pdf&quot;&gt;Trail of Bits&lt;/a&gt;, so the situation is not encouraging.&lt;/p&gt;
&lt;h3 id=&quot;voter-comprehension&quot;&gt;Voter Comprehension &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#voter-comprehension&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Finally, we have the problem of voter understanding. It&#39;s not enough for the election just to produce the right result; it
must also do so in a verifiable fashion. As voting researcher &lt;a href=&quot;https://www.cs.rice.edu/~dwallach/&quot;&gt;Dan
Wallach&lt;/a&gt; is fond of saying, the
purpose of elections is to convince the loser that they actually lost.
With paper-based ballots, the chain of reasoning for how the election
was decided is relatively straightforward: ballots go into the box
and you count them.
Despite this, we&#39;ve still seen extensive
attempts to question the resulting count, as in the 2020 US Presidential
Election.&lt;/p&gt;
&lt;p&gt;By contrast, the security of E2E voting depends on
some fairly complicated cryptography that practically
nobody understands. I&#39;ve just spent 4000-odd words on this topic and
it&#39;s only that short because I didn&#39;t explain how any of the actual
cryptographic pieces work and just focused on the system logic;
if you want to convince yourself that ballots were cast correctly
you have to not just have a surface understanding of the cryptography
but also have confidence that the mathematical problems it&#39;s based
on are really hard. We don&#39;t even know for sure that that&#39;s true
in the classical setting and we know that they&#39;re &lt;em&gt;not&lt;/em&gt; if
someone ever builds a big enough &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security&quot;&gt;quantum computer&lt;/a&gt;.
Try explaining that to your average voter.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-crypto/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My point here is not to criticize E2E voting, which is an amazingly
cool technology. The problem is that it&#39;s necessary but not sufficient
for &lt;em&gt;Internet&lt;/em&gt; voting, which requires correct operation of systems which
are not covered by the cryptography, all under very challenging conditions.
However, E2E does have two important use cases: first, it&#39;s potentially
useful as an additional measure of security for in-person paper-based
systems, such as Ballot Marking Devices. Second, there are lots of
low to medium-stakes situations where people are &lt;em&gt;already&lt;/em&gt; voting over
the Internet using systems which are tragically insecure.
These elections
would be much safer and more private if they used E2E systems, even
if those systems were still imperfect. So when do we get our
E2E secure Twitter polls?&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I owe this observation to Hovav Shacham. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you can have the contents of the ballots encrypted
to prevent selective challenges against people who voted
a certain way. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Benaloh observes that you actually don&#39;t need to
&lt;em&gt;randomly&lt;/em&gt; shuffle them: you can just sort the
output values, which will destroy any order. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
By contrast, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Approval_voting&amp;amp;oldid=1125914349&quot;&gt;Approval Voting&lt;/a&gt; can be implemented by having each candidate be
treated as a separate ballot. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This analysis also includes an interesting example of
an attack resulting from misuse of the Fiat-Shamir heuristic. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-crypto/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>One does not simply destroy a nuclear weapon</title>
		<link href="https://educatedguesswork.org/posts/nuclear-weapon-disposal/"/>
		<updated>2022-12-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/nuclear-weapon-disposal/</id>
		<content type="html">&lt;p&gt;In a recent
&lt;a href=&quot;https://www.nytimes.com/2022/11/17/science/retired-nuclear-bombs-b83.html?searchResultPosition=1&quot;&gt;article&lt;/a&gt;
the NYT reports that in the US when nuclear weapons are retired they aren&#39;t destroyed but
just stored:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Typically, nuclear arms retired from the U.S. arsenal are not melted
down, pulverized, crushed, buried or otherwise destroyed. Instead,
they are painstakingly disassembled, and their parts, including
their deadly plutonium cores, are kept in a maze of bunkers and
warehouses across the United States. Any individual facility within
this gargantuan complex can act as a kind of used-parts superstore
from which new weapons can — and do — emerge.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;“It’s important to keep these parts around,” said Franklin
C. Miller, a nuclear expert who held federal posts for three decades
before leaving government service in 2005. “If we had the
manufacturing complex we once did, we wouldn’t have to rely on the
old parts.” He added that other nuclear powers can and do make new
atomic parts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&#39;m not really surprised that the weapons aren&#39;t being destroyed
because it&#39;s incredibly hard to do so in a meaningful fashion;
it&#39;s not like guns where you just melt them down or something.
However, seeing why requires an understanding of the
physics of the situation, so let&#39;s start there.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Wikipedia, which was indispensable in gathering the
background detail for all this. I also can&#39;t recommend enough Richard Rhodes&#39;s
&lt;a href=&quot;https://www.amazon.com/Making-Atomic-Bomb-Richard-Rhodes-ebook/dp/B008TRU7SQ&quot;&gt;The Making of the Atomic Bomb&lt;/a&gt;,
which provides a very clear account of the physics of nuclear weapons, as well
as the history of the Manhattan Project.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;backgrounder%3A-atoms%2C-elements%2C-and-isotopes&quot;&gt;Backgrounder: Atoms, Elements, and Isotopes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#backgrounder%3A-atoms%2C-elements%2C-and-isotopes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This section is elementary but important material on the
structure of matter. If you know what an &amp;quot;element&amp;quot; and an &amp;quot;isotope&amp;quot;
are, you can skip this.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Essentially all ordinary matter—the stuff you are made of and encounter
on a daily basis—is composed of atoms. An atom is composed of
three basic &lt;em&gt;subatomic&lt;/em&gt; (i.e., smaller than atoms) particles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Positively charged &lt;em&gt;protons&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Negatively charged &lt;em&gt;electrons&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Non-charged &lt;em&gt;neutrons&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At a super-simplified level, an atom is like a miniature solar system,
with a &lt;em&gt;nucleus&lt;/em&gt; at the center, consisting of protons and neutrons,
and the electrons orbiting around it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Atoms have the same number of electrons and protons, which renders
them neutrally charged. An atom can also gain or lose an electron
to become an &lt;em&gt;ion&lt;/em&gt;, which is something we&#39;ll need to know later.&lt;/p&gt;
&lt;p&gt;The chemical properties of an atom are dictated by the number
of electrons, and because the number
of electrons is the same as the number of protons in the nucleus,
the number of protons also dictates those properties.
Every atom with a given number of protons in the nucleus
(the &lt;em&gt;atomic number&lt;/em&gt;) thus has the same chemical properties
(the technical term here is &lt;em&gt;element&lt;/em&gt;). Each element has
a name and a one or two letter symbol. For instance,
hydrogen&#39;s symbol is &amp;quot;H&amp;quot;, oxygen&#39;s is &amp;quot;O&amp;quot;, etc. There
are 100 or so elements, but of course many more chemicals
because you can combine elements in a lot of different ways.&lt;/p&gt;
&lt;p&gt;Finally, this brings us to neutrons. It&#39;s possible to have
different numbers of neutrons in the nucleus of an atom, even
with the same number of protons. For instance, you can have
three different flavors of hydrogen atoms:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Name&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Number of Neutrons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Hydrogen&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Deuterium&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Tritium&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Because the neutrons have no impact on the charge of the nucleus,
they also have no influence on the number of electrons, which
means that all three types of hydrogen have basically the same chemical
properties; they just have different masses.
The term for different flavors of the same element is
&lt;em&gt;isotope&lt;/em&gt;, as in &amp;quot;deuterium and tritium are two different
isotopes of hydrogen&amp;quot;. It&#39;s standard to refer to isotopes
by the total combined number of neutrons and protons in the
nucleus, so, for instance, deuterium is H-2 (H for hydrogen).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Many elements exist in multiple isotopes in nature, though in many
cases one isotope is common and the others are rare.&lt;/p&gt;
&lt;h2 id=&quot;brief-overview-of-the-physics-of-nuclear-weapons&quot;&gt;Brief Overview of the Physics of Nuclear Weapons &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#brief-overview-of-the-physics-of-nuclear-weapons&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I said above that chemical reactions don&#39;t create or destroy
atoms, but it&#39;s possible to have &lt;em&gt;nuclear&lt;/em&gt; reactions which do
exactly that. There are several such processes.&lt;/p&gt;
&lt;h3 id=&quot;atomic-decay&quot;&gt;Atomic Decay &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#atomic-decay&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Many atomic isotopes are &lt;em&gt;unstable&lt;/em&gt;, which means that they
will spontaneously &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Radioactive_decay&amp;amp;oldid=1117714984&quot;&gt;&lt;em&gt;decay&lt;/em&gt;&lt;/a&gt;
into other isotopes by emitting some other particle.
For instance, the isotope uranium-238 decays by emitting
an &lt;em&gt;alpha particle&lt;/em&gt; (another name for a helium nucleus,
containing two protons and two neutrons),
reducing the atomic number by two (the two protons)
and the mass number (the count of protons plus
neutrons) by four, and giving you thorium-234.
Thorium-234 is itself unstable and decays by emitting a
&lt;em&gt;beta&lt;/em&gt; particle (another name for an electron; see &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#radiation&quot;&gt;radiation&lt;/a&gt; below) to give you protactinium-234m.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
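&lt;p&gt;The two decay steps above are just bookkeeping on the proton and
neutron counts, which can be sketched in a few lines of Python. This is
an illustration only: the atomic numbers (92 for uranium, 90 for thorium,
91 for protactinium) and the function names are not from the text above,
and the sketch ignores the emitted particles and the energy released.&lt;/p&gt;

```python
# Bookkeeping for the two decay steps described above.
# A nucleus is a (protons, nucleons) pair: atomic number and proton-plus-neutron count.

SYMBOLS = {90: "Th", 91: "Pa", 92: "U"}  # only the elements used in this example


def alpha_decay(protons, nucleons):
    """Emit a helium nucleus: lose two protons and two neutrons."""
    return protons - 2, nucleons - 4


def beta_decay(protons, nucleons):
    """Emit an electron: one neutron becomes a proton, so the total count is unchanged."""
    return protons + 1, nucleons


def label(nucleus):
    """Format a nucleus in the usual symbol-number style."""
    protons, nucleons = nucleus
    return f"{SYMBOLS[protons]}-{nucleons}"


uranium = (92, 238)
thorium = alpha_decay(*uranium)
protactinium = beta_decay(*thorium)
print(label(uranium), label(thorium), label(protactinium))
```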
&lt;p&gt;Different isotopes decay at different rates. The standard
way to define this is in terms of what&#39;s called a &amp;quot;half-life&amp;quot;,
which is to say the amount of time it takes half of the
atoms in a given sample of an isotope to decay (alternatively,
the time after which there is a 50% chance that a single
atom has decayed). Shorter half-lives mean that an isotope
is more radioactive (because there are more decays per second);
longer half-lives mean that they are more stable.
It&#39;s possible to have isotopes with very long half-lives,
ranging from thousands to billions of years (uranium-238&#39;s is
about 4.5 billion years). Note that atomic decay
is effectively a memoryless process, which is to say that
if you start from X units of an unstable isotope, it takes
the same amount of time to get from X to 1/2 X as it does
to get from 1/2 X to 1/4 X.&lt;/p&gt;
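&lt;p&gt;As a minimal sketch of the half-life arithmetic, the fraction of a
sample left after some time is one half raised to the number of half-lives
elapsed. The function name and the half-life of 10 (in arbitrary time
units) are made up for illustration.&lt;/p&gt;

```python
def fraction_remaining(elapsed, half_life):
    """Fraction of the original atoms still undecayed after `elapsed` time."""
    return 0.5 ** (elapsed / half_life)


half_life = 10.0  # arbitrary time units; a made-up value for illustration

# After 0, 1, 2, 3 half-lives the sample halves each time.
for n in range(4):
    print(n, fraction_remaining(n * half_life, half_life))

# Memorylessness: the share that survives an interval depends only on the
# length of the interval, not on how long the sample has already existed.
first_interval = fraction_remaining(10.0, half_life) / fraction_remaining(0.0, half_life)
second_interval = fraction_remaining(20.0, half_life) / fraction_remaining(10.0, half_life)
print(first_interval, second_interval)
```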
&lt;p&gt;In addition to releasing particles, this process releases energy,
in various forms, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Kinetic energy&lt;/em&gt; from the new atom and the emitted particle
moving faster than they were before. These particles then
interact with the surrounding material, producing heat.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#radiation&quot;&gt;Radiation&lt;/a&gt;&lt;/em&gt; in the form of x-rays, neutrons,
etc.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means that radioactive isotopes tend to be warm or even
hot. In fact, it&#39;s possible to exploit this effect to power devices
for long periods of time in what&#39;s called a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Radioisotope_thermoelectric_generator&amp;amp;oldid=1122812386&quot;&gt;radioisotope thermoelectric generator (RTG)&lt;/a&gt;.
RTGs are a common way to power spacecraft,
for the obvious reason that you can&#39;t easily get out there and
change the batteries.&lt;/p&gt;
&lt;p&gt;One thing to notice here is that this is a one-way process,
with unstable elements decaying to produce other
lighter elements and energy. Eventually, the process
terminates when some relatively stable isotope is
produced, at which point you have a stable system and
a bunch of heat: see also the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Second_law_of_thermodynamics&amp;amp;oldid=1122962991&quot;&gt;second law of thermodynamics&lt;/a&gt;.
It&#39;s also possible to go from lighter to heavier products, as we&#39;ll
see below in the discussion of &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fusion&quot;&gt;fusion&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;radiation&quot;&gt;Radiation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#radiation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You&#39;ll often hear that various isotopes are &lt;em&gt;radioactive&lt;/em&gt;
and that they emit &lt;em&gt;radiation&lt;/em&gt;. In this context, radiation is more or less
the generic term for &amp;quot;stuff emitted by various kinds of atomic
processes that you probably don&#39;t want to come into contact with&amp;quot;.&lt;/p&gt;
&lt;p&gt;Unfortunately, the
names of various types of radiation are incredibly
confusing, dating from a time when the physics
of nuclear energy was poorly understood. When some new
form of radiation was discovered, physicists would tend
to give it a name that just reflected that it was
something new, hence &amp;quot;X-rays&amp;quot; (with the &amp;quot;X&amp;quot; indicating
unknown) and alpha, beta, and gamma radiation,
named (according to Wikipedia) based on the degree to
which they penetrated matter. Now, of course, we
understand the actual physics a lot better, but the
old names persist. As a practical
matter, you&#39;ll hear about the following:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Name&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;What it actually is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Alpha&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Helium nuclei (two protons and two neutrons)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Beta&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Electrons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Gamma&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;High energy photons (i.e., light, but outside the visible range)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;X-rays&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;High energy photons, but typically lower energy than Gamma&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Neutrons&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Neutrons&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These are all bad for you, but different levels of bad.
None of them will turn you into The Hulk.&lt;/p&gt;
&lt;h3 id=&quot;fission&quot;&gt;Fission &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fission&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Atoms can also undergo &lt;em&gt;fission&lt;/em&gt;, in which the nucleus splits
into two smaller nuclei plus some other particles such
as neutrons, x-rays, etc.
Most relevant to us are the following fission
reactions, which we&#39;ll discuss shortly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Uranium-235 can break up into (typically) krypton-92 and
barium-141&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Plutonium-239 can break up into (typically) zirconium-103
and xenon-134&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I say &amp;quot;typically&amp;quot; because fission is kind of a non-deterministic
process: the new nuclei need to have a mass that adds up to
the original mass (minus whatever other particles were emitted)
but there&#39;s some variation in which elements are produced.
The following figure shows the distribution of fission products
for some common fissile isotopes:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fission-products.png&quot; alt=&quot;Fission products&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[From &lt;a href=&quot;https://www.nuclear-power.com/nuclear-power-plant/nuclear-fuel/plutonium/plutonium-239/&quot;&gt;nuclear-power.com&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;It&#39;s possible for atoms to spontaneously undergo fission
(more on this later), but more commonly it&#39;s the result
of external forces. Specifically, if a neutron impacts the
nucleus of an atom it can attach itself to the nucleus,
creating a new isotope that is one unit heavier. If this
isotope is unstable (as is reasonably likely, because
you&#39;re perturbing an isotope which is currently stable)
it can undergo fission.&lt;/p&gt;
&lt;h4 id=&quot;chain-reactions&quot;&gt;Chain Reactions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#chain-reactions&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Here&#39;s what we know so far:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When an atom undergoes fission, it can emit neutrons&lt;/li&gt;
&lt;li&gt;When a neutron hits an atom, it can cause it to undergo fission&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When you put these two facts together, you can have what&#39;s called
a &lt;em&gt;chain reaction&lt;/em&gt; in which one atom undergoes fission and produces
enough neutrons to cause two atoms to undergo fission; in turn
those atoms emit more neutrons, and we have an exponential growth
process which results in the rapid release of very large amounts
of energy, in other words, an atomic bomb.&lt;/p&gt;
&lt;p&gt;The figure below shows the process, also helpfully showing
that uranium doesn&#39;t always split into the same pieces.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fission-chain-reaction.png&quot; alt=&quot;Nuclear fission chain reaction&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Image by MikeRun from &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Nuclear_fission_chain_reaction.svg&quot;&gt;Wikimedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;Note that it&#39;s not necessary for every neutron to impact a
nucleus in order to get a chain reaction as long as on average
the fission of one atom results in the fission of more than
one other atom. Nuclear reactors work by modulating the number
of neutrons that effectively impact other atoms, thus keeping
a stable reaction rate rather than one that is explosively
exponential. Describing how that works is outside the scope of
this post, however.&lt;/p&gt;
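&lt;p&gt;The condition above can be made concrete with a toy model: if each
fission causes on average k further fissions, the expected number of
fissions in each generation is multiplied by k. The symbol k and the
function name are introduced here for illustration, and the model ignores
timing, geometry, and everything else about a real device or reactor.&lt;/p&gt;

```python
def fissions_per_generation(k, generations, initial=1.0):
    """Expected fissions in each generation when every fission causes k more on average."""
    counts = [initial]
    for _ in range(generations):
        counts.append(counts[-1] * k)
    return counts


# k below 1 dies out, k equal to 1 holds steady (the reactor case),
# and k above 1 grows exponentially (the bomb case).
for k in (0.9, 1.0, 2.0):
    print(k, fissions_per_generation(k, 10)[-1])
```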
&lt;h3 id=&quot;fusion&quot;&gt;Fusion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fusion&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s also possible for two light atoms to come together to form
one heavier atom, in a process called &lt;em&gt;fusion&lt;/em&gt;. The most relevant
case for us is that two &lt;em&gt;hydrogen&lt;/em&gt; atoms (atomic number 1)
can fuse to form one &lt;em&gt;helium&lt;/em&gt; atom (atomic number 2).
This is what happens in the sun, but can also be exploited to
build a much bigger bomb than a pure fission bomb. More on
this &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#thermonuclear-weapons&quot;&gt;later&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;making-an-atomic-bomb&quot;&gt;Making an Atomic Bomb &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#making-an-atomic-bomb&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Once you have the insight from the chain reaction, it&#39;s a pretty straight
shot to the idea of an atomic bomb, and physicist Leo Szilard famously &lt;a href=&quot;https://blogs.scientificamerican.com/the-curious-wavefunction/leo-szilard-a-traffic-light-and-a-slice-of-nuclear-history/&quot;&gt;invented it&lt;/a&gt; while
waiting at a traffic light:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;In London, where Southampton Row passes Russell Square, across from the British Museum in Bloomsbury, Leo Szilard waited irritably one gray Depression morning for the stoplight to change. A trace of rain had fallen during the night; Tuesday, September 12, 1933, dawned cool, humid and dull. Drizzling rain would begin again in early afternoon. When Szilard told the story later he never mentioned his destination that morning. He may have had none; he often walked to think. In any case another destination intervened. The stoplight changed to green. Szilard stepped off the curb. As he crossed the street time cracked open before him and he saw a way to the future, death into the world and all our woes, the shape of things to come&amp;quot;...&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;[Quote from Richard Rhodes&#39;s &amp;quot;Making of the Atomic Bomb&amp;quot;]&lt;/p&gt;
&lt;p&gt;It was almost 12 years from that moment until the first atomic bomb was
&lt;a href=&quot;https://en.wikipedia.org/wiki/Trinity_(nuclear_test)&quot;&gt;tested&lt;/a&gt; at
Alamogordo, New Mexico. This test was the result of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Manhattan_Project&amp;amp;id=1122077595&amp;amp;wpFormIdentifier=titleform&quot;&gt;three years of
work&lt;/a&gt;
by over 100,000 people and an investment of over $23 billion in
current dollars, in the form of the US Manhattan Project.
This raises the question: if it&#39;s so straightforward, what took
so long?&lt;/p&gt;
&lt;p&gt;In order to get exponential growth you need to have
on average more than one neutron emitted from the first fission event to
create fission in some other atom. Otherwise, you get an exponential
&lt;em&gt;decay&lt;/em&gt; process where the chain reaction goes toward zero and nothing much happens.
If you just have a small number of atoms, then the most likely
thing is that a neutron will just be emitted outside your
fissile material and not contribute to the chain reaction.
You need a certain minimum amount of material in order
to get the probability of subsequent fission high enough that
you get exponential growth. This amount is called the
&lt;em&gt;critical mass&lt;/em&gt; and depends on the specific properties of the
element you are using, and in particular (1) how many neutrons
it emits when it undergoes fission and (2) how likely it is
that when a given neutron hits an atom it will result in a
new fission event. The critical mass also depends on the shape (geometry) of
the fissile material, with a sphere being the ideal shape because
it has the maximum volume to surface ratio, which minimizes
the chance that the neutrons will just be uselessly expelled
from the surface.&lt;/p&gt;
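&lt;p&gt;The geometric point is easy to check numerically. The sketch below
compares the surface area of a sphere, a cube, and a thin square slab that
all enclose the same volume. It is purely geometry: the shapes, the unit
volume, and the slab thickness are made-up illustrations, and a real
critical mass calculation involves far more than surface area.&lt;/p&gt;

```python
import math


def sphere_surface_area(volume):
    """Surface area of a sphere enclosing the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def cube_surface_area(volume):
    """Surface area of a cube enclosing the given volume."""
    side = volume ** (1.0 / 3.0)
    return 6.0 * side ** 2


def slab_surface_area(volume, thickness):
    """Surface area of a square slab (both faces plus four edges) of the given thickness."""
    side = math.sqrt(volume / thickness)
    return 2.0 * side ** 2 + 4.0 * side * thickness


volume = 1.0  # arbitrary units
shapes = {
    "sphere": sphere_surface_area(volume),
    "cube": cube_surface_area(volume),
    "thin slab": slab_surface_area(volume, 0.1),
}
for name, area in shapes.items():
    # Less surface per unit volume means fewer places for a neutron to escape.
    print(name, round(area / volume, 2))
```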
&lt;p&gt;OK, so we just need to collect enough material and presto,
we have a bomb. Unfortunately, it&#39;s not so simple:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Getting enough of the right material is hard.&lt;/li&gt;
&lt;li&gt;As soon as you start to assemble the material into
a critical mass, it starts reacting, and so if
you do it wrong, the energy emission will cause it
to explosively disassemble, which isn&#39;t fun if
you&#39;re nearby, but produces a much smaller
bang than you were looking for (a &amp;quot;fizzle&amp;quot;).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s look at each of these in turn.&lt;/p&gt;
&lt;h3 id=&quot;a-materials-problem&quot;&gt;A Materials Problem &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#a-materials-problem&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First, we have the problem of the right material. It
quickly became apparent that there was only one suitable
natural element: uranium.&lt;/p&gt;
&lt;h4 id=&quot;uranium&quot;&gt;Uranium &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#uranium&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Recall that I said above that the uranium-235 nucleus
(U-235) can easily undergo fission. Fortunately for us, but
unfortunately for the purposes of making an atomic bomb, over 99% of
the uranium in the world is not uranium-235 but rather
uranium-238 (U-238), which does not readily undergo
fission when bombarded by neutrons (instead, it tends to form U-239, which eventually
decays but doesn&#39;t undergo fission; we&#39;ll want this information later).
This presents a problem because it means that most of the
neutrons emitted by U-235 fission don&#39;t lead to more
fission events and you don&#39;t get exponential growth, hence
no bomb. Or, more precisely, the critical mass of natural
uranium was improbably large, between 10 and 44 tons
(calculation by Rudolf Peierls, as cited by Rhodes). Not
something you could drop from a plane.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;However, what people eventually realized was that if you
had &lt;em&gt;just&lt;/em&gt; U-235, or even &lt;em&gt;mostly&lt;/em&gt; U-235, then it was possible
to sustain a fission explosion with much less mass (the
first uranium bomb, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Little_Boy&amp;amp;oldid=1122486278&quot;&gt;Little Boy&lt;/a&gt;,
used 64 kg of uranium). So, now the problem just becomes
&lt;em&gt;enriching&lt;/em&gt; the uranium so that you have a higher fraction
of U-235 than in natural uranium (Little Boy used 80% U-235).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
This is where things start to get hard.&lt;/p&gt;
&lt;p&gt;Traditionally,
there are two main ways to separate out a mixture of two substances:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Via chemical processes. For instance, this
&lt;a href=&quot;https://www.webassign.net/sample/ncsumeorgchem1/lab_3/manual.html&quot;&gt;lab experiment&lt;/a&gt;
describes how to separate out the components of a common headache
medicine into acetylsalicylic acid (aspirin), salicylamide, and caffeine,
by taking advantage of the fact that each component reacts differently
with different reagents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Via physical processes. For instance, given a mixture of alcohol
and water (e.g., wine) you can increase the alcohol concentration
in the mixture by heating it and collecting the vapor to produce
brandy; this takes advantage of the fact that alcohol has a lower
boiling point than water and therefore the vapor has more
alcohol than the original liquid.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, because U-235 and U-238 are both
isotopes of uranium, they behave chemically identically, so
chemical processes are more or less
impractical.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
This leaves us with physical processes, but because the masses of
the two isotopes differ by only about 1.3%, the differences in physical
behavior are very small as well, which makes any physical separation
process very inefficient.
Eventually, the physicists on the Manhattan Project settled on
two main approaches.&lt;/p&gt;
&lt;h5 id=&quot;gaseous-diffusion&quot;&gt;Gaseous Diffusion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#gaseous-diffusion&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;In this approach, you convert the uranium into gaseous
uranium hexafluoride and allow it to slowly diffuse across
a nickel barrier with very small perforations. Because the molecules
containing U-235 are slightly lighter
than those containing U-238, they move across the barrier slightly
faster, with the result that if you stop partway the resulting
mixture on the far side has slightly more U-235 than the
starting mixture. Because this process is so inefficient,
you need multiple stages in which the output of one
stage is fed into another. To make matters worse, the
uranium hexafluoride is fiendishly reactive and toxic,
so very hard to work with. The result is difficult
industrial chemistry on a giant scale.&lt;/p&gt;
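&lt;p&gt;To get a feel for why so many stages are needed, here is a rough
sketch under idealized assumptions. It uses Graham&#39;s law (diffusion rate
proportional to one over the square root of molecular mass) for the best
case enrichment per stage, and assumes a natural U-235 fraction of about
0.72%, a figure that is not from this post. The atomic masses are
approximate, the function names are made up, and real plants fall well
short of the ideal per-stage factor, so treat the result as a lower bound
on the order of magnitude rather than a design number.&lt;/p&gt;

```python
import math

# Approximate atomic masses in atomic mass units (assumed values, not from the post).
FLUORINE = 18.998
U235 = 235.04
U238 = 238.05

light = U235 + 6 * FLUORINE  # uranium hexafluoride containing U-235
heavy = U238 + 6 * FLUORINE  # uranium hexafluoride containing U-238

# Graham's law: the lighter molecule diffuses faster by sqrt(heavy / light).
stage_factor = math.sqrt(heavy / light)


def abundance_ratio(fraction):
    """Ratio of U-235 to U-238 for a given U-235 fraction."""
    return fraction / (1.0 - fraction)


def ideal_stages(feed, product):
    """Stages needed if every stage multiplied the abundance ratio by stage_factor."""
    gain = abundance_ratio(product) / abundance_ratio(feed)
    return math.log(gain) / math.log(stage_factor)


print("mass difference of the molecules:", round(100 * (heavy - light) / heavy, 2), "%")
print("ideal per-stage factor:", round(stage_factor, 4))
print("ideal stages from 0.72% to 80%:", round(ideal_stages(0.0072, 0.80)))
```

&lt;p&gt;Even in this best case the per-stage gain is well under one percent,
which is why the count of stages runs into the thousands.&lt;/p&gt;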
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;multiple-lines-of-attack&quot;&gt;Multiple Lines of Attack &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#multiple-lines-of-attack&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;One thing that Rhodes does a great job of bringing out
is the extent to which the Manhattan Project involved
pursuing multiple lines of attack on the problem of
building an atomic bomb, with the hope at least some of
them would work. Some failed, of course, but at the end of the day, they
had two entirely different routes that succeeded,
with the result that the two bombs that were eventually
dropped used totally different technologies:
&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#gun-type-devices&quot;&gt;uranium &amp;quot;gun-type&amp;quot; devices&lt;/a&gt;
and &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#implosion-devices&quot;&gt;plutonium implosion devices&lt;/a&gt;.
Similarly, they pursued three independent technologies
for uranium enrichment, of which two turned out
to be really useful.&lt;/p&gt;
&lt;/div&gt;
&lt;h5 id=&quot;electromagnetic-separation-(mass-spectrometry)&quot;&gt;Electromagnetic Separation (mass spectrometry) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#electromagnetic-separation-(mass-spectrometry)&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;The intuition
here is that if you ionize the uranium atoms so that they have
an electrical charge (I told you we&#39;d come back to ions) you
can then accelerate them with an electric field. If you then
apply a transverse (perpendicular) magnetic field, then the
ions will follow a curved trajectory, as shown below. Because the
U-235 ions are slightly lighter, they will follow a slightly tighter
trajectory; you can then effectively set up a bucket and collect
them. Of course, it will be a very small bucket because you
are literally separating one atom at a time.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/calutron.png&quot; alt=&quot;Calutron uranium separation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[The original diagram for electromagnetic separation.]&lt;/p&gt;
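&lt;p&gt;The size of the effect can be estimated with textbook physics: an ion
of mass m and charge q accelerated through a voltage V reaches a speed v
with qV = (1/2)mv^2, and in a magnetic field B it follows a circle of
radius r = mv/(qB). The sketch below applies that to singly charged U-235
and U-238 ions. The voltage and field strength are made-up values chosen
only to give a plausible scale, not the actual operating parameters, and
the function name is hypothetical.&lt;/p&gt;

```python
import math

ATOMIC_MASS_UNIT = 1.66054e-27   # kilograms
ELEMENTARY_CHARGE = 1.60218e-19  # coulombs


def orbit_radius(mass_amu, volts, tesla, charge_units=1):
    """Radius in meters of an ion's circular path after acceleration through `volts`."""
    mass = mass_amu * ATOMIC_MASS_UNIT
    charge = charge_units * ELEMENTARY_CHARGE
    speed = math.sqrt(2.0 * charge * volts / mass)  # from q V = (1/2) m v^2
    return mass * speed / (charge * tesla)          # from q v B = m v^2 / r


VOLTS = 35_000.0  # made-up accelerating voltage
TESLA = 0.5       # made-up magnetic field strength

r235 = orbit_radius(235.04, VOLTS, TESLA)
r238 = orbit_radius(238.05, VOLTS, TESLA)
print("U-235 radius (m):", r235)
print("U-238 radius (m):", r238)
print("ratio:", r235 / r238)
```

&lt;p&gt;Whatever values you pick, the ratio of the two radii is the square
root of the mass ratio, so the two beams differ by well under one percent
and the collectors have to be positioned very precisely.&lt;/p&gt;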
&lt;p&gt;The scale of both of these processes was truly enormous. Rhodes
again:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The United States was critically short of copper, the best
common metal for winding the coils of electromagnets. For
recoverable use, the Treasury offered to make silver bullion
available in copper&#39;s stead. The Manhattan District put
the offer to the test, Nichols negotiating the loan with
Treasury Undersecretary Daniel Bell. &amp;quot;At one point
in the negotiations,&amp;quot; writes Groves, &amp;quot;Nichols ... said
that they would need between five and ten thousand
tons of silver. This led to the icy reply: &#39;Colonel,
in the Treasury we do not speak of tons of silver; our
unit is the Troy ounce.&#39;&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Manhattan Project eventually ended up using both of
these processes, gaseous diffusion first, and then electromagnetic separation.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Even the more modern process involving high-speed centrifuges
involves a fairly significant investment. However, there is
an easier way to get the fissile material you need to make
an atomic bomb.&lt;/p&gt;
&lt;h4 id=&quot;plutonium&quot;&gt;Plutonium &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#plutonium&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Uranium is the only &lt;em&gt;natural&lt;/em&gt; material suitable for making
a bomb, but element 94 (plutonium) works fine as
well. Plutonium has two very convenient properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s relatively easy to make with nuclear reactors
because it&#39;s the result of U-238 reacting with
a neutron (see above). So all you need is
a nuclear reactor and some U-238 and you&#39;ve got
plutonium. In practice, reactors never run on
pure U-235, so they always produce plutonium,
even if it&#39;s treated as a waste product. Of course,
you can design your reactor to optimize the
production of plutonium.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because plutonium is a different element, not just another isotope of uranium,
it&#39;s relatively easy to chemically separate from
the U-238 it was created in. I say relatively
because both plutonium and uranium are
highly toxic and the whole mess is intensely radioactive,
but fundamentally it&#39;s just chemistry; no need for
gaseous diffusion or mass spectrometers. Plutonium
itself comes in several isotopes, but the isotope
you get the most of, Pu-239, is the one you want for making bombs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For these two reasons, modern atomic bombs generally
use plutonium rather than uranium.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;assembly&quot;&gt;Assembly &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#assembly&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Once you have your fissile material you need to assemble
it into a critical mass. This is a challenging process,
because, as noted above, once you start to bring the material
together it starts reacting even before the critical mass
is assembled. If you do it wrong, the energy emission will cause it
to explosively disassemble, but with a much smaller
bang than you were looking for (a &amp;quot;fizzle&amp;quot;).
In order to make a bomb, you need to bring the material
together very fast so that you get a lot of fission
before the critical mass disassembles itself (i.e.,
explodes). Even so, you typically only get a fairly
small proportion of the material reacting, but the reaction
is so energetic that you still get a big explosion.&lt;/p&gt;
&lt;h4 id=&quot;gun-type-devices&quot;&gt;Gun-Type Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#gun-type-devices&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Uranium bombs are comparatively simple to build, using
what&#39;s called a &amp;quot;gun-type&amp;quot; assembly mechanism, as
shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/gun-type-weapon.png&quot; alt=&quot;Gun type bomb diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Diagram by Dake, Papa Lima Whiskey, and Mfield from &lt;a href=&quot;https://en.wikipedia.org/wiki/File:Gun-type_fission_weapon_en-labels_thin_lines.svg&quot;&gt;Wikipedia &lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;This diagram shows a full weapon, but just focus on the
gray area in the center that represents the &amp;quot;physics package&amp;quot;,
i.e., the atomic bomb itself, not the stuff needed to
deliver it. Basically, a gun-type bomb is what it sounds
like: you have a hollow &amp;quot;bullet&amp;quot; made of uranium and you
shoot it down a long barrel (originally literally
made from a cannon) at a cylindrical &amp;quot;target&amp;quot; also made
of uranium. When the bullet reaches the target and
surrounds it, the two pieces together form a critical mass, resulting
in an explosion. This all happens very quickly: you don&#39;t
even need to have something to stop the bullet because
the brief period when the target is passing through the
bullet is enough. And of course, once the reaction
starts, the whole thing will explosively dismantle
itself anyway.&lt;/p&gt;
&lt;h4 id=&quot;implosion-devices&quot;&gt;Implosion Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#implosion-devices&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;You cannot, however, build a plutonium-based bomb using
a gun-type mechanism. Reactor-manufactured plutonium
is mostly Pu-239 but contains a small fraction of
Pu-240, which has a relatively high rate of spontaneous
fission. This rate is sufficiently high that as
the bullet and target start to assemble into a critical mass,
the reaction will start and the mass will prematurely
disassemble, with the result that you get a &amp;quot;fizzle&amp;quot; rather
than a successful explosion.&lt;/p&gt;
&lt;p&gt;Instead, plutonium bombs are built using what&#39;s called an
implosion system, as shown in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/implosion-weapon.png&quot; alt=&quot;Implosion bomb diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Diagram by Ausis via &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Implosion_Nuclear_weapon.svg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;In an implosion device you have a spherical core
(sometimes hollow and sometimes solid)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
called the &amp;quot;pit&amp;quot;.
It&#39;s surrounded by explosives which compress the
pit in a spherically symmetrical pattern, thus
forming a critical mass which holds together long
enough to produce an explosion.&lt;/p&gt;
&lt;p&gt;An implosion device is much less straightforward to build
than a gun-type device, in large part because it&#39;s hard
to get the explosives in the form of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Shaped_charge&amp;amp;oldid=1120095819&quot;&gt;shaped charges&lt;/a&gt; to actually symmetrically
compress the pit. As a comparison point, the world&#39;s
first nuclear explosion was a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Trinity_(nuclear_test)&quot;&gt;test&lt;/a&gt;
of an implosion-type bomb. The physicists at the Manhattan
Project were so confident that the gun-type bomb would work
that the first one ever detonated was the bomb dropped on
Hiroshima, without any live testing at all.&lt;/p&gt;
&lt;p&gt;Once you know how to do it, however, plutonium
is much more convenient as a material to use for weapons
because, as noted above, it&#39;s so much easier to obtain.
Moreover, at this point it&#39;s fairly well understood how to build
implosion devices, to the point where non-experts have
famously &lt;a href=&quot;https://www.theguardian.com/world/2003/jun/24/usa.science&quot;&gt;designed&lt;/a&gt;
plausible weapons without recourse to classified information.
And of course, at this point nine countries have
successfully built nuclear weapons (the US, Russia, the UK,
France, China, India, Pakistan, North Korea, and Israel).
In other words, the really hard part of building
a nuclear weapon is getting the plutonium in the first
place.&lt;/p&gt;
&lt;h3 id=&quot;thermonuclear-weapons&quot;&gt;Thermonuclear Weapons &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#thermonuclear-weapons&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Everything I&#39;ve written so far is about fission type weapons,
which are the original atomic bombs. However, modern weapons
are frequently what&#39;s called &amp;quot;thermonuclear&amp;quot; devices which
are based on both nuclear fission and nuclear &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fusion&quot;&gt;fusion&lt;/a&gt;
(aka &amp;quot;hydrogen bombs&amp;quot;).
The details are of course complicated, but briefly,
fusion takes place under conditions of very high heat
and so you use a fission explosion (the &amp;quot;primary&amp;quot;)
to initiate the fusion reaction (the &amp;quot;secondary&amp;quot;).
For reasons that are out of scope of this post, fusion
bombs can be made much more powerful than fission-only
bombs.&lt;/p&gt;
&lt;p&gt;They&#39;re also substantially more complicated to design,
because, as with implosion devices, you have to ensure that they
have time to fuse before they disassemble themselves.
Wikipedia has a good &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Thermonuclear_weapon&amp;amp;oldid=1123622590&quot;&gt;primer&lt;/a&gt;
on the design of thermonuclear devices. Richard Rhodes&#39;s
&lt;a href=&quot;https://www.amazon.com/Dark-Sun-Making-Hydrogen-Bomb/dp/0684824140&quot;&gt;Dark Sun&lt;/a&gt;
contains a much more in-depth treatment of the history
and design of thermonuclear weapons.&lt;/p&gt;
&lt;h2 id=&quot;disposal&quot;&gt;Disposal &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#disposal&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After 4500+ words, we&#39;re finally ready to address the question
we started with, which is to say, how one disposes of
unwanted nuclear weapons. As described in the aforementioned
&lt;a href=&quot;https://www.nytimes.com/2022/11/17/science/retired-nuclear-bombs-b83.html?searchResultPosition=1&quot;&gt;NYT article&lt;/a&gt;,
the current practice in the US is mostly to disassemble them
and to store the parts, making it possible to reassemble
them later into similar weapons. The article leans kind
of heavily on the fact that this is surprising (true!) but
does eventually list three reasons why it might not
be a good idea:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The parts themselves (principally the pits) are a safety
hazard.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are &amp;quot;security&amp;quot; issues, presumably that someone might
steal the parts and make their own weapon.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;That this doesn&#39;t really put them beyond use and so
isn&#39;t a real reduction in the number of weapons
because the US could readily make new weapons if it chose to.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are two primary assets that we might need to concern
ourselves with:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The plutonium pit itself&lt;/li&gt;
&lt;li&gt;The rest of the weapon&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The situation with the rest of the weapon is simpler so let&#39;s
look at that first.&lt;/p&gt;
&lt;h3 id=&quot;the-rest-of-the-weapon&quot;&gt;The Rest of the Weapon &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#the-rest-of-the-weapon&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The parts of the weapon other than the pit give you a head
start on building a new weapon in two ways. First,
if you just disassemble the weapon into pieces
then it&#39;s (presumably) comparatively straightforward
to reassemble them back into a functional weapon. You might
also be able to reassemble them into a different weapon, though
based on what I know, you would want it to be reasonably
similar to the original. In either case, this is almost certainly
easier than manufacturing all new parts and the necessary
associated supply chain.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nuk-instructions.png&quot; alt=&quot;IKEA instructions for building a bomb&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[A lightly modified version of Midjourney&#39;s output
for &amp;quot;ikea instructions for assembling a nuclear weapon, diagram, black and white, detailed, realistic --v 4&amp;quot;]&lt;/p&gt;
&lt;p&gt;Second, the parts embody the knowledge about how to build
a new weapon. As noted above, while it&#39;s helpful to have this
for building a fission device, at this point this is something
that can be reproduced fairly readily. However, thermonuclear
bombs are significantly more complicated to design and
quite easy to get wrong, so it would definitely be helpful
to have a reference design to start from. The fusion component
also seems to involve some isotopes of hydrogen (tritium
and deuterium), so it would be modestly helpful to have that
but my understanding is that it&#39;s not &lt;em&gt;that&lt;/em&gt; hard to get
your hands on these isotopes. Deuterium in the form of
&amp;quot;heavy water&amp;quot; (i.e., heavy hydrogen and oxygen)
is &lt;a href=&quot;https://www.sigmaaldrich.com/US/en/product/aldrich/617385&quot;&gt;readily available&lt;/a&gt;
from chemical supply houses. So, while the article says
&amp;quot;the nuclear warhead is the bullet-like cylinder at the back. It holds the plutonium pit and the hydrogen fuel, which gives the bomb its vast powers of destruction&amp;quot;, my sense is that the
hydrogen fuel part is pretty easy to obtain.&lt;/p&gt;
&lt;p&gt;But of course none of this is very useful if you don&#39;t have
the pit, which is necessary to start the whole thing off.
It&#39;s also fairly straightforward to destroy
these components, as they&#39;re fundamentally just hardware.
Not so for the pit.&lt;/p&gt;
&lt;h3 id=&quot;the-pit&quot;&gt;The Pit &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#the-pit&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The pit presents two problems. First, even without the rest of the components, the plutonium
pits can be reused to make new weapons, either with
a similar geometry to the current weapon, or melted
down and formed into the pit of a new weapon with
a new geometry. We know from experience that once state-level
actors get access to enough plutonium to build a bomb they
generally succeed. Of course, non-state-level actors might
have a much harder time building a bomb from raw plutonium.&lt;/p&gt;
&lt;p&gt;Second, it&#39;s extremely difficult to destroy plutonium effectively
(some weapons are built out of highly enriched uranium and
that can just be diluted in U-238 and used for reactors).
Obviously, you can melt it down, but that just leaves you with
a chunk of subcritical plutonium which someone can re-form into
a new weapon. The plutonium is highly toxic, so you can&#39;t
just grind it up and scatter it around without causing huge
environmental impacts (watch &lt;a href=&quot;https://www.hbo.com/chernobyl&quot;&gt;Chernobyl&lt;/a&gt;
if you want to get a sense of what I&#39;m talking about here).
You can&#39;t burn it because then you&#39;re going to have
oxidized plutonium in the air, which you don&#39;t want
people inhaling, and while you can of course
use chemicals to dissolve it, vitrify it, etc. you&#39;re still
left with an equivalent amount of plutonium, just bonded
to some other stuff, and so it&#39;s just a matter of (potentially
highly unpleasant) chemistry to get it back out again. In
other words, it&#39;s precisely the properties of plutonium that
make it attractive to build nuclear weapons out of that make
it so hard to dispose of.&lt;/p&gt;
&lt;p&gt;It&#39;s also very difficult to store because while
an individual weapon may not be a critical mass, if you have
tens or hundreds of weapons you have to worry about them getting
close enough to accidentally assemble a critical
mass just from proximity, which would, of course, be bad.&lt;/p&gt;
&lt;p&gt;Of course, this isn&#39;t news to policymakers. As the NYT article
says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Clinton, Bush and Obama administrations all made plans — with costs in the billions of dollars — to get rid of excess plutonium stocks, which grew rapidly after the Cold War because of arms disassembly. But no strategy has so far succeeded.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The best available option &lt;a href=&quot;https://rlg.fas.org/s245nato.htm&quot;&gt;appears to be&lt;/a&gt;
to turn the plutonium into what&#39;s called &amp;quot;mixed-oxide fuel&amp;quot; (MOX) and
then use it to fuel nuclear reactors. Unfortunately, this doesn&#39;t
work super well for a number of logistical reasons, for instance
that many reactors can only use MOX for some of their fuel;
and of course we have an unbelievable amount of plutonium
lying around, not just from existing nuclear weapons but also
from the operations of normal nuclear reactors, which, as
noted above, create plutonium. The FAS report I linked above
is from 1993 and states that &amp;quot;There is almost 1000 MT of reactor Pu (R-Pu) in existence now, with the amount growing by about 100 MT per year.&amp;quot; (disposal of plutonium waste is one of the
big problems with nuclear reactors).
So, the situation is really quite difficult
even if we ignore disassembled weapons, which actually tend
not to be that big (recall that the pit weighs on the order
of a few kg).&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I don&#39;t want to spend too much time playing media critic here,
but I don&#39;t feel like this article did that great a job of
putting things in context. The implication of this article is that
the US isn&#39;t really serious about disarmament and so it&#39;s storing
all the nukes in pieces but not really destroying them in order to
have ready access later, and that
this creates all sorts of hazards. I&#39;m sure
that&#39;s true to some extent, but I think it&#39;s also necessary to
realize that actually destroying them is a lot harder than it
sounds and even if you were to do about the best we know how to do
and totally destroy all of the hardware
other than the pits, you&#39;d still be left with a large amount
of fantastically dangerous stuff which has to be guarded
for the next 100,000 years or so.
The critique that this material isn&#39;t being guarded does seem
like a reasonable one, but it seems like guarding it better
is the solution that we&#39;re left with.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In reality this whole orbiting thing is kind of nonsense
because actually they occupy this probability space of locations,
but we don&#39;t need to get into quantum mechanics here and
for our purposes we can just live with a classical-type
picture. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically, this is called the &lt;em&gt;atomic mass&lt;/em&gt;.
Protons and neutrons have approximately the same mass,
but electrons are much lighter, so the mass of the
atom is basically the mass of the protons and neutrons. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This &amp;quot;m&amp;quot; isn&#39;t an error.
I didn&#39;t know about this, but apparently this is actually
a higher energy state of protactinium-234, which decays
more quickly. Thanks, Wikipedia! &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Famously, the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Oklo_Mine&amp;amp;oldid=1124188262&quot;&gt;Oklo Mine&lt;/a&gt;
had a self-sustaining reaction in natural uranium, though
with the help of water as a &amp;quot;moderator&amp;quot; (out of scope again, I&#39;m afraid). &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The uranium with lower than normal U-235 is known
as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Depleted_uranium&amp;amp;oldid=1120739694&quot;&gt;depleted uranium&lt;/a&gt;
and is used in various military applications because
it is very dense. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There actually is now
a chemical process called &lt;a href=&quot;https://inis.iaea.org/search/search.aspx?orig_q=RN:22063379&quot;&gt;Chemex&lt;/a&gt;
that takes advantage of some slight differences in chemical properties due to the change
in atomic mass. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There were actually three separate processes, with
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=S-50_(Manhattan_Project)&amp;amp;oldid=1117949476&quot;&gt;thermal diffusion&lt;/a&gt;
being used to make slightly enriched uranium which
was then enriched much more with gaseous diffusion.
Thermal diffusion isn&#39;t very efficient and was eventually
abandoned. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you still need the ability to enrich uranium
to reactor grade levels so that you can run the reactor to
make the plutonium. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The original pits were hollow, but as I understand it
more modern designs just use a solid pit and rely
on the explosives to compress the plutonium enough
to make a subcritical mass critical. &lt;a href=&quot;https://educatedguesswork.org/posts/nuclear-weapon-disposal/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Can we agree on the facts about QWACs?</title>
		<link href="https://educatedguesswork.org/posts/eidas-article45/"/>
		<updated>2022-11-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/eidas-article45/</id>
		<content type="html">&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; Like the rest of the material on EG, these
are my opinions and not those of my employer.&lt;/p&gt;
&lt;p&gt;Over at the &lt;a href=&quot;https://blog.mozilla.org/netpolicy/files/2021/11/eIDAS-Position-paper-Mozilla-.pdf&quot;&gt;day job&lt;/a&gt; I&#39;ve been spending quite a bit of time
&lt;a href=&quot;https://twitter.com/RiskAhead/status/1591056272472047623/photo/1&quot;&gt;dealing with&lt;/a&gt; the proposed &lt;a href=&quot;https://digital-strategy.ec.europa.eu/en/library/trusted-and-secure-european-e-id-regulation&quot;&gt;eIDAS Article 45.2&lt;/a&gt;, which
would require browsers to accept &lt;em&gt;Qualified Website Authentication Certificates (QWACs)&lt;/em&gt;
issued by certificate authorities approved by European Union member states.
A lot of the discussion here
has either been in private or by &lt;a href=&quot;https://securityriskahead.eu/&quot;&gt;press&lt;/a&gt;
&lt;a href=&quot;https://www.linkedin.com/posts/european-signature-dialog_mozilla-campaign-pushes-serious-misinformation-activity-6978078620279824384-ByAc/&quot;&gt;release&lt;/a&gt;, neither
of which is very helpful in understanding the issues at play here.
I&#39;m a strong believer that we should be able to agree on facts,
even if we can&#39;t agree on the best way forward, so in that
spirit, this post attempts to lay out the technical situation.&lt;/p&gt;
&lt;p&gt;Apologies in advance that some of this material is a bit basic
and repetitive, but I wanted to have something self-contained.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-https-and-the-webpki&quot;&gt;Background: HTTPS and the WebPKI &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#background%3A-https-and-the-webpki&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In order to have a secure connection to a Web site via HTTPS (e.g.,
&lt;code&gt;https://educatedguesswork.org&lt;/code&gt;) it is necessary to both &lt;em&gt;encrypt&lt;/em&gt; the
traffic and &lt;em&gt;authenticate&lt;/em&gt; the site. The encryption happens via TLS,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
but TLS depends on the server having a public key which is
authenticated via a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Public_key_certificate&amp;amp;oldid=1121995809&quot;&gt;certificate&lt;/a&gt;.
The certificate in turn binds that key to the server&#39;s identity. The server uses the
private key associated with that public key to complete the TLS handshake,
thus proving that it is the correct owner of that identity. Without
the certificate, your browser could just be forming a secure
connection to an attacker.&lt;/p&gt;
&lt;p&gt;These certificates are issued by &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_authority&amp;amp;oldid=1117367270&quot;&gt;certificate authorities (CAs)&lt;/a&gt; (also, &amp;quot;certification authorities&amp;quot;), who
are responsible for validating the server&#39;s identity, issuing
the certificate, and revoking it if something goes wrong
(e.g., the server&#39;s key is compromised). But of course, we
can&#39;t have just anyone stand up a CA: because a CA is responsible
for attesting to server identities, a malicious (or just badly
operated) CA could &lt;em&gt;misissue&lt;/em&gt; certificates (i.e., issue them to
the wrong people), allowing attackers to impersonate
servers to clients and steal whatever information the user
is sending to the server. It&#39;s important to realize that every
CA is trusted to attest to &lt;em&gt;any&lt;/em&gt; server&#39;s identity, and so
the entire system depends on all the CAs behaving correctly.&lt;/p&gt;
&lt;p&gt;The way things work in practice is that the client has a list of
CAs that it trusts to issue certificates: if a certificate isn&#39;t
approved by one of those CAs,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
then the client will reject it. For instance, here&#39;s what happens
when Firefox encounters a &lt;a href=&quot;https://untrusted-root.badssl.com/&quot;&gt;certificate from an unknown CA&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/unknown-ca.png&quot; alt=&quot;Unknown certificate warning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In principle it&#39;s possible for the user to ignore this warning, but in
practice it&#39;s a really bad idea and browsers have gotten increasingly
aggressive about discouraging users from doing so. As a practical
matter, you can&#39;t really
run a secure Web site without a valid certificate, by which I mean
one which is issued by a CA that is trusted by every major browser.&lt;/p&gt;
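&lt;p&gt;As a rough illustration of the trust-list check described above, here is a minimal Python sketch. The &lt;code&gt;Cert&lt;/code&gt; class, the &lt;code&gt;TRUSTED_CAS&lt;/code&gt; set, and the CA names are all hypothetical, and the sketch leaves out everything a real client also does (signature verification, expiry, intermediate CAs); it only shows the &amp;quot;is the issuer on my list?&amp;quot; decision.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str  # domain name the certificate was issued for
    issuer: str   # name of the CA that issued it


# Hypothetical root store: the set of CAs this client trusts.
TRUSTED_CAS = {"Example Trusted CA"}


def client_accepts(cert: Cert, trusted_cas=TRUSTED_CAS) -> bool:
    """Accept a certificate only if its issuer is on the trust list.

    Simplification: no signature, expiry, or chain-building checks.
    """
    return cert.issuer in trusted_cas


good = Cert(subject="educatedguesswork.org", issuer="Example Trusted CA")
bad = Cert(subject="educatedguesswork.org", issuer="Unknown CA")

print(client_accepts(good))  # True
print(client_accepts(bad))   # False
```

&lt;p&gt;Note that nothing in the check ties a CA to particular domains: any CA on the list can vouch for any name, which is why the contents of the list matter so much.&lt;/p&gt;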
&lt;p&gt;There isn&#39;t just one list of valid CAs: each of the major browser
vendors has their own &amp;quot;root program&amp;quot;, in which they evaluate CAs and
determine which they trust (&lt;a href=&quot;https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/&quot;&gt;Mozilla&lt;/a&gt;, &lt;a href=&quot;https://www.chromium.org/Home/chromium-security/root-ca-policy/&quot;&gt;Chrome&lt;/a&gt;,
&lt;a href=&quot;https://www.apple.com/certificateauthority/ca_program.html&quot;&gt;Apple&lt;/a&gt;,
&lt;a href=&quot;https://learn.microsoft.com/en-us/security/trusted-root/program-requirements&quot;&gt;Microsoft&lt;/a&gt;).
As a practical matter, a CA needs to be accepted by all four of these
programs in order to issue certificates; otherwise its certificates
won&#39;t be accepted by a major browser, which is pretty bad news.
Unsurprisingly, then, there is a fair amount of coordination between
the root programs. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&quot;https://cabforum.org/&quot;&gt;CA/Browser Forum (CABF)&lt;/a&gt; sets a common floor
of requirements (the &lt;a href=&quot;https://cabforum.org/about-the-baseline-requirements/&quot;&gt;Baseline Requirements&lt;/a&gt;)
that all CAs have to conform to.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&quot;https://www.ccadb.org/&quot;&gt;Common CA Database (CCADB)&lt;/a&gt; maintains
a common set of records for CAs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And, of course, the root program operators talk to each other
informally, especially in cases where some CA appears to be
misbehaving and it is necessary to determine how best to handle
it. For instance, due to a large set of
&lt;a href=&quot;https://wiki.mozilla.org/CA/Symantec_Issues&quot;&gt;issues&lt;/a&gt; with the
Symantec CA, the root programs worked together between 2016 and 2018
to gradually distrust Symantec.&lt;/p&gt;
&lt;h3 id=&quot;the-server&#39;s-identity&quot;&gt;The server&#39;s identity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#the-server&#39;s-identity&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve said that the certificate contains the server&#39;s identity, but not
what that identity consists of. The most common scenario is that
the certificate just contains the &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;domain name&lt;/a&gt; of the server. So, the
certificate for &lt;code&gt;https://educatedguesswork.org&lt;/code&gt; would contain the
name &lt;code&gt;educatedguesswork.org&lt;/code&gt;. When the browser connects to the site,
it verifies that the domain name in the certificate matches the
domain name it is trying to connect to. In practice, certificates
often contain other information, such as the organization to which it
was issued, but &lt;strong&gt;the browser does not use it to establish the connection&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;As an example, here&#39;s the &amp;quot;subject&amp;quot; information from Twitter&#39;s
certificate:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Field&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Common Name&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;a href=&quot;http://twitter.com/&quot;&gt;twitter.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Organization&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Twitter, Inc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Location&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;San Francisco&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;State&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;California&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Country&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;US&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;distinguished-names-and-subjectaltname&quot;&gt;Distinguished Names and SubjectAltName &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#distinguished-names-and-subjectaltname&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The reason for this goofy name structure is that when certificates
were originally designed, the idea was that they would identify
people, not computers, and that everyone would have a distinct
name (technical term: &amp;quot;distinguished name&amp;quot;) and so this geographic and organization
information was useful to distinguish people with the same personal name.
When this structure was adapted for use with SSL/TLS, the
&amp;quot;Common Name&amp;quot; field was repurposed to contain the domain name.
In modern certificates, another field, &lt;em&gt;Subject Alternative Name (SAN)&lt;/em&gt;
is preferred. The SAN field can contain an arbitrary number of names. For
instance, Twitter&#39;s cert contains &lt;code&gt;twitter.com&lt;/code&gt; and &lt;code&gt;www.twitter.com&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The only part of this that the browser cares about is the domain name,
which here appears in the &amp;quot;Common Name&amp;quot; field. It just ignores the
rest of the fields, and you actually have to work a bit to see them
at all: in Firefox you can get to them from the &amp;quot;lock&amp;quot; icon, but in
Chrome you have to go into the developer tools.&lt;/p&gt;
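&lt;p&gt;To make the name check concrete, here is a small Python sketch using the Twitter fields from the table above. The dictionary layout and the function names are hypothetical, and the sketch assumes exact, case-insensitive matching against the SAN list with the Common Name as a fallback; real clients have more rules (wildcards, for instance) that are omitted here.&lt;/p&gt;

```python
# Subject fields from the Twitter example, plus the two SAN names it lists.
cert = {
    "subject": {
        "Common Name": "twitter.com",
        "Organization": "Twitter, Inc.",
        "Location": "San Francisco",
        "State": "California",
        "Country": "US",
    },
    "subject_alt_names": ["twitter.com", "www.twitter.com"],
}


def names_in(cert: dict) -> set:
    """Domain names the certificate covers: SAN list, else the Common Name."""
    names = cert.get("subject_alt_names") or [cert["subject"]["Common Name"]]
    return {name.lower() for name in names}


def matches_host(cert: dict, host: str) -> bool:
    """Only the domain name is consulted; Organization etc. are never read."""
    return host.lower() in names_in(cert)


print(matches_host(cert, "www.twitter.com"))  # True
print(matches_host(cert, "twitter.example"))  # False

# Changing the organization has no effect on the result.
cert["subject"]["Organization"] = "Some Other Company"
print(matches_host(cert, "twitter.com"))      # True
```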
&lt;p&gt;Not only is the domain name the only thing that matters, but clients
will accept &lt;em&gt;any&lt;/em&gt; certificate with that domain name. This is not
a design defect but rather a critical element of building an operational
system. For instance,
it&#39;s very common to operate a site using multiple servers and
use some load balancing mechanism to direct clients to specific
servers (this is the only realistic way to scale to very large
numbers of users). If you have a lot of these machines, then
there are significant operational challenges in maintaining them,
and it&#39;s common to have more than one certificate (e.g., one
certificate per machine or per data center). These machines might
even be operated by different entities, for instance you could
contract with multiple content distribution networks.
From the user&#39;s perspective these are all one service and you want things to operate
smoothly, which means that the user doesn&#39;t notice if one
Web request goes to server &lt;strong&gt;A&lt;/strong&gt; and one to server &lt;strong&gt;B&lt;/strong&gt;.
This requires the browser to treat multiple certificates as if they
were the same site: all that matters is the domain name
(see my &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;post&lt;/a&gt; for more on this
concept).&lt;/p&gt;
&lt;h3 id=&quot;domain-validation&quot;&gt;Domain Validation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#domain-validation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because the only thing that matters is the domain name, that&#39;s all
that most CAs check. Moreover, it&#39;s very difficult (i.e., expensive)
to verify that a specific person is entitled to use a specific domain
name, so instead what CAs do is check that you have &lt;em&gt;control&lt;/em&gt; of the
domain. This is called a &lt;em&gt;Domain Validation (DV)&lt;/em&gt; certificate.
The most common thing to do is shown in the figure below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/DomainValidation.png&quot; alt=&quot;Domain Validation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The way that this works is that the operator connects to the CA
and asserts that they control a given domain. The CA then
asks them to prove that they control it by placing a random
&lt;em&gt;challenge&lt;/em&gt; somewhere on their Web site. The CA then
goes to the site directly and checks that the file exists. The
reasoning here is that because the challenge is under the CA&#39;s
control and is random, then the only way it could get onto
the site is if the operator put it there.
One nice feature of this design is that it is easy to implement
for the CA &lt;em&gt;and&lt;/em&gt; even more importantly for the site operator,
who presumably controls what goes on the site. It is also
easily automated, with protocols such as
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8555.html&quot;&gt;ACME&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;domain-control-and-web-site-structure&quot;&gt;Domain Control and Web Site Structure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#domain-control-and-web-site-structure&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Note that I haven&#39;t said where the file should be. Some Web
sites allow unprivileged users to create files on the site
(e.g., your pictures on Instagram). If the CA allowed the
user to put the file anywhere, then it would be possible to
attack such sites. The verification protocol
needs to be designed to use a location that essentially no site
uses for user-controlled content. One possibility
(used by the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8555.html&quot;&gt;ACME protocol&lt;/a&gt;)
is to use the &lt;code&gt;/.well-known&lt;/code&gt; path which is supposed to be
only available to site operators.&lt;/p&gt;
&lt;/div&gt;
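&lt;p&gt;The challenge flow described above can be sketched in a few lines of Python. This is a simulation, not an ACME client: &lt;code&gt;Site&lt;/code&gt; and &lt;code&gt;CA&lt;/code&gt; are hypothetical in-memory stand-ins, the &amp;quot;fetch&amp;quot; is a dictionary lookup rather than an HTTP request, and the served file is just the token itself (real ACME serves a value derived from the token and the account key under &lt;code&gt;/.well-known/acme-challenge/&lt;/code&gt;).&lt;/p&gt;

```python
import secrets

# Path prefix that only the site operator is supposed to be able to write to.
CHALLENGE_DIR = "/.well-known/acme-challenge/"


class Site:
    """Hypothetical Web site: a mapping from path to file contents."""

    def __init__(self):
        self.files = {}

    def put(self, path, body):
        self.files[path] = body

    def get(self, path):
        return self.files.get(path)


class CA:
    """Hypothetical CA that issues random challenges and checks them."""

    def __init__(self):
        self.pending = {}  # domain -> outstanding challenge token

    def new_challenge(self, domain):
        token = secrets.token_urlsafe(32)  # random, chosen by the CA
        self.pending[domain] = token
        return token

    def validate(self, domain, site):
        token = self.pending.pop(domain, None)
        if token is None:
            return False
        served = site.get(CHALLENGE_DIR + token)
        return served is not None and secrets.compare_digest(served, token)


ca = CA()
site = Site()

token = ca.new_challenge("example.org")
site.put(CHALLENGE_DIR + token, token)    # operator publishes the challenge
print(ca.validate("example.org", site))   # True

ca.new_challenge("example.org")           # challenge issued but never published
print(ca.validate("example.org", site))   # False
```

&lt;p&gt;Because the token is random and chosen by the CA, the second request fails: the file from the first challenge is still on the site, but it does not match the new token.&lt;/p&gt;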
&lt;p&gt;If you&#39;re thinking that this design is weirdly circular, you&#39;re right:
the purpose of HTTPS is to protect you against an attacker who
controls the network, but this type of domain verification is completely at
the mercy of an attacker who controls the network. And in fact, there have been
attacks based on control of the network, specifically by
&lt;a href=&quot;https://www.princeton.edu/~pmittal/publications/bgp-tls-usenix18.pdf&quot;&gt;controlling the BGP routing protocol&lt;/a&gt;
to deliver traffic to the attacker&#39;s server. The main
countermeasure to this is for the CA to verify the challenge
from multiple locations on the network (a technique
called &lt;a href=&quot;https://letsencrypt.org/2020/02/19/multi-perspective-validation.html&quot;&gt;multi-perspective validation&lt;/a&gt;),
which works because it&#39;s harder to hijack BGP across the entire
Internet than just against one location (and of course
the CA&#39;s network is probably better secured than
your average Starbucks network). In addition,
because certificates are recorded in &lt;a href=&quot;https://certificate.transparency.dev/&quot;&gt;Certificate Transparency&lt;/a&gt;
logs, it is possible to detect misissuance and
revoke the certificates or even distrust the CA if
necessary.
There are other designs for domain validation (e.g., using the
DNS), but they aren&#39;t really much more secure unless
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;DNSSEC&lt;/a&gt; is used.&lt;/p&gt;
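&lt;p&gt;The multi-perspective idea can be sketched in a few lines of Python. This is only an illustration: the vantage-point names, the &lt;code&gt;check_from&lt;/code&gt; callback, and the tolerate-one-failure threshold are all assumptions, not the policy of any particular CA.&lt;/p&gt;

```python
from typing import Callable, Iterable


def multi_perspective_ok(
    perspectives: Iterable[str],
    check_from: Callable[[str], bool],
    max_failures: int = 1,  # hypothetical threshold, chosen for illustration
) -> bool:
    """Return True if at most max_failures vantage points fail the challenge.

    check_from(p) stands in for "fetch the challenge from network location p
    and compare it to the expected value".
    """
    failures = sum(1 for p in perspectives if not check_from(p))
    return max_failures >= failures
```

&lt;p&gt;An attacker who hijacks routing near one vantage point still sees the other vantage points reach the real server, so the challenge fails overall.&lt;/p&gt;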
&lt;p&gt;In any case, DV certificates are by far the most common
type of certificate on the Internet, because they are cheap
to issue, and, as mentioned before, work just fine.
They are so cheap to issue, in fact, that the &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Certificate Authority gives them away for free.&lt;/p&gt;
&lt;h3 id=&quot;extended-validation&quot;&gt;Extended Validation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#extended-validation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because DV certificates just validate the domain name, they don&#39;t
actually tell you what organization you are talking to.
From the perspective of the Web browser, this
is just fine, because its job is to ensure that the site you
are going to matches up with the link you clicked on or the
site name you typed in, but from the user&#39;s perspective it&#39;s
less than ideal. There are two basic problems here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s not necessarily obvious which real-world organization
is associated with a given domain name. For example,
the official site of the United States White House is
&lt;code&gt;whitehouse.gov&lt;/code&gt; but &lt;code&gt;whitehouse.com&lt;/code&gt; is a porn site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even if you do know what domain to expect, humans are
notoriously bad at comparing two strings. For instance,
&lt;code&gt;educatedguesswork.org&lt;/code&gt; is this site, but
would you really notice if you went to &lt;code&gt;educated-guesswork.org&lt;/code&gt;?
Similarly, &lt;code&gt;microsoft.com&lt;/code&gt; and &lt;code&gt;micros0ft.com&lt;/code&gt; are different
sites.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The result of these weaknesses is that users are susceptible
to &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Phishing&amp;amp;oldid=1122613297&quot;&gt;&amp;quot;phishing&amp;quot;&lt;/a&gt;
attacks, in which an attacker sends you a message
(e-mail, SMS, etc.), allegedly from your bank, PayPal,
etc. asking you to log in and do something, but with a link
to their site that has a similar name to the entity they
are impersonating. Then when you log in and enter your password,
they can steal it and log on to your account on the real site.&lt;/p&gt;
&lt;p&gt;In response to phishing attacks and concerns about the
general weakness of domain validations, a new kind of certificate
called an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Extended_Validation_Certificate&amp;amp;oldid=1117387665&quot;&gt;&lt;em&gt;Extended Validation (EV)&lt;/em&gt;&lt;/a&gt;
certificate was created in 2007.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Unlike with DV certificates, before
issuing an EV certificate the CA validates the actual
organizational name of the applicant, e.g., by checking
business records. That name goes into the certificate
and then can be displayed to the user, for instance like
this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ev.png&quot; alt=&quot;Extended Validation UI&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Original image from &lt;a href=&quot;https://www.bleepingcomputer.com/news/software/chrome-and-firefox-changes-spark-the-end-of-ev-certificates/&quot;&gt;Bleeping Computer&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;The idea here is that the user knows they want to go to
(say) Stripe, and so they check for &amp;quot;Stripe&amp;quot; in the URL bar.&lt;/p&gt;
&lt;p&gt;EV certificates were one of those plausible ideas that were
worth a try but turned out not to work, for two distinct reasons.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Users Don&#39;t Check.&lt;/strong&gt; The basic premise of EV is that users will look at the UI
and behave differently when the EV indicator (the company
name) is displayed. Unfortunately, this seems not to be the
case. Chrome&#39;s Security team does a good job of
&lt;a href=&quot;https://chromium.googlesource.com/chromium/src/+/HEAD/docs/security/ev-to-page-info.md&quot;&gt;summarizing the research&lt;/a&gt;
in this area, but the TL;DR is that if you remove
the EV indicator for sites, most people don&#39;t seem to
notice or behave differently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Names Aren&#39;t Unique.&lt;/strong&gt; Organizational names are generally scoped
by jurisdiction, which allows an attacker to register a company
with the same name as the company they are impersonating and then
get an EV certificate. In one famous &lt;a href=&quot;https://arstechnica.com/information-technology/2017/12/nope-this-isnt-the-https-validated-stripe-website-you-think-it-is/&quot;&gt;incident&lt;/a&gt;, security researcher
Ian Carroll got an EV certificate for &amp;quot;Stripe Inc.&amp;quot;
by registering a legal entity in a different state
and then applying for an EV cert.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Browser vendors don&#39;t like unnecessary UI clutter, especially in
the area of security, and between 2018 and 2019, browsers &lt;a href=&quot;https://duo.com/decipher/chrome-and-firefox-removing-ev-certificate-indicators&quot;&gt;removed&lt;/a&gt;
the EV indicators in the main UI. This of course dramatically reduces the
incentive that sites have to get EV certificates because users have
to go to a lot of trouble to find out that a certificate is EV,
which it seems very likely they won&#39;t do. Understandably, this
&lt;a href=&quot;https://sectigo.com/resource-library/mozillas-announced-decision-to-remove-the-extended-validation-ui-indicator-should-be-reconsidered&quot;&gt;didn&#39;t make the CAs very happy&lt;/a&gt;, especially because EV certificates are
quite a bit more expensive than DV certificates, which
can be obtained for free from Let&#39;s Encrypt.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
By contrast, EV certificates can cost upward of &lt;a href=&quot;https://comodosslstore.com/comodo-ev-ssl.aspx&quot;&gt;$100/year&lt;/a&gt;.
At present, only a very small percentage (well less than
1%) of the certificates in use on the Web are EV.&lt;/p&gt;
&lt;h3 id=&quot;arguments-for-ev-security&quot;&gt;Arguments for EV Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#arguments-for-ev-security&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I want to briefly address two arguments you will sometimes hear
for why EV certificates are more secure than DV. I don&#39;t think
either of these really holds up.&lt;/p&gt;
&lt;h4 id=&quot;phishing-is-mostly-dv&quot;&gt;Phishing is Mostly DV &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#phishing-is-mostly-dv&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Back in 2018, researchers from Entrust Datacard and Comodo published an
&lt;a href=&quot;https://pkic.org/uploads/2018/06/Summary-Report-Incidence-of-Phishing-04-16-2018.pdf/&quot;&gt;analysis&lt;/a&gt;
of the certificates used for phishing sites. They report that
the vast majority of sites used for phishing are DV
(unsurprising because most certificates are DV) but also
that a lower fraction of EV certs are used for phishing
than of DV certs:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Percent of Phishing Sites&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Overall Percent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;EV&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0.05&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;OV&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0.13&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;DV&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;99.82&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;94.3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The authors conclude that &amp;quot;EV sites are safer than OV and DV&amp;quot;, which
is likely true, but this shouldn&#39;t lead you to conclude that EV
prevents phishing. Phishers need to
register a lot of domains and have an incentive to use the cheapest
certificates they can get. Because DV certificates are cheap (free)
and work fine, they naturally use them. If response rates for EV
were much better than DV, however, we would expect to see more
use of EV for phishing. In other words, yes, EV sites are less
likely to be phishing sites, but because users largely don&#39;t
notice the EV indicators, we shouldn&#39;t conclude that EV
actually reduces phishing. (Note that this research was published
before the indicators were removed, so this is not an argument for their
reinstatement.)&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that just getting an EV certificate
doesn&#39;t reduce phishing at all. What you need is for users to
know that you have an EV certificate and refuse to go to sites
they think are yours if they don&#39;t have an EV cert. That&#39;s the
part that&#39;s breaking down here.&lt;/p&gt;
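&lt;p&gt;For readers who want to see the &quot;lower fraction&quot; claim as arithmetic, here is a small Python sketch that divides each type&#39;s share of phishing sites by its overall share, using the figures from the table. The &quot;representation ratio&quot; name is my own label, not something from the report.&lt;/p&gt;

```python
# Percentages copied from the table: (share of phishing sites, overall share).
SHARES = {
    "EV": (0.05, 0.7),
    "OV": (0.13, 5.0),
    "DV": (99.82, 94.3),
}


def representation_ratio(phishing_pct: float, overall_pct: float) -> float:
    """A ratio below 1 means the type is under-represented among phishing sites."""
    return phishing_pct / overall_pct


for cert_type, (phishing_pct, overall_pct) in SHARES.items():
    print(cert_type, round(representation_ratio(phishing_pct, overall_pct), 3))
```

&lt;p&gt;EV and OV both come out well below 1 and DV slightly above 1, which is consistent with phishers simply choosing the cheapest certificate that works.&lt;/p&gt;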
&lt;h4 id=&quot;dv-misissuance&quot;&gt;DV Misissuance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#dv-misissuance&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The other argument I sometimes hear is that because EV certificates
have a more stringent issuance process it&#39;s harder to get a fake
one for a domain you don&#39;t control. This is no doubt true, but unfortunately
it doesn&#39;t meaningfully increase security as long as DV certificates
still exist. The reason for this is, as I mentioned above, that
the browser will accept &lt;em&gt;any&lt;/em&gt; certificate with a domain name in it
as valid for a given site, so if an attacker can get a misissued
DV certificate for &lt;code&gt;example.com&lt;/code&gt; then they can impersonate &lt;code&gt;example.com&lt;/code&gt;
(including stealing passwords, cookies, etc.) even if &lt;code&gt;example.com&lt;/code&gt; has
an EV certificate.
Even worse, they can most likely do so while preserving the EV indicator.&lt;/p&gt;
&lt;p&gt;Consider a simple Web page which consists of one HTML file and one JavaScript
file. The way this page loads is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/qwacs-page-with-js.png&quot; alt=&quot;Loading a Web page with JS&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The client first loads the HTML page, which contains a reference to the
JavaScript, and the client then contacts the server again to load
the JS.&lt;/p&gt;
&lt;p&gt;Now consider what happens when you have an attacker with a valid DV
certificate, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/qwacs-page-with-js-attack.png&quot; alt=&quot;Loading a Web page with JS from an attacker&quot; /&gt;&lt;/p&gt;
&lt;p&gt;They allow the client to contact the real server, which
authenticates with the EV certificate. Then when the client
goes to load the JS from the server, the attacker gets in
the way and impersonates the server with its misissued
DV certificate and sends its own JS. Because JS can do anything on the page,
this is the same as if the attacker had served the entire page,
but because whether the EV indicator is shown depends only on where
the top-level HTML was loaded from, the client still displays
the EV UI.&lt;/p&gt;
&lt;p&gt;It&#39;s important to realize that this isn&#39;t just a bug in the
browser UI, it&#39;s a reflection of the basic way the Web works,
which depends on the &lt;strong&gt;origin&lt;/strong&gt; as the basic unit of identity
and these two certificates reflect the same origin. Note that
even if for some reason browsers radically changed the
Web security model, you&#39;d still have a problem because most
sites load scripts from totally different origins (e.g.,
Google Analytics) and the browser has no way of knowing if
they should be EV or not.&lt;/p&gt;
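&lt;p&gt;To make the origin point concrete: an origin is the (scheme, host, port) triple, and nothing about the certificate type appears in it. The Python sketch below uses a hypothetical &lt;code&gt;origin&lt;/code&gt; helper to show that the HTML and the JS in the example share an origin regardless of which certificate served them.&lt;/p&gt;

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple for a URL (hypothetical helper).

    Note what is absent: nothing here records whether the connection used
    a DV or an EV certificate.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


# The page and its script are same-origin, whichever certificate served each.
same = origin("https://example.com/index.html") == origin("https://example.com:443/app.js")
# A third-party script lives in a different origin entirely.
different = origin("https://example.com/") != origin("https://analytics.example.net/a.js")
```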
&lt;h2 id=&quot;eidas-and-qwacs&quot;&gt;eIDAS and QWACs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#eidas-and-qwacs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the EU&#39;s &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=EIDAS&amp;amp;oldid=1110049992&quot;&gt;eIDAS
regulation&lt;/a&gt;. eIDAS
stands for &amp;quot;electronic IDentification, Authentication and trust
Services&amp;quot;, though I&#39;ve only ever heard it called eIDAS.
eIDAS is generally concerned with establishing stronger online
identity structures, but one specific provision is directed
towards something called a &lt;em&gt;Qualified Website Authentication Certificate (QWAC)&lt;/em&gt;.
A QWAC is more or less the same as an EV certificate, except that
they are issued by what&#39;s called a &lt;em&gt;Qualified Trust Service Provider (QTSP)&lt;/em&gt;,
&lt;em&gt;[Updated Trusted -&amp;gt; Trust. Also changed TSP to QTSP throughout.
It&#39;s conventional to call them TSPs, but this is clearer.]&lt;/em&gt;
which
is a CA that is authorized by EU member states &lt;em&gt;[Updated: member states, not the EU.]&lt;/em&gt;
to issue certificates
defining legal identity.&lt;/p&gt;
&lt;p&gt;The original version of the eIDAS regulation was published in 2014,
and contained language defining QWACs, but did not require
support for them in browsers. Browsers mostly chose to ignore
this language and while quite a few of the QTSPs in the EU list are
also trusted by browsers, no major browser has special EV-style UI
for QWACs. This was perceived by proponents of QWACs as not meeting
their objective of having QWACs be used (unsurprisingly
many of the proponents of QWACs work for QTSPs).
eIDAS is currently being revised and the
current proposal contains language that would mandate that browsers
support them.&lt;/p&gt;
&lt;p&gt;While I&#39;m not a lawyer, it&#39;s generally understood that the revision
would require browsers to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Display the QWAC identity data.&lt;/li&gt;
&lt;li&gt;Support certificates issued by authorized &lt;em&gt;[Updated: EU-authorized to authorized]&lt;/em&gt; QTSPs &lt;em&gt;regardless of whether
those QTSPs were accepted into the browser root program.&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;From the perspective of a browser, the first of these requirements
is bad, but the second is much worse.&lt;/p&gt;
&lt;h3 id=&quot;mandatory-ui&quot;&gt;Mandatory UI &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#mandatory-ui&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As discussed above, browsers
removed the EV indicators because there was good evidence that they
didn&#39;t work, and QWACs are basically the same as EV certs, so
a requirement to support them isn&#39;t great. The text of the regulation
itself is a little vague on this point—as I understand it,
it will then be fleshed out in &amp;quot;implementing acts&amp;quot;—but
at least one possibility is that browsers
would be required to support some common QWAC
UI (presumably designed by the EU in cooperation with CAs).
For instance, here&#39;s a &lt;a href=&quot;https://www.enisa.europa.eu/events/trust-servicies-forum-ca-day-2021/ca-day-presentation/05_chris-bailey_20210900-ca-day-designing-the-new-eidas-2-browser-ui.pdf&quot;&gt;2021 presentation&lt;/a&gt;
by Chris Bailey from Entrust on this topic that includes the suggestion
that not only should browsers have common UI, but that they
should be required to warn users whenever they
submitted a form on a site with a DV cert!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/entrust-qwac-preso.png&quot; alt=&quot;Entrust QWAC Presentation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Obviously, this precise proposal would have a very negative
impact on any site which used DV certificates, which is good if
you are a company that sells &lt;strike&gt;DV&lt;/strike&gt;EV certificates &lt;em&gt;[Updated]&lt;/em&gt;, but not so good for the Web
as a whole. More generally, though, designing a good browser
user interface is very difficult: you need to pack a lot of
information into a very small amount of screen real estate,
leaving room for the site itself. This is a difficult problem
at the best of times (look how &lt;a href=&quot;https://news.ycombinator.com/item?id=26464533&quot;&gt;upset&lt;/a&gt;
people got when Firefox removed the ability to make the
browser navigation UI take up slightly less vertical space),
and it will not be improved by having to implement a UI
designed to create as sharp a distinction as possible
between QWAC and non-QWAC certificates.&lt;/p&gt;
&lt;h3 id=&quot;qtsp-inclusion&quot;&gt;QTSP Inclusion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#qtsp-inclusion&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As described above, browsers have a well-established set of
mechanisms for determining whether a CA should be accepted
for the purpose of authenticating Web sites. These
mechanisms include requiring audits, and over the past
decade they have gradually improved the quality of the WebPKI
ecosystem, for instance by transitioning away from
SHA-1 certificates, adding requirements for Certificate
Transparency and functional revocation mechanisms, and
limiting certificate lifetime so that it&#39;s possible
to evolve the ecosystem in a reasonable time. Mozilla,
in particular, operates an open root program where
decisions are discussed on a &lt;a href=&quot;https://groups.google.com/a/mozilla.org/g/dev-security-policy&quot;&gt;public mailing list&lt;/a&gt; allowing all stakeholders to weigh in.&lt;/p&gt;
&lt;p&gt;If browsers were required to accept any QTSP that was
approved by the EU, this would of course allow those
QTSPs to bypass the browser&#39;s requirements, with two
major impacts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Browsers would be required to accept new QTSPs that
did not currently meet their requirements.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Browsers would be prevented or delayed in distrusting
QTSPs when evidence of misbehavior was found.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that this is different from EV certificates, where
the CAs were managed in the same way as DV certs and had
to meet the browser root program requirements.&lt;/p&gt;
&lt;p&gt;A mismatch between the browsers and the EU need not necessarily
result from the EU doing anything wrong: governments have
their own incentives, including considering the interests
of companies in their jurisdictions, and their judgments
about what&#39;s best might not match those made by browser
vendors. For example, the Certinomis CA
was &lt;a href=&quot;https://wiki.mozilla.org/CA/Certinomis_Issues&quot;&gt;removed from&lt;/a&gt;
Firefox but is &lt;a href=&quot;https://esignature.ec.europa.eu/efda/tl-browser/#/screen/tl/FR/5&quot;&gt;still on the EU QTSP list&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Of course, a mandatory CA could also be used by a state-level
actor for surveillance. We have already seen attempts by
&lt;a href=&quot;https://www.bbc.com/news/technology-49421729&quot;&gt;Kazakhstan&lt;/a&gt;
and &lt;a href=&quot;https://discourse.mozilla.org/t/proposal-for-mitm-style-surveillance-in-mauritius/79506&quot;&gt;Mauritius&lt;/a&gt;
to require users to install their own trust anchors.
Mauritius eventually dropped their plans, but Kazakhstan
actually deployed their trust anchor and browsers had to
eventually blocklist their trust anchor to protect users.
This was actually a much easier case to handle because
users had to install the trust anchor themselves and
so the damage was limited: if browsers could be required
to trust specific trust anchors that were controlled
by state-level attackers, then they might not be able to
protect users against state-level surveillance.&lt;/p&gt;
&lt;h2 id=&quot;alternative-designs&quot;&gt;Alternative Designs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#alternative-designs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From the browser&#39;s perspective, the central security
problem with the design of QWACs is that (like EV certs),
they are attesting to two separate pieces of server
identity:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The domain name, which is consumed by the browser
and used to determine the origin of the site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The legal identity of the server, which is consumed
by the user (though of course parsed by the browser
so that it can display it to the user).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&#39;s the ability of the QTSP to attest to the domain name
that creates the possibility for QTSP misbehavior to allow
for interception of user traffic.&lt;/p&gt;
&lt;h3 id=&quot;multiple-lists&quot;&gt;Multiple Lists &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#multiple-lists&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One possibility for addressing
this threat is to separate out those functions. The simplest way
to do that is by having two lists:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The browser&#39;s existing CA trust anchor list.&lt;/li&gt;
&lt;li&gt;A separate QTSP list managed by the EU.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When a browser encountered a certificate, it would first
check that it was valid according to its normal procedures
against the standard trust anchor list, just as with DV certificates.
If those checks passed, then the browser would allow the connection.
The browser would also check to
see if the certificate was a QWAC and if it had been issued
by a valid QTSP and if so it would show the QWAC UI with
the appropriate identity information. The impact of this design
is that the browser can ensure that the QTSP is correctly
attesting to the server&#39;s domain name—and remove it if it
misbehaves—but does not have to assess whether the
QTSP is adequately verifying the server operator&#39;s legal identity;
even if it completely fails at that, attackers will not be able
to intercept connections.&lt;/p&gt;
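&lt;p&gt;Here is a minimal Python sketch of that two-list logic. It is deliberately simplified: issuer names in sets stand in for real certificate chain validation, and the &lt;code&gt;Cert&lt;/code&gt; type and &lt;code&gt;evaluate&lt;/code&gt; function are hypothetical names, not any browser&#39;s API.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cert:
    domain: str
    issuer: str
    legal_name: Optional[str] = None  # present only on QWAC-style certs


def evaluate(cert: Cert, hostname: str, browser_roots: set, eu_qtsps: set):
    """Return (allow_connection, identity_to_display)."""
    # Step 1: ordinary validation against the browser's own trust anchor list.
    # Set membership is a stand-in for verifying a signature chain.
    if cert.issuer not in browser_roots or cert.domain != hostname:
        return False, None
    # Step 2: the EU list only decides whether identity UI is shown.
    if cert.legal_name is not None and cert.issuer in eu_qtsps:
        return True, cert.legal_name
    return True, None
```

&lt;p&gt;The property to notice is that membership in the EU list alone never makes a connection succeed; it can only add identity information to a connection the browser already accepted.&lt;/p&gt;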
&lt;h3 id=&quot;multiple-certificates&quot;&gt;Multiple Certificates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#multiple-certificates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Having multiple lists mostly addresses the security problems with
QWACs, but leaves some operational problems. Specifically, because
QWACs require validation of real-world identity, they cannot be
automatically issued, whereas DV certificates can.  This means that DV
certificates are comparatively cheap and easy to deploy and can be
integrated with server automation.  But if you already have a DV
deployment, then switching over to QWACs/EV can be a big lift.
If you want QWACs to succeed, then this is likely to be a real
drag on deployment.&lt;/p&gt;
&lt;p&gt;Once you&#39;ve decided to have two lists, it&#39;s natural to have two
certificates as well: an ordinary DV certificate which attests
to the domain name and a QWAC which attests to the legal identity.
As noted above, this has relatively similar security properties
to a single certificate but superior operational properties because
you can layer a QWAC on top of the DV cert; this gives you increased
flexibility and also means that if something goes wrong with
the QWAC your site still works.&lt;/p&gt;
&lt;p&gt;There are a number of different designs for two-certificate systems,
but the big design question is whether it&#39;s necessary for the server
to prove that it has the private key for the QWAC during connection
establishment (it already has to prove it has the private key for
the DV certificate). Intuitively, it would seem like this was necessary,
but it turns out not to be because of the &amp;quot;mixed content&amp;quot; properties
mentioned above. Basically, even if you require the server to prove
that it has the QWAC key on a given connection, an attacker with
a valid DV certificate for the domain can just intercept a subsequent
connection and thus impersonate the server. Usually, the site will
consist of a combination of HTML and JavaScript, so if the attacker
allows the HTML to be served by the legitimate site and then
intercepts the connection for the JS, the QWAC UI will even be displayed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Once you have this insight, the obvious design is to have a
mechanism for binding the QWAC to the domain name that is in
the DV certificate. This binding can be either direct,
with the domain name in the QWAC, or the QWAC just
having a key that is used to sign an &lt;em&gt;endorsement document&lt;/em&gt;
that contains the domain name. The site then presents the DV certificate
and the QWAC and the browser validates the DV certificate and checks
that the domain name in the DV cert matches that binding.
This is a familiar concept outside of the Web: when you go
to the airport you present your ticket which has your name
but not your picture and your photo ID which has your name and
your picture, but no information from the airline. The security
person verifies that the names match and uses the photo ID to
verify that it&#39;s really you.&lt;/p&gt;
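&lt;p&gt;The ticket-and-photo-ID check can be sketched in Python as follows. The types and field names are hypothetical, and the &lt;code&gt;signature_valid&lt;/code&gt; flag is a stand-in for actually verifying the endorsement document&#39;s signature with the key in the QWAC.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DVCert:
    domain: str  # the "ticket": a name, but no legal identity


@dataclass(frozen=True)
class Endorsement:
    domain: str            # domain the QWAC holder vouches for
    legal_name: str        # identity taken from the QWAC (the "photo ID")
    signature_valid: bool  # stand-in for a real signature check


def identity_for(dv: DVCert, endorsement: Endorsement):
    """Return the legal name to display, or None if the binding fails."""
    if not endorsement.signature_valid:
        return None
    if endorsement.domain.lower() != dv.domain.lower():
        return None  # the names on the two documents do not match
    return endorsement.legal_name
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; here only suppresses the identity display; whether the connection itself proceeds is still decided by the DV certificate.&lt;/p&gt;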
&lt;p&gt;The diagram below shows how this might work in practice,
in Mozilla&#39;s two-certificate proposal, called &amp;quot;portable QWACs&amp;quot;:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/pqwac.png&quot; alt=&quot;pQWAC flow diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;With two certificates, the server obtains a DV certificate as
usual, which it can use to serve TLS connections without doing
anything else. Subsequently, it can obtain a QWAC, which it
uses to sign the endorsement document binding the company
name (from the QWAC) to the domain name (in the DV cert).
When a client subsequently connects, it uses a TLS extension
to indicate that it supports QWACs and the server provides
the QWAC and the endorsement document in its handshake
(in the &lt;code&gt;EncryptedExtensions&lt;/code&gt; message). The client verifies
the DV cert, the endorsement document, and the QWAC, and
if everything checks out it completes the connection and
shows the right UI.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
Of course, this is just one way of building a two-certificate
design; for instance the QWAC and endorsement document could
be sent in an HTTP header instead.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eidas-article45/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the end of the day, the main impact of the proposed regulation
is to dictate how browsers build their UI and maintain
their root stores, including preventing them from enforcing
their existing rules for CAs. The major rationale for this
is to pave the way for QWACs, which are basically the same
as the EV certificates that we&#39;ve tried and discarded.
However, it&#39;s worth noting that at least some of the CAs seem to want to
restrict the ability of browsers to impose their own
standards on certificates at all, even for DV
certificates. For instance, a
&lt;a href=&quot;https://www.enisa.europa.eu/events/trust-services-forum-ca-day-2022/presentations/chris-bailey-enisa-trust-services-forum-2022.pdf&quot;&gt;recent presentation&lt;/a&gt; by Chris Bailey from
Entrust suggests that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Browsers bring &lt;u&gt;all extra browser rules&lt;/u&gt; for consensus and approval
under the CA/Browser Forum for industry standards which &lt;u&gt;are audited
under ETSI and WebTrust&lt;/u&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Similarly, in a recent &lt;a href=&quot;https://www.european-signature-dialog.eu/ESD_answer_to_Mozilla_misinformation_campaign.pdf&quot;&gt;white paper&lt;/a&gt;, the
&lt;a href=&quot;https://www.european-signature-dialog.eu/aboutus#section2&quot;&gt;European Signature Dialog&lt;/a&gt;
writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Today, all certificate issuers must not only provide annual conformance audits to Mozilla, but they also meet additional browser rules. But the additional browser rules are entirely subjective and may exist to promote the browser’s proprietary commercial interests — another example of US big tech setting the rules for Europe.&lt;/p&gt;
&lt;p&gt;Also, additional browser rules are not reviewed and approved by the internet ecosystem (e.g., the Certification Authority/Browser Forum (CABF), where all other certificate issuer rules are reviewed and approved by ballot of all the members, not just one browser).&lt;/p&gt;
&lt;p&gt;The browsers have been asked to bring their additional rules to the CABF for approval by the internet ecosystem, but the browsers have refused and are holding on to exclusive power by themselves. This should stop, and certificate issuers, including QWAC issuers, as well as the EU should have a say in all the certificate rules.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This reflects longstanding tensions between the CAs and the
browsers over who should determine the rules for certificates,
with the browsers viewing themselves as stewards of their
users&#39; privacy and security and the CAs wanting more of a voice
in governance.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
It&#39;s certainly understandable why CAs would want more control
of how browsers run their root programs; it&#39;s less clear why
it&#39;s in the interest of users for them to have it.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Or now sometimes QUIC, which uses a lot of the TLS infrastructure. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Either directly or transitively, for
instance by having a CA sign a certificate for another CA. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Full Disclosure:
I was part of the originating team of Let&#39;s Encrypt and Mozilla is currently a &amp;quot;Platinum Sponsor&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is also something called an &lt;em&gt;Organization Validation (OV)&lt;/em&gt;
certificate, which is partway between DV and EV. As far as I
can tell, there&#39;s never been any OV-specific UI in the main
UI, so it&#39;s not clear to me what the point is. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Let&#39;s Encrypt does not offer EV certificates because they
aren&#39;t able to automate issuance and the whole premise of
LE is to make certificate issuance so cheap that it can be
done for free. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve heard suggestions that sites ought to be able to send
back an HTTP header that told the client that it ought to
expect that all resources on a site be associated with a QWAC.
This is technically possible but a big deployment hassle
if you have multiple servers or if you include resources
from other sites, such as ads or Google analytics. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I am one of the authors of this proposal. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If the DV cert doesn&#39;t check out, the client has to
terminate the connection, but if the QWAC or the endorsement
document are invalid, it can either terminate the
connection or complete it but without the QWAC UI.
The latter choice is obviously more robust to failure. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A recent &lt;a href=&quot;https://www.bundeskartellamt.de/SharedDocs/Entscheidung/EN/Fallberichte/Missbrauchsaufsicht/2022/B7-250-19.pdf?__blob=publicationFile&amp;amp;v=4?&quot;&gt;report&lt;/a&gt;
by the German Bundeskartellamt provides some background on these
tensions with respect to Chrome in particular, and helps
give a sense of how the CAs view the situation. &lt;a href=&quot;https://educatedguesswork.org/posts/eidas-article45/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>First impressions of Bluesky&#39;s AT Protocol</title>
		<link href="https://educatedguesswork.org/posts/atproto-firstlook/"/>
		<updated>2022-11-06T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/atproto-firstlook/</id>
		<content type="html">&lt;p&gt;The first generation of Internet communications was
dominated by largely decentralized—and barely managed—communications
systems like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Usenet&amp;amp;oldid=1117071236&quot;&gt;USENET&lt;/a&gt;
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Internet_Relay_Chat&amp;amp;id=1116510499&amp;amp;wpFormIdentifier=titleform&quot;&gt;IRC&lt;/a&gt;, built on documented,
interoperable protocols. By contrast, the current generation
is highly centralized, built on a small number of
disconnected siloes like Twitter, Facebook, TikTok, etc.
In light of &lt;a href=&quot;https://twitter.com/StephenKing/status/1587042605627490304?ref_src=twsrc%5Etfw&quot;&gt;recent&lt;/a&gt; &lt;a href=&quot;https://www.theguardian.com/technology/2022/nov/07/twitter-will-ban-permanently-suspend-impersonator-accounts-elon-musk-says-as-users-take-his-name&quot;&gt;events&lt;/a&gt;, it should be clear that this is
not an optimal state of affairs, if only because what information
people have available to them shouldn&#39;t depend on
which billionaires own Facebook and Twitter.&lt;/p&gt;
&lt;p&gt;Over the years there has been a lot of interest in building
social networks with a more decentralized architecture,
such as &lt;a href=&quot;https://joinmastodon.org/&quot;&gt;Mastodon&lt;/a&gt; and
&lt;a href=&quot;https://diasporafoundation.org/&quot;&gt;Diaspora&lt;/a&gt;. These don&#39;t
have no users, but I think it&#39;s fair to say that they
haven&#39;t really displaced Twitter in the public conversation.
A few years ago Twitter&#39;s Jack Dorsey
&lt;a href=&quot;https://twitter.com/jack/status/1204766078468911106&quot;&gt;announced&lt;/a&gt; a project
called Bluesky, which was intended to design and build such a system.&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;Twitter is funding a small independent team of up to five open source architects, engineers, and designers to develop an open and decentralized standard for social media. The goal is for Twitter to ultimately be a client of this standard. 🧵&lt;/p&gt;&amp;mdash; jack (@jack) &lt;a href=&quot;https://twitter.com/jack/status/1204766078468911106?ref_src=twsrc%5Etfw&quot;&gt;December 11, 2019&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;mastodon%2C-activitypub%2C-and-the-fediverse&quot;&gt;Mastodon, ActivityPub, and the Fediverse &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#mastodon%2C-activitypub%2C-and-the-fediverse&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;I mention Mastodon here and that&#39;s what people seem to be using
but technically Mastodon is a piece of software that implements
Twitter-like functionality. Unlike Twitter, however, Mastodon
can talk to other servers using the W3C &lt;a href=&quot;https://www.w3.org/TR/activitypub/&quot;&gt;ActivityPub&lt;/a&gt;
protocol, including to servers running different software than
Mastodon. The collection of servers that federate (or at least
can federate) via ActivityPub is called the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Fediverse&amp;amp;oldid=1120281084&quot;&gt;Fediverse&lt;/a&gt;, but realistically you&#39;re likely to be using
Mastodon.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;While there wasn&#39;t any technology at the point Dorsey made this announcement,
it got a lot of interest anyway because Twitter using such a standard
actually would be a big deal and make it a lot more likely
to succeed. A few weeks ago, almost three years later, Bluesky published
the initial draft of what they are calling &lt;a href=&quot;https://atproto.com/&quot;&gt;ATProtocol (as in @-sign)&lt;/a&gt; (or ATP), which is described as &amp;quot;Social networking technology created by Bluesky&amp;quot;.
Let&#39;s take a look!&lt;/p&gt;
&lt;h2 id=&quot;overview&quot;&gt;Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unsurprisingly, ATP seems principally designed to emulate
Twitter, though presumably you could adapt it to be more like
Facebook or Instagram.
The basic idea behind ATP is that each user has an account with
what&#39;s called a &lt;em&gt;personal data server (PDS)&lt;/em&gt;, which is where
they post stuff, read other people&#39;s posts, etc.
These PDSes communicate with each other (&amp;quot;federate&amp;quot;), with
the idea that this provides the experience of a single unified network,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/atproto-federation.png&quot; alt=&quot;ATProto Federation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is basically the obvious design and it&#39;s more or less what&#39;s been
envisioned by previous systems, such as those based on ActivityPub.
You can run your own PDS, but it seems
more likely that most people will use some pre-existing PDS service,
so most PDSes will have a lot of users.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;polling-versus-notifications&quot;&gt;Polling Versus Notifications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#polling-versus-notifications&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;There are two basic designs for the situation
where node &lt;strong&gt;A&lt;/strong&gt; is waiting for something to happen on node &lt;strong&gt;B&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Polling&lt;/em&gt; in which &lt;strong&gt;A&lt;/strong&gt; contacts &lt;strong&gt;B&lt;/strong&gt; repeatedly
and asks &amp;quot;anything new?&amp;quot;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Notifications&lt;/em&gt; in which &lt;strong&gt;A&lt;/strong&gt; tells &lt;strong&gt;B&lt;/strong&gt; what it is
waiting for and &lt;strong&gt;B&lt;/strong&gt; sends it a message when it
actually happens.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Polling systems aren&#39;t very efficient when events are infrequent,
because &lt;strong&gt;A&lt;/strong&gt; faces a tradeoff between timeliness and load:
if it checks infrequently, then it won&#39;t learn about
new events until long after they happen.
If it checks frequently,
then most of those checks are wasted and there is a lot
of unnecessary load on both machines. In these cases,
notifications are a lot more efficient because messages
only need to be sent when something happens. On the other
hand, when the time between events is very low compared
to the acceptable latency for detecting them, then polling
can work reasonably well.&lt;/p&gt;
&lt;p&gt;For instance, in order to have an average detection latency of 1
second &lt;strong&gt;A&lt;/strong&gt; needs to poll every 2 seconds (assuming events happen randomly). If events happen about
every 100 seconds, then 98% of those checks are wasted.
On the other hand, if events happen on average every .1 second,
then almost every check will retrieve one or more events,
and polling can be efficient.&lt;/p&gt;
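&lt;p&gt;The arithmetic here can be sketched in a few lines of Python. This is
just a toy model, not anything from the ATP docs: the function name
&lt;code&gt;polling_cost&lt;/code&gt; and its parameters are made up for illustration, and it
assumes that events arrive at random times and that each event makes at most
one poll useful.&lt;/p&gt;

```python
def polling_cost(target_avg_latency_s, mean_event_interval_s):
    """Return (poll_interval_s, wasted_fraction) for a simple polling model.

    Assumes events arrive at random times, so the average detection
    latency is half the polling interval, and that each event makes
    at most one poll useful.
    """
    # Average latency is half the interval, so poll twice as slowly
    # as the latency target.
    poll_interval_s = 2 * target_avg_latency_s
    # Fraction of polls that find something new, capped at 100%.
    useful_fraction = min(1.0, poll_interval_s / mean_event_interval_s)
    return poll_interval_s, 1.0 - useful_fraction


# Hypothetical inputs taken from the example in the text.
rare_events = polling_cost(1, 100)
frequent_events = polling_cost(1, 0.1)
```

&lt;p&gt;With a 1 second latency target, this model says about 98% of polls are
wasted when events happen every 100 seconds and essentially none are wasted
when they happen every .1 second.&lt;/p&gt;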
&lt;/div&gt;
&lt;p&gt;The way this seems to work in practice is that when Alice wants
to post a microblog entry (a &amp;quot;blue&amp;quot;? a &amp;quot;sky&amp;quot;?), she posts it to her
own PDS. If Bob is following Alice, his PDS somehow gets it
from Alice&#39;s PDS. It&#39;s not clear to me from the specs whether
this is done by having Alice&#39;s PDS notify Bob&#39;s PDS or by
having Bob&#39;s PDS poll. You probably want some kind of notification
system, especially if there are going to be small PDSes, but
the documents don&#39;t seem to specify that in enough detail
to make it work. Similarly, when Bob decides to like one of Alice&#39;s
posts, he notifies his PDS, and other PDSes, including Alice&#39;s,
pick that up. It appears that when he wants to follow Alice, he
notifies his PDS, which notifies Alice&#39;s PDS, and the follow (I think) only succeeds if
Alice&#39;s PDS agrees.&lt;/p&gt;
&lt;p&gt;As I said above, this is mostly kind of the natural design, but there
are two somewhat less obvious features.&lt;/p&gt;
&lt;h3 id=&quot;portable-identity&quot;&gt;Portable Identity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#portable-identity&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In most distributed systems that I&#39;ve seen, identity is tied
to the server that you use. For example, if you use
Gmail and your address is &lt;code&gt;example@gmail.com&lt;/code&gt;, then
you can&#39;t just pick up your email account and move it to
Hotmail. With some work you can move the emails themselves
but your address will be &lt;code&gt;example@hotmail.com&lt;/code&gt;.
The situation is a little more complicated than this
because it&#39;s possible to use Gmail to host your
own domain, in which case you could transfer it
to another service, but all the addresses
in the same domain share the same service; you
can&#39;t have &lt;code&gt;example@example.com&lt;/code&gt; be on
Gmail and &lt;code&gt;doesnotexist@example.com&lt;/code&gt; be on Fastmail.&lt;/p&gt;
&lt;p&gt;The existing federated social networking systems I&#39;ve seen
seem to share this property. For instance, if you
have an account on &lt;code&gt;mastodon.social&lt;/code&gt; then your
identity is effectively &lt;code&gt;example@mastodon.social&lt;/code&gt;;
this allows a user on (say) &lt;code&gt;mastodon.online&lt;/code&gt;
to refer to you as &lt;code&gt;https://mastodon.online/@example@mastodon.social&lt;/code&gt;,
which admittedly looks kind of awkward.
Note that this is hidden a bit by the UI because you can
just refer to people on your own server by unqualified
names. For instance, &lt;code&gt;https://mastodon.online/@example&lt;/code&gt;
is shorthand for &lt;code&gt;https://mastodon.online/@example@mastodon.online&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;ATP allows you to have a persistent identity that is portable
between PDSes. It does so by introducing the computer
scientist&#39;s favorite tool, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Fundamental_theorem_of_software_engineering&amp;amp;oldid=1066664170&quot;&gt;another layer of indirection&lt;/a&gt;.
The basic idea is that your identity is used to &lt;em&gt;look up&lt;/em&gt; which
PDS your data is actually stored on; that way you can move
from PDS to PDS without changing your identity. The stated
value proposition here is that if a PDS decides to block
you then you just move to a different PDS and you can take
all of your posts and followers with you.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Account portability is the major reason why we chose to build a separate protocol. We consider portability to be crucial because it protects users from sudden bans, server shutdowns, and policy disagreements. Our solution for portability requires both signed data repositories and DIDs, neither of which are easy to retrofit into ActivityPub. The migration tools for ActivityPub are comparatively limited; they require the original server to provide a redirect and cannot migrate the user&#39;s previous data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In order to make this work, each user&#39;s identity is associated
with an asymmetric (public/private) key pair which is then used
to sign their data (posts, likes, etc.). That way when they
move their data from PDS &lt;strong&gt;A&lt;/strong&gt; to PDS &lt;strong&gt;B&lt;/strong&gt;, you can tell it&#39;s
them by verifying the digital signature over the data.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
In fact, at some level the PDS is just a convenience, though an important one: if you got
their data by any mechanism at all, you could always tell
it was correct by verifying the signature.&lt;/p&gt;
&lt;h3 id=&quot;scaling&quot;&gt;Scaling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#scaling&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The messaging fan-out of a system like Twitter is quite different
from that of other federated messaging systems like instant messaging
(and to some extent e-mail). Although there are groups, IM is mostly
a person to person activity, with any given message being sent to
a relatively small number of people. The situation with e-mail
is somewhat more complicated, with most messages sent by individuals
going to a small number of people (more below on marketing communications).
Tweets, by contrast, tend to be sent to large groups.&lt;/p&gt;
&lt;p&gt;As an example, I&#39;m a relatively small-scale Twitter user, but I have
over 1000 followers, which means that every time I push the tweet
button I&#39;m notifying all of those people. It&#39;s not unknown to have
over 100 million Twitter followers like Elon Musk or Barack Obama. By
contrast, even Gmail workplace users can&#39;t send to &lt;a href=&quot;https://support.google.com/a/answer/166852?hl=en&quot;&gt;more than 2000
users&lt;/a&gt; in a single
message, and only 500 of those can be outside of Gmail. So, the
dynamics here are totally different. If you want to send
to a large number of people, e.g., for marketing or mailing
lists, then you would typically use a specialized e-mail sender like
Sendgrid or Mailgun.&lt;/p&gt;
&lt;p&gt;This level of fan-out already presents a bit of a challenge for
a federated system: if I have 1000 followers on 500 different
PDSs, then my PDS needs to contact each of them every time I
tweet. This isn&#39;t necessarily infeasible, but if I have a million
followers spread over 10,000 PDSes, the situation starts to get
somewhat worse in terms of scale. We should of course expect
that there will be significant concentration in the PDS market,
just like with e-mail, with a few large PDSes having most of the
users and then a long tail of small PDSes.&lt;/p&gt;
&lt;p&gt;In addition to the high level of fan-out, Twitter provides functionality
that covers large numbers of messages. In particular, it&#39;s possible
to search for messages by content, hashtag, etc., and Twitter
promotes &amp;quot;trending&amp;quot; tweets to you. These functions require access to the
entire database—at least the public database—of tweets.
Obviously, receiving the entire database of tweets (&lt;a href=&quot;https://www.dsayce.com/social-media/tweets-day/&quot;&gt;6000+ tweets per second&lt;/a&gt;) is prohibitive for a small device, so it won&#39;t be possible
for every PDS to offer this service.&lt;/p&gt;
&lt;p&gt;ATP proposes to address this by having a two-level system, with
a second layer of &amp;quot;crawling indexers&amp;quot; who have access to all
the data and can offer a personalized view, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/small-big-world.jpg&quot; alt=&quot;Two level architecture&quot; /&gt;
[Source: ATP docs]&lt;/p&gt;
&lt;p&gt;As above, the documentation is pretty vague on how this is supposed
to work. Indeed, the diagram above and somewhere around 100 words
in the docs are about all there is, so I can&#39;t tell you how it&#39;s
supposed to work. With that said,
the reference to &amp;quot;crawling&amp;quot; is surprising:
for efficiency reasons you don&#39;t really want this kind of service
to act like an ordinary PDS but rather to have special APIs that
allow it to get a full feed of what&#39;s happening, and even better
some directory-type mechanism for identifying all the PDSes in the world,
but I don&#39;t see
anything like this in the API docs (please point me at this if I&#39;m
missing it).&lt;/p&gt;
&lt;h2 id=&quot;a-bit-more-detail&quot;&gt;A Bit More Detail &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#a-bit-more-detail&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I don&#39;t want to get too deep into the details of ATP, but it&#39;s worth
taking a closer look at a few of the pieces of the system.&lt;/p&gt;
&lt;h3 id=&quot;identity-system&quot;&gt;Identity System &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#identity-system&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, the way that the handle system works is that you
start with a &amp;quot;handle&amp;quot; that&#39;s expressed as a hierarchical name
rooted in the DNS, e.g., &lt;code&gt;@alice.example.com&lt;/code&gt;. In a conventional
system like e-mail or Jabber, this would actually be expressed
as &lt;code&gt;alice@example.com&lt;/code&gt; but because this is supposed to be like
Twitter and Twitter already uses the @-sign to indicate
usernames—e.g., to distinguish them from hashtags—you
have to either have names with two @-signs (like Mastodon&#39;s &lt;code&gt;@alice@example.com&lt;/code&gt;),
use two different separators, or omit the separator between the actual username and the
domain it lives in. To parse these names, you just remove the
first label and treat it as the user name (note that this means
you can&#39;t have a &lt;code&gt;.&lt;/code&gt; in your user name).&lt;/p&gt;
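&lt;p&gt;As a rough sketch of that parsing rule (the function name
&lt;code&gt;parse_handle&lt;/code&gt; is my own invention, not something from the ATP docs,
and I&#39;m assuming the leading @-sign is optional):&lt;/p&gt;

```python
def parse_handle(handle):
    """Split an ATP-style handle into (user_name, domain).

    Illustrative only: this is one reading of the rule described in
    the text, not code from the ATP specification.
    """
    # The leading @-sign is often omitted in practice, so strip it if present.
    name = handle[1:] if handle.startswith("@") else handle
    # The first DNS label is the user name; everything after it is the domain.
    user_name, _, domain = name.partition(".")
    if not user_name or not domain:
        raise ValueError("handle needs a user name and a domain: " + repr(handle))
    return user_name, domain


alice = parse_handle("@alice.example.com")
web = parse_handle("web.example.com")
```

&lt;p&gt;Here &lt;code&gt;alice&lt;/code&gt; comes out as the pair &lt;code&gt;alice&lt;/code&gt; and
&lt;code&gt;example.com&lt;/code&gt;, but note that an ordinary host name like
&lt;code&gt;web.example.com&lt;/code&gt; parses just as happily.&lt;/p&gt;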
&lt;p&gt;This creates some ambiguity about whether an identifier is a domain
name or a user name (e.g., what&#39;s &lt;code&gt;web.example.com&lt;/code&gt;?). In principle,
if it has an @-sign in front of it, it&#39;s a user name, but of course
people aren&#39;t consistent about that kind of thing, and the name
is perfectly legible without it. Moreover, because domain names
are hierarchical, it&#39;s possible to have a situation where the
same identifier is &lt;em&gt;both&lt;/em&gt; a username and a domain name, e.g.,
if there is a user &lt;code&gt;alice&lt;/code&gt; on the domain &lt;code&gt;example.com&lt;/code&gt; but there
is also a subdomain &lt;code&gt;alice.example.com&lt;/code&gt;. This can&#39;t happen
with e-mail addresses because the interior @-sign provides a boundary,
but that&#39;s not true here. In general, this just doesn&#39;t seem like
that great a design choice, though it&#39;s not a disaster.&lt;/p&gt;
&lt;p&gt;In order to resolve a handle, you do an &lt;a href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#rpc-protocol&quot;&gt;RPC query&lt;/a&gt; to
the endpoint associated with the domain name of the handle. This
returns a &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity#background%3A-did&quot;&gt;DID&lt;/a&gt;. That
DID can then be resolved to obtain the public key associated with the
user. As described above, that key is used to sign the user&#39;s data.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;ATP supports two flavors of DID—out of the 50+ variants currently
specified (this kind of profiling is necessary if you want to have
DID interoperability):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity#did3Aweb&quot;&gt;did:web&lt;/a&gt;, which just means
that you do an HTTPS fetch to a Web site to retrieve the DID
document (i.e., the public key).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A new DID form called &lt;a href=&quot;https://atproto.com/specs/did-plc&quot;&gt;DID placeholder (did:plc)&lt;/a&gt;,
which consists of a hash of a public key; that key can then be used directly or
to sign new public keys to allow rollover (see my long &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity&quot;&gt;post&lt;/a&gt;
for more on this topic). As an aside, it&#39;s not clear to me how you actually
obtain the DID document associated with a &lt;code&gt;did:plc&lt;/code&gt; DID, as the public
key isn&#39;t sufficient to retrieve it. There&#39;s apparently a PLC server, but is there
only one? If not, how do you find the right one? This all seems unclear.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously, the security of the &lt;code&gt;did:web&lt;/code&gt; resolution process depends on DNS
security, but even if you use &lt;code&gt;did:plc&lt;/code&gt;, the &lt;em&gt;handle resolution process&lt;/em&gt; depends on the DNS.
This means that an attacker who controls the DNS or the handle server for
a given DNS name can provide any DID of their
choice, thus bypassing the cryptographic controls that &lt;code&gt;did:plc&lt;/code&gt; or any
similar mechanism use to provide verified rollover. Suppose that Alice&#39;s
handle is &lt;code&gt;@alice.example.com&lt;/code&gt; and this maps to &lt;code&gt;did:plc:1234&lt;/code&gt;: because
an attacker doesn&#39;t know the private key associated with this DID, they
can&#39;t get it to authorize their public key, but if they can gain control
of &lt;code&gt;example.com&lt;/code&gt; then they can just remap &lt;code&gt;@alice.example.com&lt;/code&gt; to &lt;code&gt;did:plc:5678&lt;/code&gt;,
and relying parties won&#39;t even get to the rollover checks.&lt;/p&gt;
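&lt;p&gt;To make the attack concrete, here is a toy model of the two-step
resolution. Everything in it is hypothetical: the dictionaries stand in for
the handle server and whatever stores DID documents, and the key strings are
placeholders rather than real keys.&lt;/p&gt;

```python
# Toy model of ATP name resolution. All names and data are illustrative.

def resolve_public_key(handle, handle_server, did_registry):
    """Resolve a handle to a DID, then the DID to a public key."""
    # Step 1: handle to DID. This depends on the DNS / handle server.
    did = handle_server[handle]
    # Step 2: DID to key. This is the part did:plc tries to protect.
    return did_registry[did]


# Both DIDs are validly registered; the attacker simply owns the second one.
did_registry = {
    "did:plc:1234": "alice-public-key",
    "did:plc:5678": "attacker-public-key",
}

honest_server = {"@alice.example.com": "did:plc:1234"}
# After taking over example.com, the attacker remaps the handle.
compromised_server = {"@alice.example.com": "did:plc:5678"}

honest_key = resolve_public_key(
    "@alice.example.com", honest_server, did_registry)
attacker_key = resolve_public_key(
    "@alice.example.com", compromised_server, did_registry)
```

&lt;p&gt;In this sketch the second lookup returns the attacker&#39;s key even though
nothing about &lt;code&gt;did:plc:1234&lt;/code&gt; was ever touched: the rollover
protections sit entirely in step 2, and the remapping happens in step 1.&lt;/p&gt;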
&lt;p&gt;There seems to be some implicit assumption that clients (or other PDSes) will
retrieve the DID associated with a handle and then remember it indefinitely,
though it&#39;s not quite explicitly stated:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The DNS handle is a user-facing identifier — it should be shown in
UIs and promoted as a way to find users. Applications resolve
handles to DIDs and then use the DID as the stable canonical
identifier. The DID can then be securely resolved to a DID document
which includes public keys and user services.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&#39;m not sure how realistic this is: retaining this kind of state is a pain
and so it will be natural to treat it as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Soft_state&amp;amp;oldid=994510290&quot;&gt;soft state&lt;/a&gt; by caching it but not worrying too hard if it gets lost because
you can always retrieve it. In any case, a basic assumption of a system
like this is that new PDSes—and users—will be constantly
joining the system, and if the handle domain is compromised they will
get the wrong answer, in which case you&#39;ll have a network partition in
which some users and PDSes have the right key and some have the wrong key.&lt;/p&gt;
&lt;p&gt;More generally, it&#39;s not clear what the overall model is. Specifically,
is the handle → DID mapping invariant once it&#39;s established or
is it expected to change? If the former, then it won&#39;t be possible
to transition from &lt;code&gt;did:web&lt;/code&gt; to &lt;code&gt;did:plc&lt;/code&gt;, or—as the name &amp;quot;placeholder&amp;quot; suggests—to
transition from &lt;code&gt;did:plc&lt;/code&gt; to some new DID type, because there will
always be some clients who have permanently stored the old DID
and thus you will never be able to abandon it.
On the other hand, if it&#39;s not invariant, then you need some mechanism
to allow clients/PDSes to get updates, such as having a time-to-live
associated with the handle resolution process (potentially based on
HTTP caching). In either case, ATP should either build in some certificate transparency-type
mechanism to protect against compromise of the handle servers or
just admit that the security of ATP identity depends on the DNS,
in which case you don&#39;t need something like &lt;code&gt;did:plc&lt;/code&gt; and
could presumably skip the DID step entirely and
just store the public key and associated data right on the handle
server. Either way, this is the kind of topic that I would ordinarily
expect to be clearly defined in a specification.&lt;/p&gt;
&lt;p&gt;In any case, I don&#39;t think that this mechanism completely delivers on the
censorship-resistance aspect of portability: it&#39;s true that you
can move your &lt;em&gt;data&lt;/em&gt; from one PDS to another, but because your
handle is still tied to some server you&#39;re vulnerable to having
that server cut you off. Even if some servers have cached your
handle mapping, many won&#39;t have and so the result will be a partial
outage. It&#39;s true that it&#39;s probably cheaper
to run a handle mapping server than a PDS, so you might be able to
run that but outsource the PDS piece,
but it also seems likely that most people will just run them
in the same place, so I&#39;m not sure how much good this does in practice.&lt;/p&gt;
&lt;h3 id=&quot;rpc-protocol&quot;&gt;RPC Protocol &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#rpc-protocol&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At heart, ATP is a fairly conventional HTTP request/response
protocol with a schema-based RPC layer on top of it. The idea
is that new protocol endpoints are specified by JSON schema
which define the messages to be sent and received and
can then be compiled down to code which can be called
by the user. The docs give the following &lt;a href=&quot;https://atproto.com/guides/lexicon&quot;&gt;example&lt;/a&gt;
of a schema:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;lexicon&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;com.example.getProfile&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;query&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;parameters&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;user&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string-property property&quot;&gt;&quot;required&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;output&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;encoding&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;application/json&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;object&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string-property property&quot;&gt;&quot;required&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;did&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string-property property&quot;&gt;&quot;properties&quot;&lt;/span&gt;&lt;span 
class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;did&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;displayName&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string-property property&quot;&gt;&quot;maxLength&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
number&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string-property property&quot;&gt;&quot;maxLength&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generates an API which can be used like so:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;com&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;example&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getProfile&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token literal-property property&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&#39;bob.com&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// =&gt; {name: &#39;bob.com&#39;, did: &#39;did:plc:1234&#39;, displayName: &#39;...&#39;, ...}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is all pretty conventional stuff.
I know that there are a lot of opinions in the Web API
community over whether it&#39;s better
to have this kind of RPC-style interface or a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Representational_state_transfer&amp;amp;oldid=1120127488&quot;&gt;REST&lt;/a&gt;-style interface in which
every resource has its own URL, but I don&#39;t think anyone would
say it&#39;s a make-or-break issue; it&#39;s not like you can&#39;t
make this kind of API work.&lt;/p&gt;
&lt;p&gt;I&#39;m more concerned by the fact that the
API documentation is so thin. As a concrete example, here&#39;s
the entire definition of the data structure &amp;quot;feed&amp;quot;:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token literal-property property&quot;&gt;subject&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; Subject&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token literal-property property&quot;&gt;createdAt&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; string&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Subject&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token literal-property property&quot;&gt;uri&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; string&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token literal-property property&quot;&gt;cid&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; string&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What do these values mean? We might infer that &lt;code&gt;createdAt&lt;/code&gt; is a date, but
maybe not? What are the semantics of &lt;code&gt;Subject.uri&lt;/code&gt;? Who knows?&lt;/p&gt;
&lt;p&gt;I&#39;ll have more to say about this later, but for the moment I would
observe that this is a pretty common pattern in systems that were built
by writing software and then documenting its interfaces, rather than
writing a protocol specification first and then implementing
(though of course I don&#39;t know if that&#39;s what happened here). The result is that the
specification just becomes &amp;quot;whatever the software does&amp;quot;, and often
the documentation is insufficient and you&#39;re reduced to reading
the source code to reverse engineer the protocol. It&#39;s not awesome.&lt;/p&gt;
&lt;h3 id=&quot;access-control&quot;&gt;Access Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#access-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One thing that isn&#39;t clear to me is how access control is supposed
to work. For instance, if I want to have a post that is only
readable by some people, how does this work? The situation is not
at all clarified by the fact that the section on &lt;a href=&quot;https://atproto.com/specs/xrpc#authentication&quot;&gt;Authentication&lt;/a&gt; consists entirely of the word &amp;quot;TODO&amp;quot;.
However, ignoring the technical details, it seems like there are two
major approaches, neither of which is really optimal.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A post is separately encrypted to each authorized reader.&lt;/li&gt;
&lt;li&gt;The PDSes enforce access based on who is following a
given user.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first of these is straightforward technically, but operationally
clunky as it requires not only knowing the public keys of all of your followers
at the time you post, but also being able to go back and retroactively
encrypt posts to new followers or when existing followers change their
keys.&lt;/p&gt;
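&lt;p&gt;To make the bookkeeping concrete, here is a minimal sketch of per-reader encryption using Node&#39;s built-in &lt;code&gt;crypto&lt;/code&gt; module. It is an illustration under assumptions, not anything the ATP documents specify: it swaps in a common hybrid scheme (one AES-GCM content key, wrapped with RSA-OAEP for each reader), and the names &lt;code&gt;encryptPost&lt;/code&gt;, &lt;code&gt;decryptPost&lt;/code&gt;, and the &lt;code&gt;did:example:&lt;/code&gt; identifiers are hypothetical.&lt;/p&gt;

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: encrypt the post once under a fresh content key,
// then wrap that key separately for every authorized reader.
function encryptPost(text, readerPublicKeys) {
  const contentKey = crypto.randomBytes(32);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", contentKey, iv);
  const ciphertext = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  const wrappedKeys = new Map();
  for (const [did, publicKey] of readerPublicKeys) {
    // RSA-OAEP is the default padding for publicEncrypt.
    wrappedKeys.set(did, crypto.publicEncrypt(publicKey, contentKey));
  }
  return { iv, ciphertext, tag: cipher.getAuthTag(), wrappedKeys };
}

// Hypothetical helper: a reader unwraps their copy of the key, then decrypts.
function decryptPost(post, did, privateKey) {
  const contentKey = crypto.privateDecrypt(privateKey, post.wrappedKeys.get(did));
  const decipher = crypto.createDecipheriv("aes-256-gcm", contentKey, post.iv);
  decipher.setAuthTag(post.tag);
  return Buffer.concat([decipher.update(post.ciphertext), decipher.final()]).toString("utf8");
}

const bob = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });
const post = encryptPost("followers only", new Map([["did:example:bob", bob.publicKey]]));

console.log(decryptPost(post, "did:example:bob", bob.privateKey)); // followers only
// Anyone who follows later has no wrapped key until the author goes back and adds one.
console.log(post.wrappedKeys.has("did:example:carol")); // false
```

&lt;p&gt;The last line is the operational problem in miniature: every old post needs a new wrapped key for each new follower or changed key.&lt;/p&gt;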
&lt;p&gt;The alternative is less clunky, but requires a lot of trust in PDSes.
To see why, consider the case when Alice is on PDS &lt;strong&gt;A&lt;/strong&gt; and Bob and Charlie
are on PDS &lt;strong&gt;B&lt;/strong&gt;. Alice restricts her posts and Bob follows Alice but Charlie
does not, so Charlie should not be able to see Alice&#39;s posts. However,
when Alice posts something, it gets sent to PDS &lt;strong&gt;B&lt;/strong&gt;, which then has
to show it only to Bob but not Charlie. The obvious problem here is that
Alice (hopefully) trusts her PDS but has no real relationship with PDS &lt;strong&gt;B&lt;/strong&gt;;
she just has to trust that it does the right thing
(in Twitter, this is handled by just trusting Twitter). This is basically
a generalized version of the problem that Alice has to trust her followers
not to reveal her tweets, but it&#39;s obviously quite a bit worse in
a system like this where there are a lot of PDSes, where we end up with
a distributed single point of failure in the form of exposure
to vulnerabilities and misbehavior by every
PDS where she has a follower.&lt;/p&gt;
&lt;p&gt;Actually, the situation is potentially worse than this: what about
PDS &lt;strong&gt;C&lt;/strong&gt; which doesn&#39;t have any of Alice&#39;s followers? What stops
it from getting Alice&#39;s posts? The documents don&#39;t say how this
works, but at a high level, I think
what has to happen is that PDS &lt;strong&gt;A&lt;/strong&gt; has to verify that each PDS
requesting a copy of Alice&#39;s posts has at least one user that
follows Alice (presumably by working forward from the DIDs on
Alice&#39;s follower list), which seems kind of clunky.&lt;/p&gt;
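&lt;p&gt;A minimal sketch of that check, assuming a toy in-memory model. Nothing here comes from the ATP documents: &lt;code&gt;pdsMayFetch&lt;/code&gt;, &lt;code&gt;visibleTo&lt;/code&gt;, the &lt;code&gt;homePds&lt;/code&gt; map, and the &lt;code&gt;did:example:&lt;/code&gt; identifiers are all hypothetical, and a real system would also need to authenticate the requesting PDS.&lt;/p&gt;

```javascript
// Hypothetical model, not from the ATP documents.
// followers: DIDs that follow Alice; homePds: maps each DID to the PDS hosting it.
function pdsMayFetch(requestingPds, followers, homePds) {
  // PDS A releases restricted posts only to a PDS that hosts at least one follower.
  return followers.some((did) => homePds.get(did) === requestingPds);
}

// The receiving PDS must then filter per reader; Alice has to trust it to do so.
function visibleTo(readerDid, followers) {
  return followers.includes(readerDid);
}

const homePds = new Map([
  ["did:example:bob", "pds-b"],
  ["did:example:charlie", "pds-b"],
  ["did:example:dave", "pds-c"],
]);
const aliceFollowers = ["did:example:bob"];

console.log(pdsMayFetch("pds-b", aliceFollowers, homePds)); // true
console.log(pdsMayFetch("pds-c", aliceFollowers, homePds)); // false
console.log(visibleTo("did:example:charlie", aliceFollowers)); // false
```

&lt;p&gt;Note that the second function runs on PDS &lt;strong&gt;B&lt;/strong&gt;, not on Alice&#39;s own PDS, which is exactly where the trust problem comes from.&lt;/p&gt;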
&lt;h2 id=&quot;thoughts-on-system-architecture&quot;&gt;Thoughts on System Architecture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#thoughts-on-system-architecture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;When looking at a system like this, I usually try to ignore most
of the details and instead ask &amp;quot;what is the overall system architecture&amp;quot;?
The idea is to understand at a high level what the various pieces
are and how they fit together to try to accomplish various
tasks. In &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4101&quot;&gt;RFC 4101&lt;/a&gt;
I phrased this as being at the &amp;quot;boxes and arrows&amp;quot; level:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our experience indicates that it is easiest to grasp protocol models
when they are presented in visual form.  We recommend a
presentation format centered around a few key diagrams, with
explanatory text for each.  These diagrams should be simple and
typically consist of &amp;quot;boxes and arrows&amp;quot; -- boxes representing the
major components, arrows representing their relationships, and
labels indicating important features.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For instance, it doesn&#39;t really matter whether the communications
between client and server use RPC, REST, or something else, but
what &lt;em&gt;does&lt;/em&gt; matter is who talks to who, and when. Given this
kind of architectural description, an experienced protocol designer
can generally design something that will work, even if two designers wouldn&#39;t build exactly the same thing. It&#39;s much harder
to go the other way, from the detailed description to the architecture,
and worse yet, it tends to obscure important questions.&lt;/p&gt;
&lt;p&gt;I think that this high-level description is what the &lt;a href=&quot;https://atproto.com/guides/overview&quot;&gt;Overview&lt;/a&gt; is trying to provide, but it&#39;s really more of an introduction
and leaves unanswered a lot of big-picture
questions that would be easier to understand if it were a more complete
description of how stuff worked. For instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How does a PDS learn about new activity on another PDS?&lt;/li&gt;
&lt;li&gt;How do the &amp;quot;crawlers&amp;quot; learn about new PDSes and the content
in them?&lt;/li&gt;
&lt;li&gt;How does access control work, for instance, if a post is
private?&lt;/li&gt;
&lt;li&gt;What are the scaling properties of the system?&lt;/li&gt;
&lt;li&gt;What are the security guarantees around identities and integrity
of the data?&lt;/li&gt;
&lt;li&gt;How do you handle various kinds of abuse? For example, suppose
that someone sends abusive messages to others: does each PDS
(or user!) have to block them separately or is there some kind
of centralized reputation system?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As an aside, these questions would all be a lot simpler in a centralized
system.&lt;/p&gt;
&lt;p&gt;This isn&#39;t just a matter of presentation, but also of design.
In my experience, the right way to design a system is to start from
this kind of top-level question and try to build—and document—an architecture
that answers this kind of question and only then design the specific
pieces, in part because the details often
tend to obscure issues that are visible at higher layers of abstraction
(see, for instance, the discussion of DNS-based names and &lt;code&gt;did:plc&lt;/code&gt; above).
However, it &lt;em&gt;also&lt;/em&gt; makes it easier for people to understand what
you&#39;re talking about rather than forcing them to reverse engineer
the structure of the system from the details, as is the case here.
As noted above, I suspect this is a result of having a single implementation
and then a spec which documents that implementation.&lt;/p&gt;
&lt;h2 id=&quot;the-even-bigger-picture&quot;&gt;The Even Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#the-even-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As the ATP authors acknowledge in their &lt;a href=&quot;https://atproto.com/guides/faq&quot;&gt;FAQ&lt;/a&gt;, there
already is an existing federated social networking system based on
&lt;a href=&quot;https://www.w3.org/TR/activitypub/&quot;&gt;ActivityPub&lt;/a&gt;, though in practice
mostly centered around &lt;a href=&quot;https://joinmastodon.org/&quot;&gt;Mastodon&lt;/a&gt;. Mastodon
seems to be having a bit of a moment now in the wake of the chaos
surrounding Elon Musk&#39;s acquisition of Twitter:&lt;/p&gt;
&lt;blockquote class=&quot;twitter-tweet&quot;&gt;&lt;p lang=&quot;en&quot; dir=&quot;ltr&quot;&gt;For anyone wondering, Mastodon got over 70K sign-ups yesterday alone. Let&amp;#39;s keep the momentum going! The &amp;quot;public square&amp;quot; of the web must not belong to any one person or corporation!&lt;/p&gt;&amp;mdash; Mastodon (@joinmastodon) &lt;a href=&quot;https://twitter.com/joinmastodon/status/1586525904997863427?ref_src=twsrc%5Etfw&quot;&gt;October 30, 2022&lt;/a&gt;&lt;/blockquote&gt; &lt;script async=&quot;&quot; src=&quot;https://platform.twitter.com/widgets.js&quot; charset=&quot;utf-8&quot;&gt;&lt;/script&gt; 
&lt;p&gt;Even so, it has a tiny fraction of Twitter&#39;s user base.&lt;/p&gt;
&lt;p&gt;In general, experience suggests that it&#39;s pretty hard to start a competitive
social network (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Google%2B&amp;amp;oldid=1119997343&quot;&gt;Google Plus&lt;/a&gt;,
I&#39;m looking at you), but not primarily because it&#39;s technically hard.
There are some real challenges in building
a federated network, but building a non-federated system like
Twitter is conceptually pretty easy, though of course operating
at Twitter&#39;s scale is challenging. Rather, the issue is that
social networks are network effect
products (it&#39;s right there in the name) and so the initial value of
the network when it has few users is very low. This is especially true with something like
Twitter where so much of the value is in the feed of new content, as opposed to
YouTube (or arguably TikTok), where someone can send you a link and
you can just watch that one video.&lt;/p&gt;
&lt;p&gt;Far more than any of the technical details, what made Bluesky
interesting when compared to (say) Mastodon was that it was designed
under the auspices of Twitter with the stated objective of being used
by Twitter. If Twitter actually adopted ATP, then suddenly ATP would have a huge
number of users, getting you past the entry barrier that other new
social networks have to surmount. However, I was always pretty skeptical that
this was going to happen, for two reasons.&lt;/p&gt;
&lt;p&gt;First, Twitter, like most other free services, makes money by selling
ads. If there were some easy way to stand up a service which interoperated
with Twitter, including seeing everyone&#39;s tweets, but without showing
Twitter&#39;s ads, that seems pretty straightforwardly bad for Twitter,
which would then have to compete on user experience rather than on its
user base moat.&lt;/p&gt;
&lt;p&gt;Second, Twitter didn&#39;t need a fancy new protocol to allow for service
interoperability; they could have just implemented ActivityPub.
I recognize that there were technical objectives that ActivityPub
didn&#39;t meet, but something is better than nothing and they could have
used the time to develop something more to their liking and gradually
migrated over. Obviously, this isn&#39;t ideal from an engineering perspective,
but if what you wanted to do was get rid of Twitter&#39;s monopoly, then
it would get you a lot further than taking three years to develop something
new; when put together with Twitter&#39;s lack of an explicit commitment to use
the Bluesky work this suggests that actually making Twitter interoperate
was not a priority, even with the old Twitter management.&lt;/p&gt;
&lt;p&gt;Of course, now Elon Musk owns Twitter, so whatever Jack Dorsey&#39;s intentions
were seems a lot less relevant, and we&#39;ll just have to wait and see what,
if anything, Musk decides to do. Perhaps it will involve a &lt;a href=&quot;https://www.coindesk.com/business/2022/09/30/elon-musk-was-mulling-creating-a-blockchain-based-social-media-firm-before-offering-to-buy-twitter/&quot;&gt;blockchain&lt;/a&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Actually over a Merkle Search Tree over the data, but
the details don&#39;t much matter here. &lt;a href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The documentation gestures at using the DID for &amp;quot;end-to-end encryption&amp;quot;,
but doesn&#39;t specify how that would happen. Building a system like this
in practice is fairly complicated, so more work would be needed here. &lt;a href=&quot;https://educatedguesswork.org/posts/atproto-firstlook/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>How to hide your IP address</title>
		<link href="https://educatedguesswork.org/posts/traffic-relaying/"/>
		<updated>2022-10-17T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/traffic-relaying/</id>
		<content type="html">&lt;p&gt;As I mentioned previously in my posts on
&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing&quot;&gt;private browsing&lt;/a&gt; and &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi&quot;&gt;public WiFi&lt;/a&gt;,
if you really want to keep your activity on the Internet private, you
need some way to protect your IP address (i.e., the address that machines
on the Internet use to talk to your computer)
and the IP addresses of the servers
you are going to. There are a variety of different technologies
you can use for this purpose, with somewhat different properties.
This post provides a perhaps over-long description of the various
options.&lt;/p&gt;
&lt;h2 id=&quot;the-basics&quot;&gt;The Basics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#the-basics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As usual, with any security problem, we need to start with the
threat model. We are concerned with three primary modes of attack:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The server learning the user&#39;s IP address and using it to
identify them or correlate their activity.&lt;/li&gt;
&lt;li&gt;The local network learning which servers the user is going to.&lt;/li&gt;
&lt;li&gt;The server using your
apparent geolocation as determined from your IP address to
restrict access to certain kinds of content (soccer, BBC, whatever).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, whether you think this last item is actually a form of attack that should be
defended against depends on your perspective and maybe how big a Doctor
Who fan you are.&lt;/p&gt;
&lt;p&gt;The basic technique for defending against threats (1) and (3) is to
push the traffic through some kind of anonymizing relay:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/relay-basic.png&quot; alt=&quot;A basic anonymizing relay&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As shown in the diagram above, the client connects to the relay and
tells it where to connect. It then sends traffic to the relay,
which forwards it to the server. The relay replaces the client&#39;s
IP address with its own, so the server just sees the relay&#39;s
address. In general, the relay will be serving quite a few
clients, so the server will find it hard to distinguish which
one is which (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=K-anonymity&amp;amp;oldid=1108999307&quot;&gt;k-anonymity&lt;/a&gt;).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This simple version clearly addresses threat (1), and, if the relay operator lets
you select an IP address outside your own geographic region,
threat (3). In order to defend against threat (2) you also need
to encrypt the traffic to the relay so that an attacker on
your network can&#39;t see which server you are connecting to and
the traffic you are sending to it (see &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#routes-for-browsing-behavior-leakage&quot;&gt;here&lt;/a&gt;
for more on this form of data leakage). Ideally, you would also encrypt
the traffic &lt;em&gt;end-to-end&lt;/em&gt; to the server (using TLS or QUIC), but
that&#39;s just generally good practice, not required for the privacy
provided by the relay.&lt;/p&gt;
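&lt;p&gt;The address replacement described above can be sketched with a toy in-memory model. This is only an illustration: the &lt;code&gt;Relay&lt;/code&gt; class and its &lt;code&gt;flowId&lt;/code&gt; bookkeeping are hypothetical stand-ins for whatever connection state a real relay keeps, the addresses come from the reserved documentation ranges, and there is no encryption here at all.&lt;/p&gt;

```javascript
// Toy model of an anonymizing relay; all names are hypothetical.
class Relay {
  constructor(address) {
    this.address = address;
    this.flows = new Map(); // flow id -> original client address
    this.nextFlow = 1;
  }

  // The client tells the relay where to connect and hands it the payload.
  forward(packet) {
    const flowId = this.nextFlow++;
    this.flows.set(flowId, packet.src);
    // The server-facing packet carries the relay's address, not the client's.
    return { src: this.address, dst: packet.dst, flowId, payload: packet.payload };
  }

  // Replies come back to the relay, which maps them to the right client.
  deliver(reply) {
    return { src: reply.src, dst: this.flows.get(reply.flowId), payload: reply.payload };
  }
}

const relay = new Relay("203.0.113.10");
const outbound = relay.forward({ src: "192.0.2.55", dst: "198.51.100.7", payload: "GET /" });
console.log(outbound.src); // 203.0.113.10

const inbound = relay.deliver({ src: "198.51.100.7", flowId: outbound.flowId, payload: "200 OK" });
console.log(inbound.dst); // 192.0.2.55
```

&lt;p&gt;The only place the client&#39;s address appears is in the relay&#39;s own table, which is why the relay itself ends up being the party you have to trust.&lt;/p&gt;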
&lt;h2 id=&quot;relaying-options&quot;&gt;Relaying Options &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#relaying-options&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This basic design is at the heart of every relaying system,
but the details vary in important ways. There are three
major axes of variation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The network &lt;em&gt;layer&lt;/em&gt; at which relaying happens&lt;/li&gt;
&lt;li&gt;The number of &lt;em&gt;hops&lt;/em&gt; in the network&lt;/li&gt;
&lt;li&gt;Business model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We cover each of these below.&lt;/p&gt;
&lt;h3 id=&quot;network-layer&quot;&gt;Network Layer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#network-layer&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first major point of variation is the &lt;em&gt;layer&lt;/em&gt; at which the relaying
happens. Understanding this requires a bit of background on how
the Internet networking protocols work.&lt;/p&gt;
&lt;h4 id=&quot;ip&quot;&gt;IP &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#ip&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The most basic protocol on the Internet is what&#39;s called, somewhat
unsurprisingly, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Protocol&amp;amp;oldid=1115350518&quot;&gt;Internet Protocol (IP)&lt;/a&gt;. IP is what&#39;s called a &amp;quot;packet switching&amp;quot; protocol, which means
that the basic unit is a self-contained message called a &lt;strong&gt;packet&lt;/strong&gt;.
A packet is like a letter in that it has a source address and a
destination address. This means that when you send an IP packet on
the network, the Internet can automatically route the packet to the
destination address by looking at the packet with no other state
about either computer. A simplified IP packet looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/IP-packet.png&quot; alt=&quot;IP Packet&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The main thing in the packet is the actual &lt;em&gt;data&lt;/em&gt; to be delivered
from the source to the destination, also called the &lt;em&gt;payload&lt;/em&gt;.
The payload is variable length with a maximum typically
around 1500 bytes.
The packet also has a &lt;em&gt;next protocol&lt;/em&gt; field which tells the
receiver how to interpret the payload (more on this later)
and a &lt;em&gt;length&lt;/em&gt; field so that it is possible to tell how
long the entire packet is, including the variable length
payload.&lt;/p&gt;
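&lt;p&gt;As a rough sketch, the simplified packet can be written out as bytes. The layout below is an assumption made for illustration (4-byte source, 4-byte destination, 1-byte next protocol, 2-byte total length) and is deliberately not the real IPv4 header, which has more fields; the function names are hypothetical. The value 6 is the protocol number assigned to TCP.&lt;/p&gt;

```javascript
// Simplified layout for illustration only; not the real IPv4 header.
const HEADER_LENGTH = 11; // 4 (source) + 4 (destination) + 1 (next protocol) + 2 (length)

function addressToBytes(address) {
  return Buffer.from(address.split(".").map(Number));
}

function buildPacket(src, dst, nextProtocol, payload) {
  const header = Buffer.alloc(HEADER_LENGTH);
  addressToBytes(src).copy(header, 0);
  addressToBytes(dst).copy(header, 4);
  header.writeUInt8(nextProtocol, 8);
  // The length field covers the whole packet, including the variable-length payload.
  header.writeUInt16BE(HEADER_LENGTH + payload.length, 9);
  return Buffer.concat([header, payload]);
}

function parsePacket(packet) {
  const totalLength = packet.readUInt16BE(9);
  return {
    src: Array.from(packet.subarray(0, 4)).join("."),
    dst: Array.from(packet.subarray(4, 8)).join("."),
    nextProtocol: packet.readUInt8(8),
    totalLength,
    payload: packet.subarray(HEADER_LENGTH, totalLength),
  };
}

const packet = buildPacket("192.0.2.55", "198.51.100.7", 6, Buffer.from("hello"));
const parsed = parsePacket(packet);
console.log(parsed.src, parsed.dst, parsed.nextProtocol, parsed.totalLength);
// 192.0.2.55 198.51.100.7 6 16
console.log(parsed.payload.toString()); // hello
```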
&lt;p&gt;Using IP is very simple: your computer transmits an IP
packet on the wire and the Internet uses the destination
address to figure out where to route it. When someone
wants to transmit to you, they do the same thing.&lt;/p&gt;
&lt;h4 id=&quot;tcp&quot;&gt;TCP &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#tcp&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If all you want to do is send a thousand or so bytes from one
machine to the other, a single IP packet might be OK, but
in practice this is almost never what you want to do.
In particular, it&#39;s very common to want to send
a stream of data (e.g., a file) which is much longer than
1500 bytes. At a high level, this is done by breaking up the
data into a series of smaller chunks and sending each one
in a single packet. But of course, life isn&#39;t so simple.
For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Packets might be lost, and must be retransmitted so that
the receiver gets them.&lt;/li&gt;
&lt;li&gt;Packets might be reordered, and the receiver must know
which order to put them in.&lt;/li&gt;
&lt;li&gt;In general, the network will not be able to handle an
entire large file at once, so the data must be gradually
transmitted over time. The sender must have some way to
determine the appropriate sending rate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1114104762&quot;&gt;Transmission Control Protocol (TCP)&lt;/a&gt; is responsible for taking care of these issues.
The details of TCP are far too complicated to fit in this
blog post, but at a high level, the data stream is broken up
into &lt;em&gt;segments&lt;/em&gt;, each of which has a length and a sequence number,
which tells you where it goes in the stream. Each segment
is sent in an IP packet. When the receiver gets a segment
it can look at the sequence number to reconstruct the stream
and is able to detect gaps where packets are missing.
TCP also includes an &lt;em&gt;acknowledgment&lt;/em&gt; mechanism in which the
receiver tells the sender which segments it has received;
this allows the sender to retransmit packets which were
lost as well as to adjust its sending rate appropriately.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
TCP requires setting up state between the two endpoints;
this state is termed a &amp;quot;TCP connection.&amp;quot;&lt;/p&gt;
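&lt;p&gt;The segment, sequence number, and acknowledgment ideas can be sketched in a few lines. This is a heavily simplified model rather than real TCP: sequence numbers start at zero, there is no handshake, timer, or rate control, and the names &lt;code&gt;segment&lt;/code&gt; and &lt;code&gt;Receiver&lt;/code&gt; are hypothetical.&lt;/p&gt;

```javascript
// Split a stream into segments, each tagged with its byte offset (sequence number).
function segment(data, maxSize) {
  const segments = [];
  let seq = 0;
  while (data.length > seq) {
    const chunk = data.subarray(seq, seq + maxSize);
    segments.push({ seq, length: chunk.length, data: chunk });
    seq += chunk.length;
  }
  return segments;
}

// Receiver: buffers out-of-order segments and reports the next byte it still needs.
class Receiver {
  constructor() {
    this.pending = new Map(); // seq -> segment waiting for earlier data
    this.delivered = [];
    this.nextSeq = 0;
  }

  receive(seg) {
    if (seg.seq >= this.nextSeq) this.pending.set(seg.seq, seg);
    // Deliver every segment that is now contiguous with what we already have.
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq);
      this.pending.delete(this.nextSeq);
      this.delivered.push(next.data);
      this.nextSeq += next.length;
    }
    return this.nextSeq; // cumulative acknowledgment
  }

  stream() {
    return Buffer.concat(this.delivered).toString();
  }
}

const segments = segment(Buffer.from("the quick brown fox"), 5);
const receiver = new Receiver();

// The second segment (seq 5) is "lost" at first; the others arrive.
const acks = [segments[0], segments[2], segments[3]].map((s) => receiver.receive(s));
console.log(acks); // [ 5, 5, 5 ]

// The repeated acknowledgment tells the sender to retransmit from byte 5.
console.log(receiver.receive(segments[1])); // 19
console.log(receiver.stream()); // the quick brown fox
```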
&lt;p&gt;There are of course other protocols besides TCP which can run over
IP (for instance, UDP, mentioned later). This is why you
need the &amp;quot;next protocol&amp;quot; field in IP: to tell the receiver what
protocol is in the IP payload.&lt;/p&gt;
&lt;h4 id=&quot;tls&quot;&gt;TLS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#tls&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;TCP is a very old protocol and like most of the older Internet
protocols, it was designed before widespread use of encryption
was practical. This is obviously bad news from a security
perspective, and eventually people got around to fixing it. The standard solution is to carry the
data over &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transport_Layer_Security&amp;amp;oldid=1110721112&quot;&gt;Transport Layer Security (TLS)&lt;/a&gt;. TLS basically provides the abstraction of an encrypted
and authenticated stream of data on top of a TCP connection.
As with TCP, you need to set up some state to use TLS,
and that&#39;s called a &amp;quot;TLS connection&amp;quot;. I can talk endlessly
about TLS but I won&#39;t do so here.&lt;/p&gt;
&lt;h4 id=&quot;udp-and-quic&quot;&gt;UDP and QUIC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#udp-and-quic&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Applications do not implement TCP themselves. Instead it&#39;s
built into the operating system, specifically in what&#39;s
called the operating system &lt;em&gt;kernel&lt;/em&gt;, i.e., the
piece of the OS that&#39;s always running and is responsible
for managing the computer as a whole.
The client application tells the operating system to
create a TCP connection to the server, which creates what&#39;s
called a &amp;quot;socket&amp;quot; on the client side. The client writes data
to the socket and the kernel automatically packages
it up into TCP segments and transmits it to the other side,
taking care of retransmission, rate control, etc.
The kernel also reads TCP segments from the other side and makes
them available to the application to read. Typically, the
application implements TLS itself or more likely, uses
some existing TLS library.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;why-can&#39;t-you-write-your-own-tcp-stack%3F&quot;&gt;Why can&#39;t you write your own TCP stack? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#why-can&#39;t-you-write-your-own-tcp-stack%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Obviously, you &lt;em&gt;can&lt;/em&gt; write your own TCP stack (it&#39;s just software,
after all) but the problem is that you can&#39;t &lt;em&gt;install&lt;/em&gt; it,
because on most operating systems, ordinary applications aren&#39;t allowed to write or receive raw IP
datagrams. This is one of a number of restrictions on networking
behavior that used to be used for security enforcement in
a pre-cryptographic era. For instance, at one time it was
assumed that if a packet came from a given machine address
with a given &amp;quot;port number&amp;quot; (a field in the UDP/TCP header)
it came from a privileged process (one that had operating
system privileges). There was even a whole &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Remote_Shell&amp;amp;oldid=1070274903&quot;&gt;system for remote login&lt;/a&gt; based on this where you could be on machine A
and execute commands on machine B without authenticating.
I know this sounds absurd now, but this was the situation
from the early 80s to the late 90s, when we finally
got proper cryptographic authentication (at least some
of the time.)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is convenient in that the application doesn&#39;t need to carry
around its own TCP implementation, but inconvenient in that
it&#39;s inflexible: suppose the application wants to make some
change to TCP to make it more efficient? There&#39;s no way to
do this without changing the operating system. By contrast,
it&#39;s easy to change TLS behavior just by shipping a new
version of the application. This became particularly salient
in the late 2010s when people wanted to make performance
enhancements to TCP but were unable to because the operating
system didn&#39;t move fast enough.
The solution was to invent a new protocol that could be
implemented entirely in the application: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1114290192&quot;&gt;QUIC&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;QUIC is sort of like a combination of a fancier version of
TCP and the cryptography of TLS (in fact, it uses many pieces
of TLS internally). However, because it can be
implemented entirely in the application, it can be
changed very rapidly. Unfortunately, in most operating systems,
applications are not allowed to write IP packets directly,
and so QUIC runs over a protocol called the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=User_Datagram_Protocol&amp;amp;oldid=1112673995&quot;&gt;User Datagram Protocol (UDP)&lt;/a&gt;. UDP is
a very simple protocol which just lets applications send
single units of data (datagrams) over IP. So, QUIC runs
over UDP and UDP runs over IP.&lt;/p&gt;
&lt;h3 id=&quot;the-protocol-stack&quot;&gt;The protocol stack &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#the-protocol-stack&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s conventional to talk about this as a &amp;quot;stack&amp;quot; of protocols
and visualize it in a picture called a &amp;quot;layer diagram&amp;quot;,
like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/TCPIP-layer.png&quot; alt=&quot;TCP/IP Layer Diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&#39;ve also drawn on this diagram which pieces are implemented
in the application and which are typically part of the operating
system. When the application wants to write
data, it starts at the top of the stack and data moves down to
the network. As data comes in from the network, it moves up the
stack towards the application.&lt;/p&gt;
&lt;p&gt;In terms of the way the data appears on the network, each
layer adds its own encapsulation, typically either before
or after the data. The diagram below shows two examples.
The first is data being sent over TCP, in this case the string
&amp;quot;Four score and seven years ago&amp;quot;. TCP adds its own header
with the sequence number, etc. and then passes it to the
IP layer, which adds the IP header with the source and destination
addresses.
The second example is the same data being sent over TLS.
The TLS layer encrypts the data (shown by the crosshatching)
and adds its own header. It then passes it to TCP, which adds
its own header, etc.
The receiving process reverses these operations.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tcp-tls-packets.png&quot; alt=&quot;TCP and TLS packets&quot; /&gt;&lt;/p&gt;
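&lt;p&gt;As a rough sketch of the layering described above, here is a toy Python model in which each layer prepends its own header and the receiver strips them in reverse order. The header formats, the function names, and the 198.51.100.7 destination address are invented for illustration; they are not the real TCP or IP wire formats.&lt;/p&gt;

```python
# Toy illustration of layered encapsulation: each layer prepends a header.
# The header layouts are made up for readability and are NOT real TCP/IP.

def tcp_wrap(payload, seq):
    # Hypothetical "TCP" header: a tag plus the sequence number.
    return b"TCP[seq=%d]" % seq + payload

def ip_wrap(segment, src, dst):
    # Hypothetical "IP" header: a tag plus source and destination addresses.
    return b"IP[%s to %s]" % (src, dst) + segment

def strip_header(pdu):
    # The receiver reverses the operation: drop everything up to the first "]".
    return pdu.split(b"]", 1)[1]

data = b"Four score and seven years ago"
segment = tcp_wrap(data, 1000)                         # TCP layer adds its header
packet = ip_wrap(segment, b"192.0.2.1", b"198.51.100.7")  # IP layer adds its header

# Receiving side: strip the IP header, then the TCP header.
received = strip_header(strip_header(packet))
print(packet)
print(received)
```

&lt;p&gt;Adding TLS to this picture would just mean one more wrapping step (encrypt, then add a record header) before the data is handed to the TCP layer.&lt;/p&gt;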
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;naming-chunks-of-data&quot;&gt;Naming chunks of data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#naming-chunks-of-data&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;You&#39;ll probably notice that I&#39;ve been using the terms
&amp;quot;packet&amp;quot;, &amp;quot;record&amp;quot;, etc. These are not interchangeable.
One of the most annoying problems in networking is how
to name a single unit of data like a packet (sometimes
called generically a &lt;em&gt;protocol data unit (PDU)&lt;/em&gt;). Each protocol
tends to have its own term for this, partly just due to
being defined by different people and partly because when
you are working at multiple layers of the protocol stack
it&#39;s a pain to talk about &amp;quot;IP datagrams&amp;quot;, &amp;quot;UDP datagrams&amp;quot;, etc.
Here&#39;s my incomplete table of names for PDUs in different
protocols:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Protocol&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Ethernet&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Frame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;IP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet (datagram)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;UDP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Datagram&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;TCP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Segment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;TLS&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;QUIC&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet (but it has things inside it called frames)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;HTTP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;RTP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet (but they carry media frames)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;OpenPGP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Packet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;XMPP&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Stanza&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;One thing that&#39;s important to know is that TCP and
TLS provide the abstraction of a stream of data, not
a set of records. What this means is that the application
just writes data and the TLS stack or the TCP stack
coalesces those chunks into one record (packet) or
breaks them up at its convenience. The TCP stack
might even send the same data twice with two different
framings. For instance, suppose that the application
writes &amp;quot;Hello&amp;quot; and then the kernel sends it in a single
packet. While the packet is in flight, the application
writes &amp;quot;Again&amp;quot;, which the kernel sends in a second packet.
If both packets get lost and the kernel has to retransmit them, it might send them as
a single TCP segment (&amp;quot;HelloAgain&amp;quot;).&lt;/p&gt;
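&lt;p&gt;To make the &amp;quot;HelloAgain&amp;quot; example concrete, here is a minimal Python sketch of a sender that treats application writes as a byte stream rather than as records. The &lt;code&gt;ToySender&lt;/code&gt; class is hypothetical and leaves out almost everything a real TCP stack does (acknowledgments, timers, segment size limits); it only shows why the framing on the wire need not match the application&#39;s writes.&lt;/p&gt;

```python
# Toy model of TCP's byte-stream abstraction. This is not how a real kernel
# is structured; it only shows that segment boundaries are the stack's choice.

class ToySender:
    def __init__(self):
        # Bytes the application has written that are not yet acknowledged.
        self.unacked = bytearray()

    def write(self, data):
        # Remember the bytes, and (in this toy) send each write as one segment.
        self.unacked += data
        return bytes(data)

    def retransmit(self):
        # On loss, resend everything outstanding as a single segment.
        return bytes(self.unacked)

sender = ToySender()
first = sender.write(b"Hello")    # sent as its own segment
second = sender.write(b"Again")   # sent as a second segment
resent = sender.retransmit()      # both lost: one combined segment
print(first, second, resent)
```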
&lt;h3 id=&quot;which-layer&quot;&gt;Which Layer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#which-layer&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;With this as background, we are ready to talk about one of the
big points of diversity: what layer are we relaying the traffic
at? There are two main options, at least for relaying encrypted
traffic.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Relay the IP-layer traffic&lt;/li&gt;
&lt;li&gt;Relay the application layer traffic (i.e., the data that would
go over UDP or TCP)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I cover both of these below.&lt;/p&gt;
&lt;h4 id=&quot;relaying-ip-traffic&quot;&gt;Relaying IP Traffic &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#relaying-ip-traffic&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Encrypting traffic at the network layer (IP) is one of the obvious
ways to address network security issues, as it has the important
advantage that once you have set it up, it secures &lt;em&gt;all&lt;/em&gt;
communications between two endpoints. Work on this goes all
the way back to the 1970s, but the IETF started standardizing
technology for this purpose in 1992 under the name
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=IPsec&amp;amp;oldid=1115277463&quot;&gt;IPsec&lt;/a&gt;.
The original idea was actually not so much the kind of relaying
system that I discussed above but rather that you would
encrypt traffic between the two machines that were
communicating with each other. So, for instance, say my client
wanted to communicate with your server, we would take the
IP packets we wanted to send, encrypt them, and send them directly.&lt;/p&gt;
&lt;p&gt;Like the protocols we discussed above, IPsec is an &lt;em&gt;encapsulation&lt;/em&gt;
protocol, which means that to encrypt an IP packet from A to B
we take the entire original packet, encrypt it,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; and then stuff
it in another IP packet, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipsec1.png&quot; alt=&quot;IPsec encapsulation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In the scenario I was discussing above, the inner (encrypted)
IP header and the outer (plaintext) IP header will have the same
addressing information, but it&#39;s of course possible to have them
have different addressing information, which is useful for creating
what&#39;s called a &lt;em&gt;Virtual Private Network (VPN)&lt;/em&gt;. The motivating
idea here is that you have two networks (say two offices from
the same company) and you want to connect them as if they were
in the same location. Inside the office, you trust that the
wires haven&#39;t been tampered with (this is before WiFi)
and so you don&#39;t encrypt all your data (I know, this sounds
naive now), and so what you really want is just a wire
connecting office 1 and office 2. This kind of private
connection—what used to be called a &amp;quot;leased line&amp;quot;—is very expensive
to buy and what you actually have is an Internet connection which
lets you connect to everyone. But if you encrypt the traffic
between office 1 and office 2, then you can simulate having
your own private wire. Hence &lt;em&gt;virtual&lt;/em&gt; private network. The
typical topology looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/enterprise-vpn.png&quot; alt=&quot;An enterprise VPN&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this scenario, you have two offices, each of which has a
&amp;quot;VPN gateway&amp;quot; which detects traffic going from office
1 to office 2 and encrypts it before sending it along. Other
traffic, say to Facebook, is left untouched. When the
packets are received at the far VPN gateway, it just removes
the encapsulation and drops them on the network. The effect is
as if there were a single network rather than two networks.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible to deploy this kind of thing in a simpler
scenario where a single user VPNs into their office network,
for instance if you are in a hotel working remotely, as shown
in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/remote-access-vpn.png&quot; alt=&quot;A remote access VPN&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The effect here is that it&#39;s like you were in the office, but you&#39;re
actually not. But this brings up a real problem, which is that the
remote user&#39;s machine doesn&#39;t have the right IP address:
it has an IP address associated with the user&#39;s home or hotel (192.0.2.1 in the
diagram above) but you want it to appear to be in the office, which
means it has to have an office IP address (something starting
with 203.0.112).&lt;/p&gt;
&lt;p&gt;There are two major ways to make this work. In the first, the
VPN gateway tells the user&#39;s device what IP address it wants
it to have, and then the user&#39;s device puts that in the &lt;em&gt;inner&lt;/em&gt;
IP header, while the outer IP header has the actual
address. For instance, the inner (encrypted) IP header would
have 203.0.112.50 and the outer (plaintext) IP header would
have 192.0.2.1.
The alternative is to have both headers have the user&#39;s
actual IP address and to have the VPN gateway &lt;em&gt;translate&lt;/em&gt;
that address into an appropriate local address for the
office network (and translate in the other way on the return trip). Note that in both cases, the gateway
needs to do some work, in the first case to keep track of
what addresses were assigned and to enforce that the client
uses the right one, and in the second case to do the translation.&lt;/p&gt;
&lt;p&gt;With that background, we can finally get to the problem statement
that we started with, namely concealing user behavior. Unsurprisingly
you can use the same technology as you use for remote access, with
the difference that the VPN gateway is on the Internet directly
rather than on some enterprise network, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/consumer-vpn.png&quot; alt=&quot;Consumer VPN&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To the server, this just looks like the user is connecting from
the VPN gateway, with whatever the IP address of the VPN gateway
is. The client&#39;s local network just sees a connection to the
VPN server, but doesn&#39;t know where the data is eventually going.&lt;/p&gt;
&lt;p&gt;Here I&#39;ve focused on IPsec, but it doesn&#39;t really matter which
encryption layer protocol you use to carry the IP packets:
they&#39;re just being encapsulated and transported end-to-end.
In practice, one sees VPNs deployed with a variety of
transport protocols, including
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Datagram_Transport_Layer_Security&amp;amp;oldid=1115742845&quot;&gt;DTLS&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=OpenVPN&amp;amp;oldid=1107224624&quot;&gt;OpenVPN&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=WireGuard&amp;amp;oldid=1113944737&quot;&gt;WireGuard&lt;/a&gt;
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1114290192&quot;&gt;QUIC&lt;/a&gt;.
From the user&#39;s perspective, the properties of these protocols are largely
the same.
Most products that are labeled &amp;quot;VPN&amp;quot; protect traffic at the IP
layer using one or more of these protocols.&lt;/p&gt;
&lt;h4 id=&quot;relaying-application-layer-traffic&quot;&gt;Relaying Application Layer Traffic &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#relaying-application-layer-traffic&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As mentioned above, the nice thing about protecting traffic at the
IP layer is that it protects all the traffic on the system. However,
the bad thing is that protecting
IP layer traffic requires cooperation from the operating system.
This has several undesirable consequences:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Your code isn&#39;t portable between operating systems.&lt;/li&gt;
&lt;li&gt;Many operating systems require some kind of administrator
access in order to install or configure something that
acts at the IP layer.&lt;/li&gt;
&lt;li&gt;You are often limited to whatever affordances the OS
offers you. For instance, you may not easily be able to
protect some traffic and not other types of traffic.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These issues can be addressed by relaying at the application
layer rather than the IP layer. This can be implemented entirely
in the application without touching the operating system;
the application just connects to the relay (e.g., over TCP)
and sends the traffic to the relay (hopefully encrypted to the
server). The relay makes its own transport-level connection
to the server and sends the application level traffic to the
server, as shown below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/application-relay.png&quot; alt=&quot;Application Level Relaying&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that in this diagram there are two TCP connections,
one between the client and the relay and one between
the relay and the server. The client connects to the relay over
TLS and then on top of that creates an end-to-end TLS
connection to the server (you could of course not encrypt
your data to the server, but don&#39;t do that).&lt;/p&gt;
&lt;p&gt;One of the big advantages of this design is that it makes it
easy to relay some kinds of traffic and not others. As a concrete
example, consider &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing&quot;&gt;Safe Browsing&lt;/a&gt;, which
leaks information about the user&#39;s browsing history to the
Safe Browsing server. You might want to proxy Safe Browsing
checks (which can be done very cheaply because there isn&#39;t
much traffic) but not generic browsing traffic (which is
much higher volume and hence more expensive). This is easy
for the browser to do because it knows which traffic is which
but is more difficult for an IP-layer system, which has to
somehow distinguish different types of traffic. It&#39;s not
necessarily impossible but it&#39;s significantly more work.
For instance, if Safe Browsing uses a separate IP address
from the rest of Google, then you could just relay that
traffic, but if it shares the same IP address, then you
will be relaying people&#39;s search traffic as well.&lt;/p&gt;
&lt;p&gt;A number of IP concealment systems relay at the application
layer, including &lt;a href=&quot;https://torproject.org/&quot;&gt;Tor&lt;/a&gt;,
Apple&#39;s &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;iCloud Private Relay&lt;/a&gt;,
and &lt;a href=&quot;https://fpn.firefox.com/&quot;&gt;Firefox Private Network&lt;/a&gt;.
Typically, systems like this are referred to as &amp;quot;proxies&amp;quot;.
Apple&#39;s system is interesting in that it&#39;s implemented
in the operating system mostly by hooking Apple&#39;s higher
level networking APIs. Even so, it only works with Safari, not
other applications.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;how-many-hops%3F&quot;&gt;How many hops? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#how-many-hops%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Whatever the relaying technology, at the end of the day the
relay needs to send traffic to the server, which means it
has to know what server you&#39;re connecting to. But this
creates a new privacy problem: you&#39;re connecting to the
relay and then telling it which server to connect to.
This means that while you&#39;ve prevented the server from
learning your identity, you still have a privacy problem
with respect to the relay itself. The
relay will have some privacy policy about how it handles
this information (ideally, not keeping logs at all),
but that&#39;s just something you have to trust them on.
Even better would be to have some form of a technical protection.&lt;/p&gt;
&lt;p&gt;The standard approach to providing technical protection here is to have multiple layers
of relaying, as shown in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/multi-hop-relay.png&quot; alt=&quot;A multi-hop relay system&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The way this works is that the client connects to Relay 1.
It then tells Relay 1 to connect it to Relay 2. As with
our single-hop system, that data is sent over the encrypted
channel to Relay 1 and is itself encrypted to Relay 2.
The client then tells Relay 2 to connect it to the server.
The data to the server is thus encrypted three times by
the client, in a nested fashion: once to the server, then
to Relay 2, and then to Relay 1.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
Each hop strips off one
layer of encryption and passes it to the next hop.&lt;/p&gt;
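&lt;p&gt;The nesting can be sketched in a few lines of Python. This is a toy model, not real cryptography: &lt;code&gt;seal&lt;/code&gt; and &lt;code&gt;unseal&lt;/code&gt; are hypothetical stand-ins for encrypting to a recipient and decrypting, and the names &amp;quot;relay1&amp;quot;, &amp;quot;relay2&amp;quot;, and &amp;quot;server&amp;quot; are just labels. The point is only that each hop can open exactly one layer and learns nothing but the next hop.&lt;/p&gt;

```python
# Toy model of nested ("onion") encryption. A sealed box can only be opened
# by the named recipient; real systems would use actual encryption here.

def seal(recipient, next_hop, contents):
    return {"for": recipient, "next_hop": next_hop, "contents": contents}

def unseal(me, box):
    if box["for"] != me:
        raise ValueError(me + " cannot open a box sealed for " + box["for"])
    return box["next_hop"], box["contents"]

# The client wraps the request innermost-first: server, then Relay 2, then Relay 1.
inner = seal("server", None, "GET /index.html")
middle = seal("relay2", "server", inner)
outer = seal("relay1", "relay2", middle)

# Each hop strips one layer and forwards what is left to the next hop.
hop, box = "relay1", outer
path = []
while hop is not None:
    path.append(hop)
    hop, box = unseal(hop, box)

print(path)  # order in which the hops handled the message
print(box)   # what the server finally reads
```

&lt;p&gt;In this model Relay 1 sees only that the next hop is Relay 2, and it cannot open the box addressed to Relay 2, which is where the server&#39;s identity lives.&lt;/p&gt;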
&lt;p&gt;The result is that no single entity (other than the client)
gets to see &lt;em&gt;both&lt;/em&gt;
the user&#39;s identity and the identity of the server it&#39;s
connecting to. Here&#39;s what each sees:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Entity&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Knowledge&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Relay 1&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Client address, Relay 2 address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Relay 2&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Relay 1 address, Server address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Server&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Relay 2 address, Server address&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note that if the two relays collude, they can together
uncover the client&#39;s address and the server&#39;s address. However,
if either is honest, then the client&#39;s privacy should be
protected, as
neither can easily collude with the server to learn this
information:
relay 2 because it does not know the client&#39;s address
and relay 1 because (hopefully) the client&#39;s connection
to relay 2 is one of many connections it has made to
relay 2 during this time period. How well this last part
works depends on the scale of operation of the system,
how long the client leaves the connection up,
whether it reuses the connection to relay 2 for
connections to multiple servers, etc.&lt;/p&gt;
&lt;p&gt;Of course, in order for this to work, the relays need
to be operated by different entities.
Otherwise there&#39;s no meaningful guarantee of non-collusion.
This includes
not being run on the same cloud service provider
(e.g., AWS).
Sometimes you&#39;ll hear about &lt;a href=&quot;https://www.comparitech.com/blog/vpn-privacy/multi-hop-vpn/&quot;&gt;multi-hop VPNs&lt;/a&gt; but
if the same company is providing both VPN servers, then
this doesn&#39;t really help. One nice feature of iCloud
Private Relay is that your account is with Apple but they
arrange for multiple hops with different providers, so
you don&#39;t need to worry about the details.&lt;/p&gt;
&lt;p&gt;One important limitation of multiple hops is that it
can have a negative impact on performance. In general,
the routing algorithms that run the Internet try to
find a reasonably efficient route between two locations&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
and so you should expect that if instead of routing between
point A and point B you route from A to C to B, then this
will be somewhat slower (you&#39;ll often hear people use
the term &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Triangle_inequality&amp;amp;oldid=1100229250&quot;&gt;triangle inequality&lt;/a&gt;
as shorthand for this). The more hops you do, the more
likely it is you will have some kind of performance impact.
This isn&#39;t a precise effect, but in general, you should expect to have some
impact.&lt;/p&gt;
&lt;p&gt;iCloud Private Relay is a &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;two hop network&lt;/a&gt;, with the
first hop being operated by Apple and the second
hop being a large provider that Apple has contracted with
(mostly Content Delivery Networks (CDNs) like Cloudflare or Akamai).
Both Apple and these CDNs have fast connectivity and good
geographic distribution, which is intended to ensure
high performance. Tor uses &lt;a href=&quot;https://support.torproject.org/glossary/circuit/&quot;&gt;three hops&lt;/a&gt;,
a &amp;quot;guard node&amp;quot;, a &amp;quot;middle relay&amp;quot; and an &amp;quot;exit node&amp;quot;. As discussed below,
Tor relays are effectively volunteer services, so performance varies in
practice.&lt;/p&gt;
&lt;h3 id=&quot;business-model&quot;&gt;Business Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#business-model&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Your typical VPN has a simple business model: you pay the VPN provider
and then authenticate to them (e.g., with a password) when you connect.
This isn&#39;t ideal for privacy because they know your name, contact
information, and credit card number.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
On the other hand, as described above, they already know your IP address and which
sites you&#39;re going to, so it&#39;s not clear how much worse this makes things.&lt;/p&gt;
&lt;p&gt;With Private Relay, however, this would create a real problem:
it&#39;s not so bad with the first hop relay because that gets your IP
address anyway, but if you authenticate to the second relay with
your identity, then you&#39;ve ruined everything and you might as well
be back with a single hop system. In order to address this problem,
Apple uses anonymous credentials generated using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Blind_signature&amp;amp;oldid=1088007463&quot;&gt;blind signatures&lt;/a&gt; to authenticate to the proxy,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/anonymous-proxy.png&quot; alt=&quot;Anonymous Authentication&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Briefly, the way this works is that the client connects to
Apple and authenticates to it using its iCloud account.
Apple then issues an anonymous credential that doesn&#39;t
contain the user&#39;s identity. This credential can
be provided to the relay to authorize use of the service.
In order to prevent Apple from linking up these two activities,
the credential is &lt;em&gt;blinded&lt;/em&gt; (essentially encrypted)
when Apple generates it, and then the client unblinds
it before sending it to the relay
(see &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#digression%3A-anonymous-credentials&quot;&gt;here&lt;/a&gt; for more
detail on how this kind of credential works).
This design allows the
proxy to know that you are authorized to use the service but
not to see who you are.&lt;/p&gt;
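&lt;p&gt;For intuition about the blinding step, here is a toy blind signature using textbook RSA with tiny numbers. This is only a sketch of the general technique and is assumed for illustration, not a description of the exact scheme Apple deploys; the key, token value, and blinding factor are made up, and numbers this small provide no security. It needs Python 3.8 or later for the modular inverse via &lt;code&gt;pow&lt;/code&gt;.&lt;/p&gt;

```python
# Toy textbook-RSA blind signature. Insecure; for illustration only.

# Issuer key: n = 61 * 53, public exponent e, private exponent d.
n, e, d = 3233, 17, 2753

token = 1234   # the credential value the client wants signed (hypothetical)
r = 7          # the client's secret blinding factor, coprime to n

# 1. The client blinds the token before sending it to the issuer.
blinded = (token * pow(r, e, n)) % n

# 2. The issuer signs the blinded value; it never sees the token itself.
blind_sig = pow(blinded, d, n)

# 3. The client removes the blinding factor to get a signature on the token.
sig = (blind_sig * pow(r, -1, n)) % n

# 4. The relay checks the signature using only the issuer's public key.
verified = pow(sig, e, n) == token
print(verified)
```

&lt;p&gt;The issuer only ever handles &lt;code&gt;blinded&lt;/code&gt; and &lt;code&gt;blind_sig&lt;/code&gt;, while the relay only ever sees &lt;code&gt;token&lt;/code&gt; and &lt;code&gt;sig&lt;/code&gt;, so neither value the relay sees can be matched against what the issuer signed.&lt;/p&gt;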
&lt;p&gt;Tor is different from either of these because it&#39;s a free service,
operated by members of the community (you can &lt;a href=&quot;https://support.torproject.org/faq/relay-donations/&quot;&gt;donate&lt;/a&gt;
to people who run relays). This creates some unpredictable
performance consequences because there really isn&#39;t much
in the way of a &lt;em&gt;Service Level Agreement (SLA)&lt;/em&gt;. It also
makes it somewhat hard to assess the actual privacy guarantees,
because some of the Tor nodes might be run by people you don&#39;t
trust or who are actively malicious. Obviously, with iCloud Private Relay
you have to judge for yourself how much you trust Apple and its partners,
but at least you have some idea who they are.&lt;/p&gt;
&lt;h2 id=&quot;summary-and-final-thoughts&quot;&gt;Summary and Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#summary-and-final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;IP addresses are an important and highly effective tracking vector and
if you want to browse privately you need to do something to conceal
your IP, and this mostly means relaying. Any relaying system
will conceal your identity from the server, as long as your
provider isn&#39;t colluding with the server.
Any one hop system necessarily means that you are trusting
the provider not to track your behavior and not to collude
with the server. Depending on how you feel about your local network
and its privacy policies, a single hop system might or might not
be an improvement (see Yael Grauer&#39;s &lt;a href=&quot;https://www.consumerreports.org/vpn-services/mullvad-ivpn-mozilla-vpn-top-consumer-reports-vpn-testing-a9588707317/&quot;&gt;article in Consumer Reports&lt;/a&gt; for more on this). A multi-hop
system has a much better privacy story because misbehavior by
a single relay is not sufficient to compromise your privacy.&lt;/p&gt;
&lt;p&gt;The technical details of how the system works (IP versus application
layer, mostly) don&#39;t matter that much for privacy but do matter
for functionality, with application layer systems being more
flexible but providing less complete coverage for other applications
on your device. In addition, all of the multi-hop systems that I know of
are at the application layer, so as a practical matter if you
want a multi-hop system you probably will be using an application
layer system.&lt;/p&gt;
&lt;p&gt;Finally, it&#39;s important to know that even the best system
provides only limited protection. An attacker who has a complete
view of the network can often do enough traffic analysis to
determine who is on each end of the traffic. Fortunately, most
of us do not need to worry about this powerful an attacker.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Some people run their own relays, in which case they
might successfully conceal their identity, but
because they will be the only user, they&#39;ll be trackable
by the IP of that relay. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The way this works is that when you are sending too
quickly, packets get dropped by the network, so the
sender can use the rate of loss as a signal that its
sending rate is too high. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I&#39;m ignoring &amp;quot;transport mode&amp;quot;, in which you just carry
the UDP or TCP datagram. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This includes other browsers on iOS even though those
browsers are required to use Apple&#39;s WebKit engine.
As far as I can tell, this is just a policy choice
on Apple&#39;s side, not any kind of technical limitation. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You&#39;ll sometimes hear the term &amp;quot;onion routing&amp;quot; applied
to this, especially with Tor. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a hideously complicated topic all on its own. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, you could pay with Bitcoin but &lt;a href=&quot;https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf&quot;&gt;don&#39;t think that&#39;s private&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/traffic-relaying/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Self-Driving Vehicles, Monoculture, and You</title>
		<link href="https://educatedguesswork.org/posts/self-driving-bloomberg/"/>
		<updated>2022-10-10T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/self-driving-bloomberg/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;Warning: this post didn&#39;t come out quite as tight as I was hoping.
I think there are a bunch of interesting ideas and connections
to be drawn, but they don&#39;t hang together as well as I wanted.
That said, I&#39;m not quite sure how to improve things, and so
I&#39;m just going to post it as-is. The Internet has plenty of
bits, after all.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Max Chafkin&#39;s &lt;a href=&quot;https://www.bloomberg.com/news/features/2022-10-06/even-after-100-billion-self-driving-cars-are-going-nowhere&quot;&gt;article&lt;/a&gt; arguing that self-driving cars are
failing is making the rounds, especially this amazing opening
bit:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The first car woke Jennifer King at 2 a.m. with a loud, high‑pitched
hum. “It sounded like a hovercraft,” she says, and that wasn’t the
weird part. King lives on a dead-end street at the edge of the
Presidio, a 1,500-acre park in San Francisco where through traffic
isn’t a thing. Outside she saw a white Jaguar SUV backing out of her
driveway. It had what looked like a giant fan on its roof—a laser
sensor—and bore the logo of Google’s driverless car division, Waymo.&lt;/p&gt;
&lt;p&gt;She was observing what looked like a glitch in the self-driving
software: The car seemed to be using her property to execute a
three-point turn. This would’ve been no biggie, she says, if it had
happened once. But dozens of Google cars began doing the exact
thing, many times, every single day.&lt;/p&gt;
&lt;p&gt;King complained to Google that the cars were driving her nuts, but
the K-turns kept coming. Sometimes a few of the SUVs would show up
at the same time and form a little line, like an army of zombie
driver’s-ed students. The whole thing went on for weeks until last
October, when King called the local CBS affiliate and a news crew
broadcast the scene. “It is kind of funny when you watch it,” the
report began. “And the neighbors are certainly noticing.” Soon after, King’s driveway was hers again.&lt;/p&gt;
&lt;p&gt;Waymo disputes that its tech failed and said in a statement that its
vehicles had been “obeying the same road rules that any car is
required to follow.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here&#39;s the thing, though: Waymo is &lt;strong&gt;right&lt;/strong&gt;. It wouldn&#39;t be a big
deal if just the occasional person did a K-turn in King&#39;s driveway
(who among us hasn&#39;t turned around in someone&#39;s driveway?), but when
&lt;em&gt;everyone&lt;/em&gt; does it, then it&#39;s a disaster, at least for King. However, it&#39;s a little
harder to pinpoint exactly what&#39;s wrong here.&lt;/p&gt;
&lt;p&gt;There&#39;s an obvious account of this situation, which is that this is a
case of AI risk, incentive alignment, and the famous &lt;a href=&quot;https://www.lesswrong.com/tag/paperclip-maximizer&quot;&gt;paperclip
maximizer&lt;/a&gt;. In
this version of the story, Google&#39;s system for training their cars
is only interested in saving time (or wear on the cars, or whatever)
and doesn&#39;t take into account the &lt;em&gt;externalities&lt;/em&gt; of their behavior, so
it&#39;s perfectly happy to keep people up all night with car noise if it
saves a few seconds or minutes.&lt;/p&gt;
&lt;p&gt;There certainly is some kind of alignment problem here, but I
think this analysis doesn&#39;t quite capture it. As I said above,
the problem isn&#39;t that any particular car does a K-turn in
King&#39;s driveway, but that all of them do. Even if we ignore externalities,
it&#39;s not clear that this is an optimal solution:
according to the story there were cars lining up to make this
turn, at which point you should be wondering if this really
is the fastest way for them to accomplish their objective.
This suggests another analysis, which is that this is a locally
optimal approach which isn&#39;t globally optimal, even if we
ignore externalities.&lt;/p&gt;
&lt;p&gt;This shouldn&#39;t be an unfamiliar concept: there are lots of things
which work at a small scale but not at a large scale. There are at
least two possible failure modes that one can encounter:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This just isn&#39;t scalable at all&lt;/li&gt;
&lt;li&gt;You need some diversity&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;unsustainable-scaling&quot;&gt;Unsustainable Scaling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#unsustainable-scaling&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most people are used to systems that have unsustainable scaling.
Sometimes this is due to externalities, such as with air pollution.
Back when only a few people had cars, it didn&#39;t really matter that a
typical internal combustion engine emitted way too much NO&lt;sub&gt;x&lt;/sub&gt;,
but put enough cars on the road and you get &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Acid_rain&amp;amp;oldid=1106116768&quot;&gt;acid rain&lt;/a&gt;, hence &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Catalytic_converter&amp;amp;oldid=1113894878&quot;&gt;catalytic converters&lt;/a&gt;. The situation with CO&lt;sub&gt;2&lt;/sub&gt; and climate change
is similar: we can only dump so much into the atmosphere before
whatever homeostasis there is starts to break down.&lt;/p&gt;
&lt;p&gt;Other cases of unsustainable scaling aren&#39;t so much due to externalities
as due to resource constraints. We saw that early in the COVID
pandemic, where we had really effective COVID tests based on &lt;a href=&quot;https://educatedguesswork.org/posts/pcr&quot;&gt;PCR&lt;/a&gt;
but there were only a few labs that could do them. Those
tests have become more standardized, but we also now have cheap
lateral flow tests that scale. I understand that this is also a
problem in educational interventions, which often seem to work
in pilot projects with teachers who are committed to the idea but don&#39;t
scale well when you need every teacher to do it.&lt;/p&gt;
&lt;h2 id=&quot;the-need-for-diversity&quot;&gt;The need for diversity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#the-need-for-diversity&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another possibility is that you actually do have something scalable,
as long as not everyone tries to do exactly the same thing.
It might be the case that there are hundreds of little hacks like this, and
if only a few cars used each of them, it would be fine, so you just
need diversity rather than uniformity.
The common example of this is of course monoculture in crops:
you actually can get very high yields this way, but you end up with a
brittle system. However, there are also situations in which
the whole system falls apart if you don&#39;t have some diversity.&lt;/p&gt;
&lt;p&gt;This is a familiar concept in networking, where, like above,
you often have some resource that needs to be shared between
multiple agents and if they don&#39;t share nicely, everything
collapses.&lt;/p&gt;
&lt;h3 id=&quot;avalanche-restart&quot;&gt;Avalanche Restart &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#avalanche-restart&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One well-known case is what&#39;s called &amp;quot;avalanche restart&amp;quot;.
Suppose that you have a server that is under heavy load
(i.e., has a lot of clients) and then for some reason it
reboots.&lt;/p&gt;
&lt;p&gt;Of course, this is experienced by clients as a failure, and they try
to reconnect. The obvious thing to do is to try to reconnect
immediately and if that fails try again (i.e., in what&#39;s called a
&amp;quot;tight loop&amp;quot;). This is locally optimal, because it lets you reconnect
quickly, but globally bad: if everyone does this, you can
overload the server or the network that the
server is on, which leads to bad service for everyone as it tries to
switch between every client and might even cause it to reboot again
(this shouldn&#39;t happen, but all software has bugs).&lt;/p&gt;
&lt;p&gt;There are two standard techniques to address this problem:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Instead of having the clients retry immediately, have them wait
a random time (e.g., between 1 and 10 seconds).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the client fails to connect, then it increases (typically,
doubles) the amount of time it waits before the next retry.
This is called &amp;quot;exponential backoff&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Typically, these are used together, so you randomly start and then
exponentially back off. The net effect is that you don&#39;t have every
client trying at the same time, and the rate of clients attempting
automatically adjusts until the server isn&#39;t overloaded.&lt;/p&gt;
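&lt;p&gt;As a rough illustration, here is a minimal Python sketch of the two techniques combined. The function name &lt;code&gt;retry_delays&lt;/code&gt; and its parameters are hypothetical, chosen just for this example, and the 1 to 10 second initial window simply mirrors the numbers above; real clients vary in the details.&lt;/p&gt;

```python
import random


def retry_delays(attempts, first_min=1.0, first_max=10.0, factor=2.0, rng=None):
    """Return a list of waits (in seconds) before each reconnect attempt.

    The first wait is random, so clients that failed at the same moment
    spread out; each later wait is `factor` times the previous one
    (exponential backoff).
    """
    rng = rng or random.Random()
    delay = rng.uniform(first_min, first_max)  # random start
    delays = []
    for _ in range(attempts):
        delays.append(delay)
        delay *= factor  # back off after each failed attempt
    return delays


if __name__ == "__main__":
    # Two clients that lost the server at the same instant end up
    # retrying on different schedules.
    for client in ("A", "B"):
        print(client, [round(d, 1) for d in retry_delays(4)])
```

&lt;p&gt;Each client draws its own random starting point, so two clients almost never share a schedule, and the doubling means that a server which stays overloaded sees fewer and fewer attempts per second.&lt;/p&gt;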
&lt;p&gt;Obviously, this isn&#39;t locally optimal: if the server has very few
clients it would be better if the clients just reconnected immediately.
Moreover, if everyone else is following the random start + exponential
backoff approach, then it&#39;s obviously advantageous for a single client
to just try to reconnect aggressively (to &amp;quot;defect&amp;quot; in the game theory
jargon). But if everyone defects, then the result is that the server
is over capacity and most people get terrible service. The point
here is that it&#39;s better for everyone to do something slightly suboptimal
&lt;em&gt;but different&lt;/em&gt; than it is for everyone to do the same thing, even if
it&#39;s locally optimal.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;nicad-battery-memory&quot;&gt;NiCad Battery Memory &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#nicad-battery-memory&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;I had originally been intending to write about the famous
Nickel Cadmium battery &amp;quot;memory&amp;quot; phenomenon.
The way the story is usually told is that there was
a satellite that was powered by solar panels and used
NiCad batteries to store energy during periods when the
panels weren&#39;t illuminated (due to the Earth
being in the way of the sun). Because the orbit is very
regular—and there&#39;s no weather in space—the
battery was charged and discharged on a repeating regular
schedule. Eventually, it started exhibiting decreased
storage at the point where it would usually start
being charged. However, attempts to reproduce this
phenomenon seem to have had &lt;a href=&quot;https://batteryguy.com/kb/knowledge-base/the-nickel-cadmium-memory-effect-fact-or-fiction/&quot;&gt;mixed&lt;/a&gt; results.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;network-transmission-rate-control&quot;&gt;Network Transmission Rate Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#network-transmission-rate-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A similar situation occurs with network rate control. A good example
is the classic &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Ethernet&amp;amp;oldid=1114690494&quot;&gt;Ethernet&lt;/a&gt;
local area network. In original Ethernet, every computer was on the
same wire and so whatever you send is received by every other computer
and vice versa, just like a radio network. But two computers can&#39;t
transmit at the same time because they will step on each other. The
question then becomes how to divide up the time.&lt;/p&gt;
&lt;p&gt;One way to address this problem is to have defined time slices during
which each node can transmit, but this requires tightly coordinated
clocks and doesn&#39;t adapt well if one node wants to transmit a lot
and the others want to listen. Instead, Ethernet solves this problem
by having each node transmit as soon as it has something to send and no
other node is transmitting, but it
also detects if another node also chooses the same time to start
(a &amp;quot;collision&amp;quot;). If there is a collision,
each node picks a random amount of time to wait before it tries
to start transmitting again.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This way, the chance of a repeated collision is relatively low.
Obviously, it would be better for each node to retransmit right
away, but if everyone does that you will just get collisions again.&lt;/p&gt;
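&lt;p&gt;The effect is easy to see in a toy model. The Python sketch below is a deliberately simplified picture of what happens after a collision, not real Ethernet: it assumes time is divided into slots and that once one node starts transmitting alone, the others notice and hold off. The function name &lt;code&gt;rounds_until_clear&lt;/code&gt; and the choice of 8 slots are made up for this example.&lt;/p&gt;

```python
import random


def rounds_until_clear(nodes, pick_wait, max_rounds=1000):
    """Count collision rounds until exactly one node transmits first.

    `pick_wait` returns how many slots a node waits after a collision.
    A round succeeds when a single node has the smallest wait, so it
    gets the wire to itself. Returns None if that never happens.
    """
    for round_number in range(1, max_rounds + 1):
        waits = [pick_wait() for _ in range(nodes)]
        if waits.count(min(waits)) == 1:
            return round_number
    return None


if __name__ == "__main__":
    rng = random.Random()
    # Everyone retransmits right away: the waits are always equal.
    print("immediate:", rounds_until_clear(2, lambda: 0))
    # Everyone waits a random number of slots between 0 and 7.
    print("random:   ", rounds_until_clear(2, lambda: rng.randrange(8)))
```

&lt;p&gt;With immediate retransmission the two nodes tie in every round and the function gives up, whereas with random waits a tie is unlikely in any given round and the collision usually clears within a round or two.&lt;/p&gt;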
&lt;p&gt;Here too, you get a more globally optimal result if everyone does something
that&#39;s locally suboptimal.&lt;/p&gt;
&lt;h2 id=&quot;some-other-potential-cases&quot;&gt;Some other potential cases &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#some-other-potential-cases&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;m not trying to suggest that this is some brilliant insight, but
nevertheless it&#39;s an effect we see surprisingly often. Some other examples
of similar phenomena:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.thewisetraveller.com/Articles/view/?permalink=is-instagram-ruining-travel&quot;&gt;Complaints&lt;/a&gt;
that because of Instagram everyone goes to the same places for
vacation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Heavy congestion on popular hiking and running trails because everyone
wants to do Rae Lakes, JMT, etc. and they&#39;ve had to institute a quota
system, even though there are lots of great trails that are basically
empty. Pro Tip: quotas only apply to camping, so if you can trail
run it in one day you can do anything.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Congestion on &amp;quot;alternate&amp;quot; routes that avoid rush hour traffic on
the major arterials. This is a similar case because it would be
fine if just a few people did it, but we can&#39;t have everyone
driving through downtown Palo Alto to get from 101 to 280.
We see this some organically but I&#39;ve often wondered if traffic
sensitive navigation systems like Waze and Google Maps that
reroute you to alternate routes make efforts not to send everyone
there.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&#39;s also a whole game theory literature on what&#39;s
called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Strategy_(game_theory)&amp;amp;oldid=1103817825#Mized_strategy&quot;&gt;mixed strategies&lt;/a&gt; which is in part about how it&#39;s often better
to play a mix of multiple strategies rather than a
single uniform one. There&#39;s a connection here to the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tragedy_of_the_commons&amp;amp;oldid=1114943684&quot;&gt;tragedy of the commons&lt;/a&gt; (and of course to &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Prisoner%27s_dilemma&amp;amp;oldid=1112692700&quot;&gt;Prisoner&#39;s dilemma&lt;/a&gt;) as well.&lt;/p&gt;
&lt;h2 id=&quot;coordination&quot;&gt;Coordination &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#coordination&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said, this is a pretty common problem, but it can be pretty hard
to address when you have a bunch of individual agents all making their
own decisions. Above, I&#39;ve mostly talked about how each agent has an
incentive to defect and get a locally optimal solution, even if it&#39;s
not globally optimal, but even if every agent plays by the rules, it
can still be very hard to design a system that produces the right
result.&lt;/p&gt;
&lt;p&gt;As a concrete example, early implementations of the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9293&quot;&gt;TCP&lt;/a&gt;
network protocol implemented an algorithm for controlling the transmission
rate that could fail catastrophically, resulting in what&#39;s called
&amp;quot;congestion collapse&amp;quot;, in which the network was entirely full
of traffic, but it was mostly retransmitted data and almost
no real progress was being made (Van Jacobson and Karels
have an approachable &lt;a href=&quot;http://ee.lbl.gov/papers/congavoid.pdf&quot;&gt;account&lt;/a&gt;
of what happened and the fix). The problem of designing rate control algorithms
that perform well but don&#39;t result in congestion collapse has occupied
network engineers ever since. The fundamental problem here is the
lack of a centralized point of view and control: instead, each agent
has to make its own decision independently, and designing
an efficient algorithm is hard.&lt;/p&gt;
&lt;p&gt;This is actually the part I find a bit puzzling about the whole
Waymo thing: surely the Waymo engineers know about this general
phenomenon and they &lt;em&gt;do&lt;/em&gt; have an overall view of what&#39;s happening,
so it would be natural to put in some sort of throttling system
so that not every car tries the same hack at once, or even
to detect congestion in real time. Do they not
have a system like this? Is this still the optimal algorithm
in terms of car time, even though it&#39;s annoying for homeowners?
Something else? Waymo people, my DMs are open!&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is also an exponential backoff component here in case
of another collision. &lt;a href=&quot;https://educatedguesswork.org/posts/self-driving-bloomberg/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>On the Security and Privacy Properties of Public WiFi</title>
		<link href="https://educatedguesswork.org/posts/public-wifi/"/>
		<updated>2022-09-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/public-wifi/</id>
		<content type="html">&lt;p&gt;One of the most common security and privacy questions I get is whether
it&#39;s safe to use public WiFi networks (and whether you should
use a VPN). The answer is &amp;quot;it depends&amp;quot;, for the reasons I lay
out below. If you want to skip the rest of this, I&#39;ll
tell you that I mostly just use airport and hotel WiFi
but am more hesitant about it if I have to log in with my own identity.&lt;/p&gt;
&lt;p&gt;&amp;quot;Safe&amp;quot; is a difficult word that covers a lot of territory. At a high
level, there are three main threats one might be concerned about in
this context:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compromise of your device (information security)&lt;/li&gt;
&lt;li&gt;Compromise of the data you are transmitting over the network (communications security)&lt;/li&gt;
&lt;li&gt;Monitoring of your use of the network (privacy)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&#39;s take these in turn.&lt;/p&gt;
&lt;h2 id=&quot;compromise-of-your-device&quot;&gt;Compromise of your device &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#compromise-of-your-device&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Often the first thing people worry about is that the network will
be malicious and will subvert your device via some vulnerability
in the browser, the operating system, etc. I&#39;m certainly not going
to tell you that this isn&#39;t possible (all software has
defects, and some of them will be vulnerabilities) but vendors go to a lot of effort to find and
fix these vulnerabilities, so it&#39;s also not a trivial matter
to find them and they&#39;re quite valuable. As a concrete example, at this year&#39;s Pwn2Own
competition, a full compromise of an iPhone 13 or a
Pixel 6 was worth &lt;a href=&quot;https://www.zerodayinitiative.com/blog/2022/8/29/announcing-pwn2own-toronto-2022-and-introducing-the-soho-smashup&quot;&gt;$200,000 USD&lt;/a&gt;, and an extra $50K if you got kernel
access.&lt;/p&gt;
&lt;p&gt;This is not to say that modern devices are somehow impregnable,
but rather that it&#39;s relatively unlikely that an attacker is
going to use a zero-day (i.e., undiscovered) vulnerability to
attack random people at an airport Starbucks. Major OS vendors
(both desktop and mobile) and major browser vendors are pretty
good about quickly fixing vulnerabilities, so if you are running
an up to date browser and an up to date OS, you should be
relatively safe.&lt;/p&gt;
&lt;p&gt;Moreover, even if your local network is safe,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
you still have to worry about compromise by other
network actors, such as the Web sites you visit.
Generally, if your browser and device aren&#39;t secure
against network attack, you should be pretty concerned
about your safety whatever the status of your local network.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: this advice does not apply if you
are someone who is especially likely to be attacked by a powerful
attacker, such as a state-level actor. If you are an activist
or a dissident, you need a totally different level of operational
security that probably involves having several machines.&lt;/p&gt;
&lt;h2 id=&quot;compromise-of-your-communications&quot;&gt;Compromise of your communications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#compromise-of-your-communications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;http%2C-https%2C-tls%2C-and-quic&quot;&gt;HTTP, HTTPS, TLS, and QUIC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#http%2C-https%2C-tls%2C-and-quic&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Historically, the Web used the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hypertext_Transfer_Protocol&amp;amp;oldid=1111771618&quot;&gt;HTTP&lt;/a&gt; protocol, which ran over a channel
provided by &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1110294070&quot;&gt;TCP&lt;/a&gt;. When run securely, it was layered
over &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transport_Layer_Security&amp;amp;oldid=1110721112&quot;&gt;TLS&lt;/a&gt;, which sits between HTTP and TCP and provides a secure channel, with the result
being called &amp;quot;HTTPS&amp;quot; (for HTTP Secure). The server
indicates to the client that a given URL is to be retrieved
via HTTPS by giving it a URL starting with &lt;code&gt;https:&lt;/code&gt; rather
than &lt;code&gt;http:&lt;/code&gt;. Recently, the IETF has standardized a new
version of HTTP (called HTTP/3) which runs over a
network protocol called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1108644156&quot;&gt;QUIC&lt;/a&gt; rather than TCP.
QUIC uses the TLS 1.3 cryptographic handshake and TLS-like encryption, so
HTTP/3 provides a similar set of security properties to earlier
versions of HTTP over TLS. It still uses &lt;code&gt;https:&lt;/code&gt; URLs, and so
it&#39;s convenient to just call it all HTTPS, even though the
protocol is different.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The second potential area of concern is compromise of your
communications. The basic situation here is quite simple:
The operator of the WiFi network can inspect and/or
modify every packet you send, so they get to see anything
that&#39;s not encrypted. This actually applies to any network
you use, not just WiFi networks.&lt;/p&gt;
&lt;p&gt;When it comes to Web traffic, the news is generally pretty
good: a very large fraction of Web sites are encrypted
using either &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transport_Layer_Security&amp;amp;oldid=1110721112&quot;&gt;TLS&lt;/a&gt;
or &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1108644156&quot;&gt;QUIC&lt;/a&gt;.
These protocols were designed under the assumption that
the attacker has full control of the network, and so
provide security even if you are on a malicious
WiFi network.
In general, as long as you are on an encrypted
Web site, you should not need to worry about your passwords,
credit card numbers, etc. And if you&#39;re not on an encrypted Web
site, then you probably shouldn&#39;t do anything even if you
are on a trusted WiFi network because you
have to worry about attackers elsewhere on the Internet
between you and the site.&lt;/p&gt;
&lt;p&gt;It&#39;s a little hard to get a precise estimate of the
fraction of traffic that is HTTPS;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
below I show measurements from Chrome and Firefox
respectively, with Chrome showing rather more use
of HTTPS than Firefox does. It&#39;s still not clear
what the source of the difference is, but in any case the pattern
is the same, which is that most traffic is encrypted,
especially in the US, and it&#39;s gradually increasing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/chrome-https-stats.png&quot; alt=&quot;Chrome HTTPS Stats&quot; /&gt;
[&lt;a href=&quot;https://transparencyreport.google.com/https/overview&quot;&gt;Chrome HTTPS data&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/firefox-https-stats.png&quot; alt=&quot;Firefox HTTPS Stats&quot; /&gt;
[&lt;a href=&quot;https://letsencrypt.org/stats/&quot;&gt;Firefox HTTPS data&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;The situation is somewhat worse for mobile apps. In a Web site,
the client-side implementation of encryption is located in the
browser, so the site only needs to configure their own server
correctly—which is fairly standardized, especially if
you use a hosting provider which has built in HTTPS support—and
then send the client &lt;code&gt;https:&lt;/code&gt; URLs. By contrast, mobile
apps have to arrange for their own transport security. Historically
this has led to a lot of apps not doing encryption at all
or doing it in an insecure fashion. The latest work on
this appears to be from &lt;a href=&quot;https://www.usenix.org/conference/usenixsecurity21/presentation/oltrogge&quot;&gt;Oltrogge, Huaman, Amft, Acar, and
Backes&lt;/a&gt;
in 2021, which reports a significant number of vulnerable
Android apps, despite attempts from Google to prevent this.&lt;/p&gt;
&lt;p&gt;Obviously, it&#39;s dangerous to use an app that doesn&#39;t
implement encryption securely on an untrusted network.
A VPN can sort of help here in that it protects you from
attack by the local network. However, this
is only a partial solution:
even if the last mile is secure
there are hundreds to thousands of miles of network
between you and the server; if the app doesn&#39;t implement
encryption correctly, then you are vulnerable to
attack anywhere along that path. In general, what
you want is for your apps—and web sites—to
encrypt their traffic.&lt;/p&gt;
&lt;h2 id=&quot;monitoring-of-your-use-of-the-network&quot;&gt;Monitoring of your use of the network &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#monitoring-of-your-use-of-the-network&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The really serious problem here is privacy. While HTTPS does
a good job of protecting your actual Web traffic, such as
passwords, credit card numbers, etc., it does not effectively
conceal the sites you are going to.&lt;/p&gt;
&lt;h3 id=&quot;routes-for-browsing-behavior-leakage&quot;&gt;Routes for Browsing Behavior Leakage &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#routes-for-browsing-behavior-leakage&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are four main
avenues for this leakage (collectively called &amp;quot;metadata&amp;quot;).
In order of when they are available to the attacker, they
are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The DNS resolution of the server&lt;/li&gt;
&lt;li&gt;The IP address of the server&lt;/li&gt;
&lt;li&gt;The TLS &lt;em&gt;server name indication (SNI)&lt;/em&gt; field.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Traffic analysis&lt;/em&gt; from the pattern of data (message sizes, timing, etc.)
sent and received&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Taking these in turn...&lt;/p&gt;
&lt;h4 id=&quot;dns-resolution&quot;&gt;DNS Resolution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#dns-resolution&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Typically the URL that the client starts with has a domain
name in it, such as &lt;code&gt;https://www.example.com/&lt;/code&gt;.
Before the client can connect to the server it needs to
know the server&#39;s IP address (the numeric address of the
server). The client uses the
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;Domain Name System (DNS)&lt;/a&gt;
to &lt;em&gt;resolve&lt;/em&gt; the name into an IP address.
Historically, the local network has provided the DNS
server that the client uses to resolve the name.
The result is that the local network learns the name
of every server you are going to, with obviously negative
implications on privacy. Note that it &lt;em&gt;does not&lt;/em&gt; learn
which pages on the site you are visiting, just the site
names themselves.&lt;/p&gt;
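&lt;p&gt;To make the leak concrete, here is a rough Python sketch that builds a traditional, unencrypted DNS query by hand, following the standard DNS message layout. The function name &lt;code&gt;build_dns_query&lt;/code&gt; and the query ID are made up for this example, and nothing is actually sent over the network.&lt;/p&gt;

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal unencrypted DNS query for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, in the clear.
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"  # root label terminates the name
    # QTYPE 1 (A record), QCLASS 1 (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


if __name__ == "__main__":
    packet = build_dns_query("www.example.com")
    print(packet)
```

&lt;p&gt;The site name shows up in the packet as plain ASCII labels that anyone on the path can read, while the rest of the URL is not part of the query at all.&lt;/p&gt;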
&lt;p&gt;In the United States and some other countries, Firefox
has deployed a feature called
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/&quot;&gt;DNS over HTTPS Trusted Recursive Resolver (DoH TRR)&lt;/a&gt;,
which encrypts the DNS traffic and sends it to a
separate server with defined privacy policies;
this prevents the local network from learning the
sites you are going to via your DNS queries.
On other browsers, however, you generally are leaking
your DNS traffic to the network.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;ipv4-and-ipv6&quot;&gt;IPv4 and IPv6 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#ipv4-and-ipv6&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The original version of IP, IPv4, had 32-bit
addresses, for a maximum of about 4 billion
total addresses. For obvious reasons, this
isn&#39;t enough for every device on the Internet.
In 1995, the IETF standardized IPv6, which has
128-bit addresses. However, IPv6 deployment
has been extremely slow. For example, over 25 years later,
&lt;a href=&quot;https://www.google.com/intl/en/ipv6/statistics.html&quot;&gt;less than half&lt;/a&gt;
of Google usage is over IPv6. In the meantime,
people have developed a number of mechanisms
for sharing IPv4 addresses, including
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1111516804&quot;&gt;NAT&lt;/a&gt;
on the client side and
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Virtual_hosting&amp;amp;oldid=1097578808&quot;&gt;virtual hosting&lt;/a&gt;
on the server side. While these may not be
the cleanest designs from an architectural perspective, they actually
act to improve privacy by grouping together
traffic that would otherwise be separable by IP.&lt;/p&gt;
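&lt;p&gt;For a sense of scale, the sizes of the two address spaces can be checked with Python&#39;s standard &lt;code&gt;ipaddress&lt;/code&gt; module. This is just arithmetic on the address widths given above; the variable names are arbitrary.&lt;/p&gt;

```python
import ipaddress

# Total number of addresses in each protocol's address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                # 2**32, about 4.3 billion
print(ipv6_total == 2 ** 128)
print(ipv6_total // ipv4_total)  # IPv6 addresses per IPv4 address: 2**96
```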
&lt;/div&gt;
&lt;h4 id=&quot;ip-address&quot;&gt;IP Address &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#ip-address&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The second major mechanism by which your browsing history
leaks to the local network is via the server&#39;s IP address.
This is a signal of variable quality. Big sites like
Amazon or Google run their own servers and so
have their own distinct IP addresses; in these cases it&#39;s easy
to tell which site you are visiting just by looking up
who operates the IP address in question.&lt;/p&gt;
&lt;p&gt;Smaller sites, however, often operate on shared infrastructure,
whether via shared hosting, or behind &lt;em&gt;content distribution networks (CDNs)&lt;/em&gt;, with more
than one site on a single IP address. In this case, the
IP address only allows you to narrow down the site to
the set of all sites on the same IP address, which
can be quite a large number of sites, especially with a
big CDN.&lt;/p&gt;
&lt;h4 id=&quot;server-name-indication&quot;&gt;Server Name Indication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#server-name-indication&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This kind of shared hosting is convenient operationally but
presents a problem for TLS. When a TLS client connects to
a server, the server needs to provide a &lt;em&gt;certificate&lt;/em&gt;
proving that it owns the site (the domain name) that the client
is trying to connect to. If there is just one site on a single
IP, then the server can provide the corresponding certificate,
but if there are many such sites, then the server needs
to know which certificate to present.&lt;/p&gt;
&lt;p&gt;When TLS was originally deployed (back when it was called &amp;quot;SSL&amp;quot;), this
was a real problem and each server needed its own IP address; this was
eventually addressed by adding a TLS extension called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Server_Name_Indication&amp;amp;oldid=1110253101&quot;&gt;Server Name Indication (SNI)&lt;/a&gt;, in which the client provides the name of the server
it is trying to connect to. The SNI is not encrypted and so
a network observer can just read it off the wire and learn which
site the client is trying to connect to.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
As with DNS or the IP address, this just leaks the server&#39;s
name, not the pages on the site you are going to.&lt;/p&gt;
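&lt;p&gt;You can check this for yourself without touching the network. The sketch below
uses the in-memory BIO interface in Python&#39;s standard &lt;code&gt;ssl&lt;/code&gt; module to generate the
first handshake message (the ClientHello) and then looks for the hostname in the
raw bytes. The function name &lt;code&gt;client_hello_bytes&lt;/code&gt; and the hostname are just
illustrative, and this assumes a Python build whose TLS library does not encrypt
the SNI, which as far as I know is true of the standard ones.&lt;/p&gt;

```python
import ssl

def client_hello_bytes(hostname):
    """Return the first bytes a TLS client would send when connecting to hostname."""
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes "from the server" (we never supply any)
    outgoing = ssl.MemoryBIO()   # bytes the client wants to put on the wire
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that will never come.
        pass
    return outgoing.read()

hello = client_hello_bytes("mail.example.com")
print(len(hello), "bytes in the ClientHello")
print("hostname visible in the clear:", b"mail.example.com" in hello)
```

&lt;p&gt;Nothing here is decrypted: the check is a plain substring search over exactly
what a network observer would see.&lt;/p&gt;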
&lt;p&gt;The TLS community has of course known more or less since the beginning
that SNI was a privacy problem. In versions of TLS prior to
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt;
the handshake—including the server&#39;s certificate—was
largely unencrypted, so this didn&#39;t seem like as big a deal,
because the certificate also leaked this information,
but TLS 1.3 encrypts most of the handshake, and so SNI became
the last major privacy leak in TLS proper. In the beginning of the TLS 1.3 design
process, a number of attempts were made to design a solution
for encrypting the SNI, but it turned out to be a really hard
problem and ultimately it didn&#39;t make it into the final specification.
However, the TLS working group is now working on a specification
for &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-tls-esni-14.html&quot;&gt;Encrypted Client Hello (ECH)&lt;/a&gt;,
which will protect the SNI under some circumstances. ECH is not
yet widely deployed, but hopefully we&#39;ll start to see more deployment
relatively soon.&lt;/p&gt;
&lt;h4 id=&quot;traffic-analysis&quot;&gt;Traffic Analysis &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#traffic-analysis&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The final privacy leak is via &lt;em&gt;traffic analysis&lt;/em&gt;, which is the
generic term for measuring the traffic patterns of the connection,
such as the size of the messages being sent, their timing, etc.
This turns out to reveal quite a bit about the sites people
are going to. &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-pearg-website-fingerprinting-01.html&quot;&gt;Goldberg, Wang, and Wood&lt;/a&gt; provide a good overview of the
research in this area. There has been some work on adding
countermeasures to TLS or HTTP to prevent this kind of
traffic analysis, but the problem isn&#39;t that well understood
and so far at least, there aren&#39;t any agreed upon defenses.&lt;/p&gt;
&lt;p&gt;The good news is that traffic analysis is a lot harder than
it looks—though Cisco
actually sells a &lt;a href=&quot;https://www.cisco.com/c/en/us/solutions/enterprise-networks/enterprise-network-security/eta.html&quot;&gt;product&lt;/a&gt; that does some of this.
If we were to close the other routes, it would
be a pretty substantial privacy improvement.&lt;/p&gt;
&lt;h3 id=&quot;privacy-implications&quot;&gt;Privacy Implications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#privacy-implications&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The upshot of all this is that whoever operates the local
network gets to learn quite a bit about the behavior of
people on the network. This is true whether they are a
public WiFi network or your internet service or mobile
provider. &lt;em&gt;[Clarified — 2022-09-24]&lt;/em&gt;.
Specifically, they get to learn:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The identities of the Web sites you visit (just by
looking at the connections).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Many of the apps on your device, because they &amp;quot;phone
home&amp;quot; to some server.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&#39;s a lot and is quite likely to include
information that many people would consider sensitive.
The example I usually give is that you might be visiting
some medical site, but there is plenty of other sensitive
behavior that people engage in that they don&#39;t want
others to know about, such as visiting dating
sites or watching porn.&lt;/p&gt;
&lt;p&gt;The actual privacy impact of this depends a lot on the nature
of the network, however. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Are you identifiable?&lt;/li&gt;
&lt;li&gt;Are the network operator or the people on the network
actually bothering to record your behavior?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The answer to the first of these questions is something you
can mostly figure out for yourself: how many other people
are on the network? Did you have to log in? Was there a
shared password? For example, if you are in an airport
with shared WiFi and either no password or a simple
captive portal where you don&#39;t identify yourself, then
it&#39;s going to be fairly hard to attribute your behavior to you
(though the network can generally create a profile corresponding
to all the sites you visit).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Note that
the apps on your phone may provide a somewhat
unique fingerprint that could at least in theory
be shared between operators, though I don&#39;t know if
this really happens.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;encryption-and-wireless-networks&quot;&gt;Encryption and Wireless Networks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#encryption-and-wireless-networks&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s very common for wireless networks to be encrypted,
but this provides surprisingly weak security. The
basic problem is that the encryption only prevents
people who are not on the network from seeing
the traffic. For consumer and public access
points, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Wi-Fi_Protected_Access&amp;amp;oldid=1111112032&quot;&gt;WiFi Protected Access (WPA-2)&lt;/a&gt; is usually operated in a
&lt;em&gt;pre-shared key&lt;/em&gt; mode where the encryption keys
for the network are derived from the password via
a handshake performed when each device joins.
This means that anyone who (1) has the password
and (2) is able to observe you joining is able
to see all of your traffic. Moreover, because the
passwords are usually quite weak, it is often possible
to just brute force them. There has been work
on a public-key based system (labeled
&amp;quot;forward secrecy&amp;quot;) that would prevent these forms
of attack, but it is not widely deployed and
appears to have &lt;a href=&quot;https://papers.mathyvanhoef.com/dragonblood.pdf&quot;&gt;other flaws&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On the other hand, if you had to provide your identity
(pro tip: a lot of captive portals don&#39;t check the
e-mail address you provide them), then the network operator
can link the history of your sites to you. This means
that networks where you have a user-specific password
or need to actually log in are much worse for
privacy. Note that in an environment like a hotel,
where the operator knows where you are, this
is probably enough to identify your traffic even
if there is no password or a shared password.&lt;/p&gt;
&lt;p&gt;The actual impact depends on whether the network
operator or other people on the network are actually
spying on you. Unfortunately, there&#39;s no real way to tell
whether they are or not: even if the network has a privacy
policy which says that they don&#39;t monitor your behavior,
you can&#39;t verify that they are following it. Moreover,
on wireless networks it&#39;s generally the case that other
users of the same network can observe your behavior—though
they probably won&#39;t know the information you used to log in—so
even if the network operator has a good privacy policy
you still have to worry about other people.&lt;/p&gt;
&lt;p&gt;This brings us to the topic of VPNs: if you use a VPN
then this will (mostly) prevent local attackers from
seeing the sites you are connecting to, which is good,
but it&#39;s a tradeoff because it &lt;em&gt;also&lt;/em&gt; hands the VPN operator
a neatly labeled record of which sites you are going to, together
with your identity (because you logged in to the VPN),
so you&#39;re really trusting the VPN operator to protect
your privacy. On balance, if you use a reputable VPN
service (see &lt;a href=&quot;https://www.consumerreports.org/vpn-services/mullvad-ivpn-mozilla-vpn-top-consumer-reports-vpn-testing-a9588707317/&quot;&gt;Consumer Reports&#39; VPN report&lt;/a&gt;),
then this likely provides better privacy than just using
an untrusted local network, but it&#39;s important to remember
that ultimately there are only policy and not technical
controls on what the VPN operator can do.
Note that a multi-hop system like Tor or iCloud Private Relay doesn&#39;t have this
property, because there is no single entity who can de-anonymize both
you and your traffic.&lt;/p&gt;
&lt;h2 id=&quot;closing-thoughts&quot;&gt;Closing Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/public-wifi/#closing-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;People who work in communications security like to talk about
the Internet threat model in which the network is maximally
malicious. This is often phrased as &amp;quot;you give the packets to
the attacker to deliver&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
The idea is that protocols need to be designed to be
secure even in this very difficult setting. If you succeed, then it doesn&#39;t
matter what network conditions you&#39;re running in, and
questions like &amp;quot;is public WiFi safe&amp;quot; would be irrelevant.
Unfortunately, while there has been a lot of progress in
designing and deploying security protocols such as
TLS (and to a lesser extent in building secure software), the
privacy properties of these protocols leave a lot to
be desired. The result is that it actually &lt;em&gt;is&lt;/em&gt; important
to ask whether you can trust the network to handle your
data in the way you would like.
The idea behind privacy enhancing technologies
like DoH, ECH, and proxying/VPNs is that they replace
this trust with technical mechanisms that prevent attack
even if the network is malicious, but we&#39;re not there
yet, and in the meantime, you still need to ask how
much you trust the network with knowledge of your activity.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is actually a lot less likely than you think
because consumer networking gear is famously insecure,
so it&#39;s reasonably likely that your average home network
has actually been compromised. &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s actually even slightly hard to define what you
mean. For instance, should you measure page loads or
HTTP transactions, or... &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
SNI is not a perfect signal from the attacker&#39;s perspective
because HTTP allows the client to &lt;em&gt;coalesce&lt;/em&gt; traffic to
multiple servers on the same connection as long as they share
a certificate. For instance, if the server has a certificate
for &lt;code&gt;mail.example.com&lt;/code&gt; and &lt;code&gt;calendar.example.com&lt;/code&gt; and the
client connects to &lt;code&gt;mail.example.com&lt;/code&gt;, it can then send
traffic destined for &lt;code&gt;calendar.example.com&lt;/code&gt; without creating
a new TLS connection. This makes the problem of learning
which site the client is connecting to slightly harder, but
as a practical matter, there are plenty of non-coalesced
connections and even when they are coalesced they may
be associated with the same server operator, so SNI is a
pretty good signal. &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is &lt;a href=&quot;https://www.usenix.org/system/files/soups2020-bird.pdf&quot;&gt;research&lt;/a&gt;
by Bird, Segall, and Lopatka indicating that
browsing history can be used for reidentification,
so this is not a perfect case of hiding in the crowd,
but it would require a fair amount of work to
identify you. &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;ve heard &lt;a href=&quot;https://www.cs.columbia.edu/~smb/&quot;&gt;Steve Bellovin&lt;/a&gt;
say this, but I think he may have been quoting. &lt;a href=&quot;https://educatedguesswork.org/posts/public-wifi/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>ELI15: PCR and PCR Testing</title>
		<link href="https://educatedguesswork.org/posts/pcr/"/>
		<updated>2022-09-14T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pcr/</id>
		<content type="html">&lt;p&gt;As pretty much everyone is now aware, there are two main kinds
of COVID test:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At-home based antigen tests (often called &amp;quot;lateral flow&amp;quot;)&lt;/li&gt;
&lt;li&gt;Lab-based molecular tests (often called &amp;quot;PCR&amp;quot; [&lt;em&gt;though not all molecular tests are PCR—2022-09-14&lt;/em&gt;])&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Lateral flow and PCR are both descriptions of the technology
used in the test, but unless you already know what they are,
they&#39;re just tech jargon. The purpose of this post is to
explain how PCR works at the &amp;quot;explain it like I&#39;m fifteen&amp;quot;
level. To that end, I&#39;ll be omitting most of the chemistry
and focusing on the main clever ideas, with some external cites
for those who want to learn more.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-dna&quot;&gt;Background: DNA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#background%3A-dna&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you no doubt know, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DNA&amp;amp;oldid=1097142623&quot;&gt;Deoxyribonucleic acid
(DNA)&lt;/a&gt;—this
is the last time you will need to read &amp;quot;deoxyribo...&amp;quot;—carries
the genetic code that directs the development of humans and most (but
not all, as we&#39;ll see) living things. The basic structure of DNA is
a sequence of small molecular subunits (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Nucleotide&amp;amp;oldid=1102949293&quot;&gt;nucleotides&lt;/a&gt;).
Nucleotides generally have the same basic structure, which consists of
a common &quot;backbone&quot; made up of a sugar and a phosphate group,
plus another chemical group called a nucleobase:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nucleotide.png&quot; alt=&quot;A nucleotide&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Modified version of a diagram from &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:DAMP_chemical_structure.svg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;Each type of nucleotide has a different nucleobase.&lt;/p&gt;
&lt;p&gt;You can build a chain of nucleotides by attaching the sugar group
of nucleotide 1 to the phosphate group of nucleotide 2 and then the
sugar group of nucleotide 2 to the phosphate group of nucleotide 3, and so
on. This is true no matter what nucleobases are attached to the backbone.
There are four main
bases, adenine, cytosine, guanine, and thymine, which gives us a
4-ary code (unlike the binary code used by computers). It&#39;s canonical
to refer to these by their leading letters: A, C, G, T.&lt;/p&gt;
&lt;p&gt;Instead of being a single chain, normally DNA exists as a pair of
chains (with each chain often being called a &lt;em&gt;strand&lt;/em&gt;).
The key thing to know is that not just any two strands can hook up.
Instead, the bases are &lt;em&gt;complementary&lt;/em&gt;,
with adenine pairing with thymine and guanine pairing with cytosine,
like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/DNA_chemical_structure.svg&quot; alt=&quot;DNA pairs&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Image from Wikipedia by Madeleine Price Ball.]&lt;/p&gt;
&lt;p&gt;Thus, the sequence of bases on the first strand determines the
sequence of bases on its paired strand. This means that most of what
you need to know about a given DNA molecule is encoded in the sequence
of bases. It&#39;s this sequence which determines the DNA code for an
organism.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
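&lt;p&gt;The pairing rule is simple enough to write down as code. Here is a minimal
Python sketch; the function names are mine, and it ignores all of the actual
chemistry. The second function reads the paired strand in its own direction,
because the two strands run in opposite directions (more on that below).&lt;/p&gt;

```python
# Base pairing as described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def paired_strand(strand):
    """Return the bases that sit opposite each base of strand, in the same order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """Same bases as paired_strand, but read in the paired strand's own direction."""
    return paired_strand(strand)[::-1]

print(paired_strand("CTGT"))
print(reverse_complement("CTGT"))
```

&lt;p&gt;Applying &lt;code&gt;paired_strand&lt;/code&gt; twice gets you back where you started, which is
the code-level version of saying that either strand determines the other.&lt;/p&gt;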
&lt;p&gt;If you take two pieces of DNA with complementary sequences, they
will tend to hybridize with each other to form a pair of strands.
This will also work, though not quite as well, if the sequences
are close to complementary but not an exact match, which can be used to measure &quot;closeness&quot;
of two sequences; this was a more useful technique before sequencing
became fast and cheap.&lt;/p&gt;
&lt;p&gt;You don&#39;t need to know this to understand PCR, but the actual DNA molecule is arranged in a
characteristic &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Nucleic_acid_double_helix&amp;amp;oldid=1103773180&quot;&gt;double helix&lt;/a&gt; structure, which
looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/DNA_orbit_animated_static_thumb.png&quot; alt=&quot;Double helix&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Image from Zephyris via Wikipedia]&lt;/p&gt;
&lt;h3 id=&quot;dna-replication&quot;&gt;DNA Replication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#dna-replication&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The way that DNA replicates (for instance when a cell divides
into two, requiring two copies of the DNA, one for each cell) is that the paired strands unzip to form
two single strands (this is called &quot;melting&quot;).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/Replication-Melting.drawio.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Replication-Melting.drawio.png&quot; alt=&quot;DNA Melting&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that in this diagram I&#39;m only showing short strands
with a small number of bases. The dashed regions are
intended to indicate that the DNA strand just goes on
indefinitely, but I&#39;m not going to show it.&lt;/p&gt;
&lt;p&gt;Next, an enzyme called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DNA_polymerase&amp;amp;oldid=1100546419&quot;&gt;DNA polymerase&lt;/a&gt; builds another paired strand
onto each strand using the complementary bases in the
ambient cellular environment. The result is
now two DNA double helices, each of which is (hopefully)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
an exact copy of the first one, and each of which contains
one of the original strands and one newly created strand,
which used the original strand as the template.&lt;/p&gt;
&lt;p&gt;The diagram below shows the process of replicating the unzipped DNA.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/DNAReplication-Stage1.drawio.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/DNAReplication-Stage1.drawio.png&quot; alt=&quot;DNA Replication Stage 1&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are two important things to notice here.
First, the two strands are built in opposite directions.
I&#39;ve colored things red and blue to help keep track, with
the blue chain polymerizing left to right (built off
the red template which goes the other way)
and the red right to left (built off the blue
template, which goes the other way). Note that in reality both strands have
the same chemistry; it&#39;s just that the backbones are facing
opposite directions, and of course the bases are complementary.&lt;/p&gt;
&lt;p&gt;Second, the process gets kicked off by having a &lt;em&gt;primer&lt;/em&gt;,
which is a short piece of DNA that is (of course) complementary
to the strand being replicated. Because polymerization only happens in one direction, however,
the primer ends up being one end of the replicated DNA
strand, with everything on the other side of the primer just
not being replicated. You can see this in the diagram
above, where the replicated blue strand has nothing
on the left of the CTGT (losing the T that was there in
the original, as well as everything else to the left which I didn&#39;t show) and the replicated red strand has nothing on the right of
the TATA, losing the C (as well as everything else to the
right). Note that the top red and bottom blue strands are
actually the originals, and so extend in both directions.&lt;/p&gt;
&lt;p&gt;In normal DNA replication in
the body, you&#39;d want to replicate the entire strand and so the
primer would be attached to the end of the chain (there&#39;s
some special biochemistry for this that we don&#39;t need to
go into), but in PCR, the primers are just little snippets
of single-stranded DNA that get attached to the DNA in the complementary
direction. PCR takes advantage of this in order to
focus on a particular portion of the DNA sequence.&lt;/p&gt;
&lt;h2 id=&quot;pcr&quot;&gt;PCR &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#pcr&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Suppose you find yourself in a situation where you want to examine a
relatively small amount of DNA. This comes up fairly frequently, for
instance in cases where you have an environmental sample or when you
are looking for something—like the COVID virus—in a larger
sample. In these cases, it&#39;s useful to &lt;em&gt;amplify&lt;/em&gt; the DNA of interest
so you have a larger amount for analysis. This is where the
&lt;em&gt;Polymerase Chain Reaction (PCR)&lt;/em&gt; comes in. PCR takes advantage of the
same biological DNA replication mechanism I described above to amplify
(make a lot of copies of) a DNA sequence of interest.&lt;/p&gt;
&lt;p&gt;The basic idea is simple if you know the sequence of interest
(as with COVID, where we have the full sequence). You just synthesize primers that match
both ends of the DNA sequence you want to replicate, with one
matching the end in one direction and one matching the end in
the other. When you mix them up with single-stranded DNA,
the primers naturally hybridize (attach themselves) to
the right places on the DNA strands. You then run the replication process with these
primers.&lt;/p&gt;
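&lt;p&gt;As a toy model of what the two primers pick out, here is a Python sketch that
treats hybridization as exact string matching. The function names and the
sequences are made up for illustration, and real primer binding is much
messier than &lt;code&gt;str.find&lt;/code&gt;.&lt;/p&gt;

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Bases of the opposite strand, read in that strand's own direction."""
    return "".join(PAIRS[base] for base in reversed(strand))

def amplified_region(template, forward_primer, reverse_primer):
    """Return the stretch of template bounded by the two primers (inclusive).

    forward_primer is written as it appears on template;
    reverse_primer is written as it appears on the opposite strand.
    """
    start = template.find(forward_primer)
    if start == -1:
        raise ValueError("forward primer does not match the template")
    # The reverse primer binds where the template reads as its reverse complement.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        raise ValueError("reverse primer does not match after the forward primer")
    return template[start:end + len(site)]

template = "GGTTCTGTACGATCGATGGCACCAA"
print(amplified_region(template, "CTGT", "TGCC"))
```

&lt;p&gt;Everything outside the two primer sites is simply dropped from the result.&lt;/p&gt;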
&lt;p&gt;The first time you run the replication process, things are just
as shown above: the strands separate and then the polymerase
builds a &lt;em&gt;partial&lt;/em&gt; replica of each original strand (on top
of the original complementary strand) starting
with the primer. At the end of this process you now have two
paired DNA strands, as shown in the top part of the diagram below (which is
just the same as the previous diagram).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/DNAReplication-Stage2.drawio.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/DNAReplication-Stage2.drawio.png&quot; alt=&quot;DNA Replication Stage 2&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;However, if you run the process &lt;em&gt;again&lt;/em&gt;, something interesting
happens. As expected, each of the pairs unzips, leaving you with two
original strands and then two partial replicas. The original
strands replicate just as before and produce the same replicas
as in the original process. However, when you build the
complementary strand using the replicas from the first phase
as the template they are built in the &lt;em&gt;opposite&lt;/em&gt; direction
from how that template strand was built. The result
is that they start at the primer and stop when the strand
ends, but because the strand already ended where the
other primer was, the result is you get a strand that
just consists of the region between the primers (inclusive).
I&#39;ve circled these strands in green so you can see them.&lt;/p&gt;
&lt;p&gt;If you run the process over and over, what happens is that
the original strands continue to make copies of partial
strands and the partial (1st generation) and short (2nd and later
generation) strands just make short strands. Each time you
run the process you double the number of copies, so quite
quickly you end up with a large number of copies of just the region
of interest and a few copies of the rest of the DNA sample.&lt;/p&gt;
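&lt;p&gt;To see how lopsided this gets, here is a small Python sketch that counts the
three kinds of strands cycle by cycle. It assumes an idealized reaction in which
every strand is copied exactly once per cycle, which real PCR only approximates;
the function name and the starting point of a single double-stranded molecule
are my own choices.&lt;/p&gt;

```python
def pcr_strand_counts(cycles, original_strands=2):
    """Count strands of each kind after the given number of ideal PCR cycles."""
    partial = 0  # copies of an original: start at a primer, run off the far end
    short = 0    # copies of a partial or short strand: just the target region
    for _ in range(cycles):
        # Originals template partials; partials and shorts both template shorts.
        partial, short = partial + original_strands, short + partial + short
    return {"original": original_strands, "partial": partial, "short": short}

for n in (1, 2, 3, 10, 30):
    print(n, pcr_strand_counts(n))
```

&lt;p&gt;The total doubles every cycle, but the partial strands only grow linearly
(two more per cycle here), so nearly all of the growth ends up in the short strands.&lt;/p&gt;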
&lt;h3 id=&quot;partially-unknown-sequences&quot;&gt;Partially Unknown Sequences &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#partially-unknown-sequences&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Above, I assumed that you know the DNA sequence that you are
interested in. This is certainly helpful, but it&#39;s not required.
Actually, all you need to know is the sequence of the endpoints of
the sequence of interest so you can make the primers.
The replication process just depends on the primers binding
to the relevant sections of DNA.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Once that happens, the polymerization process will work
just fine with anything in between (in nature it obviously
needs to work with basically any sequence). This is useful
for a number of scenarios, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When you want to sequence a specific piece of DNA,
for instance to look for mutations or defects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you want to test for a DNA sequence that is
subject to a lot of mutation, so you don&#39;t know
exactly what&#39;s there (SARS-CoV-2, for instance,
mutates quite rapidly).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In both situations, as long as you can find some
surrounding regions that are highly conserved, you
can make primers and replicate the region of interest.&lt;/p&gt;
&lt;h3 id=&quot;pcr-in-practice%3A-taq-polymerase&quot;&gt;PCR in Practice: &lt;em&gt;Taq&lt;/em&gt; Polymerase &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#pcr-in-practice%3A-taq-polymerase&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Conceptually, then, PCR is simple: make the right primers,
dump them into your sample, and then repeatedly run the
replication cycle. But what does &amp;quot;run the replication
cycle&amp;quot; actually mean? We need to unzip (melt) the
DNA, then let polymerase make copies, and then repeat.
But if we just dump some DNA, polymerase, primers,
and bases into a tube, not much is going to happen
because the DNA is already paired up, so we need something
to kick off the process.&lt;/p&gt;
&lt;p&gt;If you heat up the DNA to about 90°C,
then it will melt, so you can heat it up, and then let it cool
down a bit and it will replicate.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Unfortunately most polymerase enzymes are inactivated by being heated
up, so if you just do this, you need to re-add polymerase
every cycle, which is obviously a pain. However, the
good news is that there is a polymerase enzyme
(&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Taq_polymerase&amp;amp;oldid=1109827257&quot;&gt;&lt;em&gt;Taq&lt;/em&gt; polymerase&lt;/a&gt;)
from a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Thermus_aquaticus&amp;amp;oldid=1110163681&quot;&gt;bacterium&lt;/a&gt;
which lives in hot springs
which can survive being heated to 90°C. This makes
the problem much easier. You just need to mix up your
DNA, primers, bases, and &lt;em&gt;Taq&lt;/em&gt; polymerase in a tube and
repeatedly heat it up and cool it down.
You can buy a special machine called
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Thermal_cycler&amp;amp;oldid=1107082211&quot;&gt;thermal cycler&lt;/a&gt;
that will do this automatically, and this is now
standard lab equipment.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/G-Storm_thermal_cycler.jpg&quot; alt=&quot;Thermal Cycler&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[From Rror via &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:G-Storm_thermal_cycler.jpg&quot;&gt;Wikipedia&lt;/a&gt;]&lt;/p&gt;
&lt;h3 id=&quot;rna&quot;&gt;RNA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#rna&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;OK, so this is all very useful, but what about if you
want to amplify &lt;em&gt;RNA&lt;/em&gt;? This is a particularly relevant
application right now because &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SARS-CoV-2&amp;amp;oldid=1105620058&quot;&gt;SARS-CoV-2&lt;/a&gt; (the virus that causes COVID-19) is an RNA
virus (as is HIV). I&#39;m not going to burden you with
the details of RNA, except to say that (1) it&#39;s (usually) single-stranded
rather than double-stranded and (2) it uses one different
base&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; but is otherwise more or less isomorphic to DNA.
You can still PCR-amplify RNA
by using an enzyme called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Reverse_transcriptase&amp;amp;oldid=1106093039&quot;&gt;reverse transcriptase&lt;/a&gt; that transcribes RNA into
DNA, at which point PCR works as usual. This technique
is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Reverse_transcription_polymerase_chain_reaction&amp;amp;id=1106208858&amp;amp;wpFormIdentifier=titleform&quot;&gt;Reverse Transcriptase PCR&lt;/a&gt;.
As far as I can tell, you basically can do RT-PCR by dumping
reverse transcriptase into your sample and running things
as usual.&lt;/p&gt;
&lt;h2 id=&quot;pcr-testing&quot;&gt;PCR Testing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#pcr-testing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;OK, so I&#39;ve told you how to amplify DNA sequences, which might
be fine if you wanted to sequence them, but how do you use this
to make a COVID test? The basic idea here is that you take a
sample from the patient and look for COVID RNA in it
(the same idea applies to HIV testing). But there&#39;s a
missing step here because what I&#39;ve described so far
just replicates DNA, it doesn&#39;t measure it.
I guess
you could try to replicate things for a while and then
maybe filter out the replicated strands and weigh them
or something, but that would be a really tricky bit of
analytical chemistry.
Fortunately, there&#39;s a much cleverer approach,
called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Real-time_polymerase_chain_reaction&amp;amp;oldid=1101314290&quot;&gt;Real-Time PCR&lt;/a&gt;
or quantitative PCR (qPCR), which actually measures
the replication process.&lt;/p&gt;
&lt;p&gt;There are a number of ways to do this, but the basic trick
is to measure replication via fluorescence. One version of
this is to make a probe which is basically a DNA sequence
that matches some sequence in the region of interest
but &lt;em&gt;also&lt;/em&gt; has some special chemistry that makes it
fluoresce (glow) when it detaches from a strand of DNA.
You then add that probe to the rest of the PCR mixture,
where it adheres to the single strands much as the primers
do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/Replication-Fluorescence.drawio.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Replication-Fluorescence.drawio.png&quot; alt=&quot;Fluorescence detection&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When the PCR reaction runs, the polymerization process
evicts the probe from the template strand, replacing it with
the newly built complementary strand, causing it to
fluoresce. You can then measure the light coming out as
the reaction runs: the more replication that&#39;s happening—and
hence the more DNA there is that matches the primers—the
more light is emitted. Of course, because the PCR test
inherently amplifies DNA sequences, if there is any significant
amount of the target sequence, you&#39;ll eventually get some
fluorescence, so what matters is the amount you see after
a given number of PCR cycles. You&#39;ll sometimes hear the term
&lt;em&gt;cycle threshold (Ct)&lt;/em&gt; in connection with COVID tests.
This is just the number of cycles you had to run
before you detected the virus. The more cycles, the less
there was in the initial sample.&lt;/p&gt;
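&lt;p&gt;To make the relationship between Ct and starting quantity concrete, here is a minimal sketch of a toy model. It assumes each cycle multiplies the number of copies by a fixed factor (perfect doubling by default) and that the machine registers a signal once the copy count reaches some fixed level. The function name, the threshold of 1e10 copies, and the starting amounts are all invented for illustration and are not taken from any real assay.&lt;/p&gt;

```python
import math


def estimated_ct(initial_copies, threshold_copies=1e10, efficiency=1.0):
    """Return the (fractional) cycle count at which copies reach the threshold.

    Toy model: each cycle multiplies the copy count by (1 + efficiency),
    so efficiency=1.0 is perfect doubling. threshold_copies is an invented
    detection level, not a property of any real assay.
    """
    growth_per_cycle = 1.0 + efficiency
    return math.log(threshold_copies / initial_copies, growth_per_cycle)


# Hypothetical starting amounts: each sample has 10x less target than the last.
for copies in (100000, 10000, 1000):
    print(copies, round(estimated_ct(copies), 1))
```

&lt;p&gt;Under these assumptions, every tenfold drop in starting material adds log2(10), or about 3.3, cycles to the Ct, which is why a higher Ct means less virus in the initial sample. Real reactions are less than perfectly efficient, so treat this as an approximation.&lt;/p&gt;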
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pcr/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you&#39;re coming to this fresh, it&#39;s really hard to appreciate
how revolutionary all this is, and how much it&#39;s come as the
result of decades of hard work by thousands of talented scientists.
Just what I&#39;ve described here reflects at least three Nobel
prizes (&lt;a href=&quot;https://www.nobelprize.org/prizes/medicine/1962/summary/&quot;&gt;Crick, Watson, and Wilkins&lt;/a&gt;
in 1962 for the discovery of the structure of DNA&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;,
&lt;a href=&quot;https://www.nobelprize.org/prizes/medicine/1968/&quot;&gt;Holley, Khorana, and Nirenberg&lt;/a&gt;
for protein synthesis,
&lt;a href=&quot;https://www.nobelprize.org/prizes/chemistry/1993/summary/&quot;&gt;Mullis&lt;/a&gt; for
PCR), plus countless other contributions that didn&#39;t win the
Nobel.&lt;/p&gt;
&lt;p&gt;The result is an incredibly powerful set of analytic
techniques—not just PCR, but fast sequencing, which I hope
to talk about later—that have turned what used to be the effectively
impossible problem of learning a given DNA sequence (the first
viral DNA sequence was only performed back in 1984!) into
what is today a routine task.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m really oversimplifying here. For instance,
DNA can be &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DNA_methylation&amp;amp;oldid=1095233323&quot;&gt;methylated&lt;/a&gt;
which affects how the DNA is interpreted without changing
the sequence. &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In reality, of course, this process is messy and you
get errors. There are also mechanisms to try to
fix the errors (see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Proofreading_(biology)&amp;amp;oldid=1073828586&quot;&gt;proofreading&lt;/a&gt;). &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I think it&#39;s also possible to make multiple primers
if there are variants, but I&#39;m not a PCR expert. &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Your body, of course, does not heat up to 90°C,
at least if you want to stay alive,
but there are enzymes which will unzip the DNA
at lower temperatures—2022-09-14. &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
PCR was originally patented by Cetus and when I first
saw PCR, the urban legend was that you could sell
these machines without paying Cetus as long
as you were careful to call them &amp;quot;thermal cyclers&amp;quot;
rather than &amp;quot;PCR machines&amp;quot;, even though as far
as I know they were only used for PCR. &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Uracil rather than thymine &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See also &lt;a href=&quot;https://en.wikipedia.org/wiki/Rosalind_Franklin&quot;&gt;Rosalind Franklin&lt;/a&gt;
who did fundamental work here, but was widely
overlooked for years. &lt;a href=&quot;https://educatedguesswork.org/posts/pcr/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Ultra-Trail du Mont-Blanc (UTMB) Race Report</title>
		<link href="https://educatedguesswork.org/posts/utmb/"/>
		<updated>2022-09-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/utmb/</id>
		<content type="html">&lt;p&gt;Probably the two most prestigious events in trail ultrarunning are the
&lt;a href=&quot;https://www.wser.org/&quot;&gt;Western States Endurance Run (Western
States)&lt;/a&gt;, held in June in California, and the
&lt;a href=&quot;https://utmbmontblanc.com/en/page/20/utmb%3Csup%3E%C2%AE%3C-sup%3E.html&quot;&gt;Ultra-Trail du Mont-Blanc
(UTMB)&lt;/a&gt;,
held in August in Chamonix, France. Both are 100-mile events
(UTMB is actually 171 km/107 mi) and draw the top ultradistance
runners. Americans tend to know about Western States because
it&#39;s older, but UTMB is much larger and fancier, with a field
of over 2000 (Western is &amp;lt;400) and just much higher production
values.&lt;/p&gt;
&lt;p&gt;Unlike prestige events in other
sports (e.g., the Hawaii Ironman or the Boston Marathon), ultras tend
to &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/&quot;&gt;rely on lotteries for
admission&lt;/a&gt;, so
ordinary runners can find themselves running on the same course with
the best in the world (who get in via other mechanisms). I was lucky enough to get in through the UTMB lottery
this year and knew I had to give it a shot.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmb-map.png&quot; alt=&quot;UTMB map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/utmb-profile.png&quot; alt=&quot;UTMB profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Map and profile from Runalyze]&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;utmb-races&quot;&gt;UTMB Races &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#utmb-races&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The naming here is incredibly confusing. First, there are actually
a number of races happening the same weekend as UTMB under the
UTMB umbrella, including:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Race&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Distance (km)&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Climbing (m)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Ultra-Trail du Mont-Blanc (UTMB)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;171&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Courmayeur-Champex-Chamonix (CCC)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;100&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Sur les Traces des Ducs de Savoie (TDS)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;145&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Orsières-Champex-Chamonix (OCC)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;55&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Petite Trotte à Léon (PTL)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;300&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;25,000!&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Historically there have also been a number of ultra
races named &amp;quot;Ultra-Trail &amp;lt;whatever&amp;gt;&amp;quot;, such as
&lt;a href=&quot;https://www.ultratrailmtfuji.com/en/&quot;&gt;Ultra-Trail Mount Fuji (UTMF)&lt;/a&gt;
or &lt;a href=&quot;https://www.ultratrailaustralia.com.au/&quot;&gt;Ultra-Trail Australia (UTA)&lt;/a&gt;.
Some, but not all, of these are now owned by UTMB
in what&#39;s called the &lt;a href=&quot;https://utmb.world/&quot;&gt;UTMB World Series&lt;/a&gt;,
which also includes a number of races that don&#39;t have
the words &amp;quot;ultra trail&amp;quot; in the name, such as
&lt;a href=&quot;https://www.speedgoatmountainraces.com/&quot;&gt;Speedgoat&lt;/a&gt;. These
are of course branded under the UTMB name, producing
the confusing situation in which the races that
happen in Chamonix are collectively referred to
as &amp;quot;UTMB Mont-Blanc&amp;quot; and the 170 km flagship
race is referred to as &amp;quot;UTMB Mont-Blanc - UTMB&amp;quot;,
which is to say &amp;quot;Ultra-Trail du Mont-Blanc Mont-Blanc - Ultra-Trail du Mont-Blanc&amp;quot;.
That&#39;s the event that I ran and what most people mean
when they say &amp;quot;UTMB&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmb-ad.png&quot; alt=&quot;UTMB advertisement&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, UTMB acquired Western States last year
and is re-branding UTMB Mont-Blanc as the series
&amp;quot;finals&amp;quot;, with Western States as a subordinate event,
though potentially one of the continental &amp;quot;Majors&amp;quot;.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;qualification%2Fentry&quot;&gt;Qualification/Entry &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#qualification%2Fentry&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Up to and including this year, UTMB had a two-phase qualifying
system. First, you had to
collect enough qualifying points by doing other events. The standard
this year was 10 points over 2 races, with a medium-hard hundred being
5 points and a hard hundred being 6.
The interesting thing about this structure is that you actually
don&#39;t need to be that good to get in: it&#39;s of course hard to run
100 miles, but in order to get the points you generally only
need to finish the event, which isn&#39;t that hard if you are going
to have any chance to finish UTMB, which is quite a bit harder
than your average 100.&lt;/p&gt;
&lt;p&gt;My qualification came from &lt;a href=&quot;https://sandiego100.com/&quot;&gt;San Diego 100,
2019&lt;/a&gt; and &lt;a href=&quot;https://roguevalleyrunners.com/pages/pine-to-palm&quot;&gt;Pine to Palm 100,
2019&lt;/a&gt;.  Ordinarily,
you would need to qualify within two years, but because of COVID UTMB
allowed people to continue their qualification through this year.  I
applied to UTMB back in 2019 and didn&#39;t get in, and they double your
chances each time you didn&#39;t get in, so I had something like a 20%
chance of admission. (As an aside, I&#39;m not sure what it says about
runners that so many of us want to run 170km in the Alps that they
have to run a lottery to control entry.) To be honest, I hadn&#39;t
expected to get in and had the rest of my season planned out,
but when you get the chance to do UTMB, you do it—or
at least I do.&lt;/p&gt;
&lt;h2 id=&quot;race-overview&quot;&gt;Race Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#race-overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;UTMB starts in &lt;a href=&quot;https://en.chamonix.com/&quot;&gt;Chamonix Mont-Blanc&lt;/a&gt; (Chamonix)
and does a big loop around Mont-Blanc, mostly following the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tour_du_Mont_Blanc&amp;amp;oldid=1084133809&quot;&gt;Tour du Mont Blanc&lt;/a&gt; course. The total distance is
listed as 171.5 km (106.6 miles) and 10000 height meters (32800 ft)
(note that this is 10000 meters of climbing, so also 10000 meters
of descending). The general pattern is that you climb up to some mountain
pass (col), then descend back down to one of the towns in the area,
then climb back out and repeat.&lt;/p&gt;
&lt;p&gt;Although Mont-Blanc is of course
quite tall (4808 meters), UTMB isn&#39;t really at altitude: Chamonix
Mont-Blanc is at around 1000m (3400 ft), and you never go much above
2500m (8300 ft), which is enough to feel some effects of altitude
but nowhere near as bad as (say) &lt;a href=&quot;https://www.aravaiparunning.com/tushars/&quot;&gt;Tushar&#39;s Mountain&lt;/a&gt;,
which starts at over 3000m (10000 ft). And of course, you don&#39;t
usually stay at that altitude for long.
With this much vert, though, you&#39;re
basically always climbing or descending, and there&#39;s very little
flat running, with what there is mostly in the towns along the
way, and much of that on asphalt or flat dirt track.&lt;/p&gt;
&lt;p&gt;For those of you used to US ultramarathons, UTMB has a number
of big differences. First, because of the giant starting
field you&#39;re almost never alone unless you&#39;re way out front
or way off the back. I don&#39;t think I spent more than 5 minutes
without seeing anyone during the whole event. This also means
that the trails can get super congested, especially at the beginning,
where it&#39;s almost impossible to pass people.
Second, it&#39;s
not out in the middle of nowhere, but keeps going through
these small towns and refuges. Every time you run through
a town—at least during the day—there are people out in the streets lining the route
cheering and high fiving you. This is especially true at
the start/finish in Chamonix and the early towns like
Saint Gervais when people would still naturally be
awake.&lt;/p&gt;
&lt;p&gt;Next, it&#39;s a nighttime start. Most US hundreds start in the morning,
so if you&#39;re a non-elite you&#39;ll run through the night after
running all day. UTMB starts at 6 PM, and because it gets
dark around 8:30 you&#39;re running through the night. I think this
is done so that the elites will finish in the daytime: the
elite men finish around 20 hrs (2 PM) and the elite women
finish around 22-23 (4-5 PM). The consequence is that
the non-elites are going to run through two nights and
if you&#39;re reasonably fast you&#39;re going to spend more than half
the race in the dark.&lt;/p&gt;
&lt;p&gt;Next, there&#39;s crewing but no pacing. Most US hundreds will allow
you to have someone run the latter part of the race
with you; for instance, I&#39;m pacing my friend and training
partner &lt;a href=&quot;https://chris-wood.github.io/&quot;&gt;Chris&lt;/a&gt; at
&lt;a href=&quot;https://runazt.org/flagstaff-to-grand-canyon-stagecoach-line-100/&quot;&gt;Stagecoach 100&lt;/a&gt;
in a few weeks. UTMB doesn&#39;t allow pacing outside of short
zones near the aid stations—and somewhat informally in
the last 200 meters of the race or so—though, as I said
above, you&#39;re never really alone anyway. It does, however,
allow a single crew member, which is super-helpful, and Chris
came over to crew me.&lt;/p&gt;
&lt;p&gt;Finally, there is just an unbelievable amount of climbing, more
than all but a few US races such as Ouray or Hardrock, and
much of it is fairly technical by US standards, by which I mean
that there are a lot of big rocks and the like that you need
to navigate, and sections where it&#39;s not really runnable at all, as in
I would hike it even if I were running 10 miles rather than
100. By contrast, in most US ultras you could basically run
any section individually, even though you end up hiking in order
to conserve energy. As I understand it, UTMB is actually considered pretty
non-technical by European standards, and, for instance,
the companion TDS race is rather more technical.
In any case, it&#39;s hard, and as discussed later, this threw me off
a bit.&lt;/p&gt;
&lt;p&gt;Based on my previous races, &lt;a href=&quot;https://utmbmontblanc.com/en/page/497/LiveRun%20App.html&quot;&gt;Liverun&lt;/a&gt;
estimated my finish time as 34:29, so I had my pace targets based
on that and their projections (largely so my crew could meet me), but in the event I
was pretty far off.&lt;/p&gt;
&lt;h2 id=&quot;pre-race-logistics&quot;&gt;Pre-Race Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#pre-race-logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With the race start on Friday, I arranged to fly out Sunday, arriving
on Monday. I took the overnight flight from San Francisco—using miles
to buy business class so I could sleep—through London Heathrow,
and then on to Geneva. From there, you can get a car to Chamonix,
which takes about 60-90 minutes. This all got me to Chamonix around
11:30 PM, which wasn&#39;t too bad. I opted to get a private car
(via &lt;a href=&quot;https://www.mountaindropoffs.com/&quot;&gt;Mountain Dropoffs&lt;/a&gt;) on
this leg because this meant I didn&#39;t have to wait for other people
and I was assured of being able to find my driver as soon as I was
ready. This all went reasonably smoothly, though wearing an N95 mask
for the whole trip was fairly unpleasant, as I didn&#39;t want to get
COVID right before my race.&lt;/p&gt;
&lt;p&gt;I&#39;d arranged to stay at the &lt;a href=&quot;https://www.pointeisabelle.com/en&quot;&gt;Pointe Isabelle&lt;/a&gt;
in the center of Chamonix maybe 200m from the race start. This was
very convenient because it means you can just hike over to the expo
or the race start, as well as being within easy walking distance
of the store for every major sporting good brand (Salomon, Arc&#39;Teryx,
Patagonia, etc.). This was actually a pretty nice hotel and
I&#39;d stay there again. I got a &amp;quot;4 person&amp;quot; room which had a double
bed and a bunk bed in separate rooms, which was good for
when Chris arrived.&lt;/p&gt;
&lt;p&gt;Chris arrived Thursday morning, so I had a couple days to myself
and mostly just didn&#39;t do anything. I had a few easy morning
runs which gave me an opportunity to check out the last few
miles of the course (not easy!) and other than that I mostly
just stayed in my room and read or tried to sleep. I was jet
lagged of course, but as I didn&#39;t really plan to time adapt,
I didn&#39;t think it was worth keeping the kind of rigid schedule
that usually helps adaptation, and so I slept a bit fitfully
and took a lot of naps.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;a-note-on-poles-and-loops&quot;&gt;A note on poles and loops &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#a-note-on-poles-and-loops&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Standard hiking poles have both a hand-grip and loops you
put your hands through. You&#39;re supposed to not really grip
the grips too hard and instead use the loops for leverage,
which stops your hands from getting tired. The problem
is that when you&#39;re running—especially downhill—you
don&#39;t want your hands to be in the straps: either you
hold the poles by the middle or you hold the grips but
you want to be able to let go if you crash. This means you
keep having to put your hands back through the straps, and because
the straps are asymmetrical, if you have been holding the poles
together, you need to figure out which is which before you can
use them normally. LEKI has a different engagement
system in which you wear a strap on your hand permanently
and there&#39;s a little piece of cord set in the loop that clips
into an engagement with the pole that you can get into
and out of with a button. This means you can get
in and out quickly and also that the poles themselves
are symmetrical, so going from carrying them to using
them is faster.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/black-diamond-straps.png&quot; alt=&quot;Black Diamond Handles&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Black Diamond Handles [from the BD site]&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/leki-straps.png&quot; alt=&quot;Leki Handles&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Leki Handles [from the LEKI site]&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As Thursday rolled around, I started to get worried about
whether I had everything I needed and ended up scrambling to
get a few more things. In particular, UTMB requires you to
have a long sleeve shirt and a rain jacket, but I decided
I needed another layer, and ended up buying a &lt;a href=&quot;https://www.patagonia.com/product/mens-houdini-windbreaker-jacket/24142.html?dwvar_24142_color=WAVB&amp;amp;cgid=mens-jackets-vests-lightweight&quot;&gt;Patagonia Houdini&lt;/a&gt;
(last year&#39;s model, on sale for € 70). This turned out
to be a great choice because it was comfortable when things
were a little chilly but when my long sleeve layer
(Patagonia Capilene) would have been too hot. This last-minute
panic buying was actually on top of some pre-trip panic
buying when I replaced my rain jacket (going to the Inov-8
&lt;a href=&quot;https://www.inov-8.com/us/raceshell-half-zip-featherlight-waterproof-running-jacket?colours=674&quot;&gt;Raceshell HZ&lt;/a&gt;),
hiking poles (going to the LEKI &lt;a href=&quot;https://www.runningwarehouse.com/LEKI_Ultratrail_FXOne_Superlite_Carbon_Poles/descpage-LEKUTSL.html&quot;&gt;Ultratrail FX.One Superlite&lt;/a&gt;),
headlamp (&lt;a href=&quot;https://www.lupinenorthamerica.com/Neo_X2_Headlamp.asp&quot;&gt;Lupine Neo&lt;/a&gt;),
and pole storage (&lt;a href=&quot;https://www.salomon.com/en-us/shop/product/pulse-belt.html#color=49375&quot;&gt;Salomon Pulse Belt&lt;/a&gt;).
By Thursday night I had pretty much everything, so I
laid out all of my stuff, including packing my pack and
the stuff Chris would need for crewing. All that was left
was to pick up my packet, actually put on my race stuff,
drop my drop bag, and head over to the start. Once this
was done, Chris and I headed over to Annapurna II
for an early dinner at 6 and then early bedtime.&lt;/p&gt;
&lt;p&gt;Unfortunately, I ended up not sleeping well, only getting
about 5 hrs. Given the 6:00 PM start, I figured I&#39;d just
spend as much as possible of Friday sleeping, so Chris and
I went and got breakfast and then Chris headed out
for his run and I went back to bed.
The way that race check-in works at UTMB is that you
actually have a reserved window to pick up your packet.
Mine was at 1200-1400 on Friday, so I only had to
be up for that and then I could go back to sleep.
Chris agreed to drop off my drop bag (only after 1400!),
though it probably wasn&#39;t worth bothering with, as
it only shows up at the 80 km mark in Courmayeur and
Chris was able to meet me there, so it was just if he
got hung up and didn&#39;t make it for some reason.&lt;/p&gt;
&lt;p&gt;Race start is 1800 but they ask you to show up at
1730. This is where being close really paid off, as we were able to
just walk over quickly around
1715. Even so, the start line was just totally packed
(2000+ people, remember). And you&#39;re just packed in there
with everyone else. From where I was standing I could just barely see
the start line and the monitors. The next 30 minutes were
a bunch of announcements and videos of the pros.
It started to rain sometime in here, so I ended up
putting my jacket on, though it came off not too
soon after. I was wearing a KN95 mask for this time:
being near so many people felt risky even outside.&lt;/p&gt;
&lt;h2 id=&quot;the-race&quot;&gt;The Race &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#the-race&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, the gun. Well, the start at least. Of course, at this
point you&#39;re still like 100m away from the actual start and
packed in super close with everyone else, so everyone&#39;s trying
to run but really you&#39;re just walking almost the whole time,
so you run a few steps and then have to walk again.&lt;/p&gt;
&lt;p&gt;Once you get past the line, you&#39;re running through the streets
of Chamonix, which are absolutely packed with people cheering
you on, high fiving, etc. This goes on for a few kilometers
until you open up onto some rolling fire roads. At this
point, there are still a huge number of people on the trail
with you, so if you&#39;ve managed to get yourself at the wrong
part of the field you&#39;re either stuck behind people or people
are pushing around you. I just tried to chill out and not worry
too much about position. Apparently it can get super dusty
in this area in which case you want to be in front, but with
the light rain this mostly wasn&#39;t an issue.&lt;/p&gt;
&lt;h3 id=&quot;start-to-les-contamines-%5B31.2-km%2C-1581%2B%2F1347-%2C-4%3A08%3A44%2C--%3A08%5D&quot;&gt;Start to Les Contamines [31.2 km, 1581+/1347-, 4:08:44, -:08] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#start-to-les-contamines-%5B31.2-km%2C-1581%2B%2F1347-%2C-4%3A08%3A44%2C--%3A08%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first leg to Les Contamines Montjoie is pretty straightforward.
Initially it&#39;s pretty smooth road and fire road that&#39;s gradually
downhill. As I said above, it&#39;s pretty hard to go fast for the
first 5 km or so, but eventually it opens up enough that you
can kind of find your position. By this point it had stopped
raining, so I just had my jacket back in my pack.&lt;/p&gt;
&lt;p&gt;Things start to trend upwards after about 8K, heading through Les
Houches and Col de Voza. This is all pretty good even trail, so
you&#39;re just comfortably hiking and I spent a bunch of it chatting
with YouTuber &lt;a href=&quot;https://jeffpelletier.com/&quot;&gt;Jeff Pelletier&lt;/a&gt;.
Eventually you come over the pass and then it&#39;s down through
Saint-Gervais and on to Les Contamines. At Saint-Gervais I
got a slightly unpleasant surprise, which was that the race food
wasn&#39;t what I expected.&lt;/p&gt;
&lt;p&gt;Some background: you don&#39;t carry all your food for an ultra;
instead there are aid stations which have food and drinks,
which is usually a combination of &amp;quot;real food&amp;quot; like pretzels,
cookies, etc. and engineered foods like sports drinks,
energy bars, carbohydrate gels, etc. There are a few major
companies who manufacture this stuff, so an American ultra
will typically have Tailwind or Gu Roctane for the drink
and Gu Roctane, Spring Energy, or something similar for
the gels. I&#39;m pretty familiar with these, and I know I
can tolerate them well, but I knew that UTMB would be
serving something different, specifically &lt;a href=&quot;https://en.overstims.com/&quot;&gt;Overstim&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&#39;d ordered a variety of different Overstim products
and tried them out and seemed to tolerate them OK, but
I neglected to order the specific flavor of sports
drink (Mojito, which turned out not to be that bad),
and then the energy bar was something like a granola
bar, which honestly wasn&#39;t that good. Fortunately, I
brought a bunch of my own stuff—mostly
&lt;a href=&quot;https://myspringenergy.com/&quot;&gt;Spring Energy&lt;/a&gt; gels
and &lt;a href=&quot;https://tailwindnutrition.com/&quot;&gt;Tailwind&lt;/a&gt;
drink, so I wasn&#39;t entirely dependent on them, but
it would have been a lot more convenient to just graze
at the aid stations. They did, however, have mini Mars bars
(in Europe that basically means a Milky Way), and I could
foresee a lot of them in my future.&lt;/p&gt;
&lt;p&gt;The run into Saint-Gervais is pretty great: it&#39;s
late evening so everyone is out on the streets cheering you
on, plus you&#39;re only a few hours in so you&#39;re still feeling
good, which is not going to be the situation later.
At this point, though, it&#39;s like you&#39;re a pro.&lt;/p&gt;
&lt;p&gt;There&#39;s a relatively long gradual climb out to Les Contamines,
which is still pretty easy. Contamines is the first
aid station where you&#39;re allowed to have crew, so Chris
was there. The whole setup was a little confusing, but eventually
we met up and I switched my bottles, grabbed some more gels and
Tailwind powder, and headed back out.&lt;/p&gt;
&lt;h3 id=&quot;contamines-to-courmayeur-%5B49.6-km%2C-%2B3019%2F-3094%2C-10%3A35%3A31%2C-%2B%3A41%5D&quot;&gt;Contamines to Courmayeur [49.6 km, +3019/-3094, 10:35:31, +:41] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#contamines-to-courmayeur-%5B49.6-km%2C-%2B3019%2F-3094%2C-10%3A35%3A31%2C-%2B%3A41%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This is a super long stretch consisting of two big climbs, first up to
the Refuge de la Croix du Bonhomme, then to the Col de la Seigne,
and then a smaller 500m climb to the Arête du Mont-Favre
before finally down into Courmayeur.&lt;/p&gt;
&lt;p&gt;These are some serious climbs. First, they&#39;re long and steep,
with 1200m of climbing to the Refuge followed by almost
1000m to the Col de la Seigne. Worse yet, they&#39;re rocky and
it&#39;s not just a matter of your ability to just put out raw
power, like in an Ironman or a marathon, because you&#39;re constantly
having to adjust your stride or step higher than you naturally
want to. Of course you&#39;re hiking all the uphills—and
even that is hard work—but then the downhills are rocky
and technical—and of course it&#39;s dark—so you&#39;re
not able to move that fast on them either, even if you have
a good headlamp.&lt;/p&gt;
&lt;p&gt;I actually had a couple of small slips on the downhills,
including one where my foot slipped off the shoulder of the trail
and I twisted my knee a bit. I was a bit worried that was
going to end my race right there, but actually I was able to
run it off, so that was OK.&lt;/p&gt;
&lt;p&gt;In addition, even though it was dark it was still pretty humid
for a lot of this, which slows you down itself.
Bottom line, by the time I rolled into Courmayeur I was a
lot more tired than I wanted to be at this point in the race.
Usually you really want to take the first half of a hundred
quite easy because the second half is going to be hard no
matter what, but I&#39;d already had to work quite a bit more
than I had planned.&lt;/p&gt;
&lt;p&gt;Chris met me at Courmayeur and we did the usual bottle swap
and extra food thing. I also cleaned my feet, re-lubed them,
and changed my socks. Things weren&#39;t actually bad here, but
it&#39;s good not to take any chances. Chris had brought some
Tailwind Recovery drink (higher calories, more protein),
and I was able to drink that while I was doing this stuff.
Taste-wise, this was a nice change, but at this point
I was starting to feel the first hints of nausea and it
sat a little heavy in my stomach.
All of this took quite a bit longer than I was hoping
for, especially as I had to do some waiting around, so I
was out in 27 minutes, 1:08 behind. Ironically, I actually
seem to have spent less time in this aid station than others,
as I seem to have come in in 892nd and left in 772nd.&lt;/p&gt;
&lt;h3 id=&quot;courmayeur-to-champex-lac-%5B45.9-km%2C-%2B2720%2F-2558%2C-10%3A17%2C-%2B1%3A30%5D&quot;&gt;Courmayeur to Champex-Lac [45.9 km, +2720/-2558, 10:17, +1:30] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#courmayeur-to-champex-lac-%5B45.9-km%2C-%2B2720%2F-2558%2C-10%3A17%2C-%2B1%3A30%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This section starts with a very steep climb of 805m/4km
out of Courmayeur to Refuge Bertone. This climb wasn&#39;t
so bad—just the usual &amp;quot;are we there yet&amp;quot; stuff—
but right after I hit the aid station at the top I just
started to feel incredibly wiped out. It was starting
to get hot and I just sat in the shade for a while
and tried to pull myself together, with only modest
success.&lt;/p&gt;
&lt;p&gt;I spent a lot of the next few kilometers just hiking and trying to run
a bit. This is unfortunate because this section (through to Arnouvaz)
is some of the most runnable of the course, just rolling and smooth,
so I was losing a lot of time.  Eventually I just sat at the side of
the trail and tried to recover. At this point I realized I probably
needed to start on caffeine—I had been hoping to wait until
night—so I took some caffeine and (I think) some salt. This
picked me up some and I was able to keep going.&lt;/p&gt;
&lt;p&gt;I don&#39;t really remember the climb out of Arnouvaz to Grand Col Ferret (745 D+)
and mostly remember just kind of suffering through it and then the
descent to La Fouly. By this point I was pretty nauseated:
I never really vomited but none of my food felt appetizing
and every time I started to run I would notice that
I felt worse. At this point I hooked up with another American
runner (Steve, IIRC) and we did about 5K together, just taking it really
casual. We were both way off our pace targets (him 30ish and me
34ish) and our stomachs had turned, so we just tried to take
it easy. Steve mentioned that he&#39;d been at a talk the day
before about how it was worth trying to take a short nap
if you were going to be much over 30 hrs, so I decided to
try to do that at Champex.&lt;/p&gt;
&lt;p&gt;The La Fouly to Champex-Lac section is deceptively long: there&#39;s
a long slow downhill from La Fouly which is mostly on dirt and
then road, so theoretically runnable (here again, I wasn&#39;t
running as much and so losing time), followed by the climb to
Champex, where Chris was waiting. The climb isn&#39;t that technical
and I started to feel better once I got on it and was able
to push some without it jostling my stomach (also it was starting
to get later in the afternoon). It helps that the climb isn&#39;t
really that long and kind of shaded.&lt;/p&gt;
&lt;p&gt;Met up with Chris again, more Tailwind Recovery, and we refilled
everything. Sure enough, there was a tent with mattresses and I did try
for 15 min, but I wasn&#39;t able to sleep at all. Eventually, I gave up,
but did take advantage of the mostly quiet tent to change my shirt and
hat, as everything was all sweaty and I wanted it dry for the evening.&lt;/p&gt;
&lt;h3 id=&quot;champex-lac-to-trient-%5B16.2-km%2C-%2B914%2F-1088%2C-4%3A35%2C-%2B2%3A09%5D&quot;&gt;Champex-Lac to Trient [16.2 km, +914/-1088, 4:35, +2:09] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#champex-lac-to-trient-%5B16.2-km%2C-%2B914%2F-1088%2C-4%3A35%2C-%2B2%3A09%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The rest of the course is three big climbs and descents, with crew
at the end of each (well, the last one is the finish), so it was
just a matter of getting through each.&lt;/p&gt;
&lt;p&gt;The first of these is Champex-Lac to Trient. This was probably
objectively the hardest of the three, not so much because of the
vert (over 800M+) but because it&#39;s really rocky and steep, so
it&#39;s hard to find your pace because you&#39;re constantly having
to high step, etc. The downhill doesn&#39;t get much better either
because it&#39;s just rocky and rooty, so I (at least) couldn&#39;t
go that fast and there was a lot of intermittent hike/running
when I should have been running.&lt;/p&gt;
&lt;p&gt;Arrived Trient in the dark and it&#39;s rinse repeat from here:
Tailwind recovery, new bottles, and go. Still nauseated here,
but it was kind of under control.&lt;/p&gt;
&lt;h3 id=&quot;trient-to-vallorcine-%5B8.9km%2C-%2B836%2F-875%2C-3%3A10%2C-%2B2%3A39%5D&quot;&gt;Trient to Vallorcine [8.9km, +836/-875, 3:10, +2:39] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#trient-to-vallorcine-%5B8.9km%2C-%2B836%2F-875%2C-3%3A10%2C-%2B2%3A39%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This was probably the best of the last three climbs: it was
the least technical and so you could mostly just motor up
it, and I felt pretty OK until partway up when I started
to have some serious bathroom issues. The trail was pretty wide
but was uphill face on one side and drop-off on the other
but I was finally able to find a section where I could
go downslope a bit, hang onto the hillside, and go. Thanks
to whoever gave me a hand getting back up after I was done.
Took a couple of Imodium here, which seemed to help,
at least as far as Vallorcine.&lt;/p&gt;
&lt;p&gt;Not too much to report about this section. Just something I
had to get through to make the final climb. Was relieved when
I got to Vallorcine and met Chris for the final time.
Did the usual aid station thing and was also able to bum
some salt tablets off a fellow runner as I was running
out, as well as some napkins to use as toilet paper off of the med workers.&lt;/p&gt;
&lt;p&gt;At this point I decided to swap drinks: I&#39;d been drinking
Tailwind or Overstim the whole time but I&#39;d brought some
&lt;a href=&quot;https://www.maurten.com/&quot;&gt;Maurten&lt;/a&gt; powder and poured that
into my bottles for the last push. Maurten is a hydrogel
formula designed to reduce GI distress and also comes
in a 320cal/500ml formulation (Tailwind is 200), which
means that you don&#39;t really need to eat anything, which
I was looking forward to at this point. It&#39;s got a bit
of an off-putting slimy texture, which is part of why I didn&#39;t
want to use it the whole time, but at this point that
seemed pretty good.&lt;/p&gt;
&lt;h3 id=&quot;vallorcine-to-finish-%5B18.5km%2C-%2B972%2F-1200%2C-4%3A36%2C-%2B3%3A20%5D&quot;&gt;Vallorcine to Finish [18.5km, +972/-1200, 4:36, +3:20] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#vallorcine-to-finish-%5B18.5km%2C-%2B972%2F-1200%2C-4%3A36%2C-%2B3%3A20%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This last section was a real mixed bag. You start out on some
flat segments and then there&#39;s a longish gradual uphill. I
felt great on this: it was very smooth terrain and just a few
percent grade so I pulled out the poles, put in the headphones,
and power hiked at a nice hard pace, with the result that
I was passing people left and right, in part because some
of them were clearly dying but also because I was moving
fast.
This continued until about partway through the main
climb, even as it turned into a series of rock steps.&lt;/p&gt;
&lt;p&gt;Then about 1/3 of the way up, my bathroom issues returned. I
was eventually able to get a little bit off the trail and go
but a lot of people passed me during this section and I somehow
never regained my momentum. In theory these were people who
were behind me, and so I should have re-passed them, but
in practice, it just wasn&#39;t that easy.&lt;/p&gt;
&lt;p&gt;This section felt unbelievably long, mostly because you&#39;re
just not moving at all fast due to the terrain. Once you
finally get near the top you have to pick your way through
a boulder field, which is really slow going, at least for
me. Eventually, I made it to La Tête aux Vents,
and from there it&#39;s mostly flattish to La Flégère,
albeit quite rocky. Here too, it was tough to run, though
some people with better footwork than I tore past me.
I actually fell once here and landed hard on my arm. After
that I put my poles away.
Here&#39;s &lt;a href=&quot;https://www.youtube.com/watch?v=mWlNS8EtUAs&quot;&gt;Jim Walmsley&lt;/a&gt;
on this section, not looking very fast.
There&#39;s a final tiny climb to La Flégère. Not worth taking
out my poles and I just did it hands on knees to the aid station.
It&#39;s all downhill from here and I had plenty of food and fluid
so I didn&#39;t bother to stop and just headed back down.&lt;/p&gt;
&lt;p&gt;At this point, my focus was
on a sub 38 finish (well short of my target, but oh well). It&#39;s supposedly
8K down to the finish and I left La Flégère at
36:48, so I needed to run 9:00/km to get there on time. This
doesn&#39;t sound very fast, but of course at this point you&#39;re tired.
The initial descent out of La Flégère is on fire road
and so I was able to hit it pretty fast. Even so, when my 37:00
timer fired, I was 1.2km out, and 10:00/km wasn&#39;t going to do it.
We fairly quickly got off fire road and into some very switchbacked
single track, which slowed things down further.&lt;/p&gt;
&lt;p&gt;I&#39;d reconned the last 4K or so of the course, so I knew that it turned into
runnable fire road and then smooth trail and road about 3K out, so
I mostly just needed to survive the single track (without crashing!)
and get to the part where I could work. I figured I needed to
hit 37:30 with &amp;lt;4K to go in order to be on target, so I pushed as
much as I dared. The pace wasn&#39;t bad, and I passed a few people,
including some PTL finishers (the real heroes) but I definitely had a few
braver—or more agile—people tear by me. I hit
37:30 at 3.8K or so, which was behind schedule, but I knew that I
was getting close to the really runnable section so I just held on.
I had plenty of gas here, I just wasn&#39;t able to run faster safely.&lt;/p&gt;
&lt;p&gt;Finally I hit the fire road and was able to really open up some,
even though it was kind of steep and rocky. Then there&#39;s a final
switchbacky portion that I&#39;d run before—though the course
actually just goes straight downhill across a bunch of the switchbacks—and
then out onto the road, or rather onto this terrifyingly rickety and
slippery metal bridge that the race had erected over the road, and then
finally onto dirt trail.&lt;/p&gt;
&lt;p&gt;I looked at my watch and it was 37:40 and
I figured it was less than 2K of flat running to go (actually
more like a K, as it turned out) so things
were probably OK if I didn&#39;t dawdle.
As I said above, I had plenty left because I&#39;d been running
the equivalent of recovery pace for the last hour or so, so I
felt comfortable pouring on the gas, or whatever gas I had
left, and I ran the last half mile in 8:49 (which felt like
7:00), passing maybe 3 or 4 people in that last section.
Once I realized I was so close and actually might go sub
37:50, I started pushing even harder. This is helped by the
fact that in the last quarter mile you&#39;re running through Chamonix proper
and everyone&#39;s cheering you on, even before 8 in the morning.&lt;/p&gt;
&lt;p&gt;This time I learned my lesson about the kind of photos you get
when you stop right at the finish line (like you&#39;re just
standing there fiddling with your watch) and ran all the way
through to finish in 37:49:49.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/utmb-finish-upper.jpg&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmb-finish-upper-small.jpg&quot; alt=&quot;UTMB Finish Upper&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The best race picture ever taken of me&lt;/em&gt; [official race picture]&lt;/p&gt;
&lt;h2 id=&quot;event-review&quot;&gt;Event Review &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#event-review&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;UTMB is an odd mix of very high production values—probably
the best I&#39;ve ever seen—and confusing organization.&lt;/p&gt;
&lt;p&gt;On the good side, the support is fantastic and they have done
a good job with a bunch of small things. For instance, there
is solid live tracking of where runners are that helps your
crew and then after the fact you can get really good data
about your performance, including your pace and position
at every checkpoint as well as &lt;a href=&quot;https://live.utmb.world/utmb/runners/1445&quot;&gt;links to videos of when
you went through&lt;/a&gt;.
For instance, here&#39;s me coming &lt;a href=&quot;https://videos.livetrail.net/videos/utmb/pt143_2022-08-28-05.49.30.mp4&quot;&gt;through the finish&lt;/a&gt;. Pretty glad to see I&#39;m still running at this point.
Another nice touch is that they give you a number for the
back of your pack with your name and nationality on it so
that people can talk to you when coming up from behind.
And, of course, just having it be such a giant event with
so many spectators is a great energy.&lt;/p&gt;
&lt;p&gt;On the bad side, the communications and logistics can be pretty confusing.
For example, the rules require you to carry
&amp;quot;Minimum water supply: at least 1 liter&amp;quot;. Does this mean you
need to carry 1l at all times, in which case you would actually
need to carry 2l out of the aid station so you had something to
drink, or that you just need 1l worth of bottles? Who knows.
Everyone seems to think it&#39;s the second. Another example is
that they require &amp;quot;ID – passport/ID card&amp;quot;. I&#39;m not an EU
citizen so do I need a passport? Apparently not; at least
I didn&#39;t bring one. Similarly, it was hard to learn about
the schedule for crew to be bused around.
These are small points, of course, but it&#39;s
just friction that adds up and is a bit surprising with
an event that is otherwise well run.&lt;/p&gt;
&lt;p&gt;Overall, though, this is a great event and I would definitely
encourage anyone who was interested in European ultrarunning
to check it out. Of course, it&#39;s also really hard to get into,
so I can also recommend the &lt;a href=&quot;https://innsbruckalpine.at/?lang=en&quot;&gt;Innsbruck Alpine Trail Festival&lt;/a&gt;,
which is similar-ish terrain (though easier) and you can just sign up.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This feels like one of those races that could have gone better.
It&#39;s possible that 34:29 was optimistic and certainly I ran some
of it with other people with strong track records who were nowhere
near their target times. On the other hand, I feel like there
was a fair amount of room for improvement.&lt;/p&gt;
&lt;h3 id=&quot;nutrition&quot;&gt;Nutrition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#nutrition&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;What went well here was that I stayed on my nutrition plan. I
planned to drink 250ml of sports drink every 30 minutes and
then eat 100cal of something else every hour, for a total of
300 cal/hr, plus whatever I consumed in aid stations. I didn&#39;t
hit that perfectly, but I was fairly close. Same with the
plan to drink Tailwind Recovery in aid stations, which worked
well, as by that point that chocolate taste was about all
I wanted. The end result was that I never really bonked at
all.&lt;/p&gt;
&lt;p&gt;On the other hand, I was nauseated a lot of the event and that
stopped me from running when I should have. I probably needed
to do some adaptation here and try to figure out how to debug
it rather than just slow down and walk through it. This is always
a tough one, but probably I needed to switch out what I was
eating earlier. I had brought mostly Spring Cannaberry, which
I usually like, but about halfway through I didn&#39;t want any
more. I had brought some Powergel Strawberry/Banana, which usually
I lose the taste for half-way (ironically, preferring Spring),
but this time I liked it at the halfway-type point and wished
I had more; unfortunately due to a snafu with my drop bag,
I only had about two of these as opposed to the 5 or so I
actually brought.&lt;/p&gt;
&lt;p&gt;I think part of the problem here was running low on salt:
Hydrixir long distance has about half as much sodium as
Tailwind (333mg/500ml as opposed to 620mg/500ml). I found
myself wanting salt and I did have salt tablets, but I don&#39;t
think I got on this early enough, and the SaltStick caps
I am using only have 215mg of sodium, so you need to take
a lot of them to catch up for that deficit. I did have some
soup early and that tasted good so it probably should have been a hint that
I needed to be more aggressive about sodium. Next time, probably
I need a schedule for salt intake.&lt;/p&gt;
&lt;p&gt;In retrospect, I wish I had just assumed I wasn&#39;t going to
eat any of the race food and just use the race drink, and then
I could have just planned all my eating and not had to think
about it. That would have reduced cognitive load.&lt;/p&gt;
&lt;h3 id=&quot;aid-stations&quot;&gt;Aid Stations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#aid-stations&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I spent too much time in aid stations. Enough said.  I was tired,
but that&#39;s when you have to just get in and out. The data says like
1:40, and I should be able to get that down to less than an hour.&lt;/p&gt;
&lt;h3 id=&quot;pacing&quot;&gt;Pacing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#pacing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I think my pacing was pretty OK here. I felt like I did a pretty
good job of not pushing the first half too hard most of the time.
If you look at the graph of my position in the race, I was mostly flat through
the first half, and then got gradually better after Courmayeur and especially
Arnouvaz:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/utmb-place.png&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/utmb-place.png&quot; alt=&quot;UTMB Position Chart&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[From the &lt;a href=&quot;https://live.utmb.world/utmb/runners/1445&quot;&gt;UTMB site&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;This looks like pretty good pacing, although it also has me falling
further behind LiveRun&#39;s projections as time goes on rather than,
as I sort of assumed, losing time initially and then holding on.&lt;/p&gt;
&lt;p&gt;There are several places I&#39;m not happy here. First, I felt like I was
working too hard on the early climbs. Basically, they were just
too steep to take it easy on. Second, I should probably have pushed
harder in the flat runnable section after Refuge Bertone and then down from
La Fouly. There were reasons, but I think it would have been better
to push. These are kind of opposites, but I think that&#39;s right: you want
to take the uphills easier early and then the downhills faster to
take advantage of it.&lt;/p&gt;
&lt;p&gt;Finally, as a result of my inability to run the technical downhills
hard, I actually wasn&#39;t as tired at the end as I otherwise would
have been, which is why I was able to dig for the last kilometer or
two. I could have done that for quite a bit longer and would have
started earlier if I hadn&#39;t been worried about my footing. Maybe
this is a signal I should have pushed more of the uphills on the theory
that I could recover on the downhills, but if you&#39;re really tired
at the top, your chance of tripping goes up.&lt;/p&gt;
&lt;h3 id=&quot;training&quot;&gt;Training &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#training&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Some of this goes back to my training. I feel like my fitness
was good as evidenced by various workouts, but there are two
places where I think more specificity would have paid off.&lt;/p&gt;
&lt;p&gt;First, I wish I&#39;d done more hiking on difficult courses.
I did a lot of training at similar grade ratios (~60m/km)
but it was mostly on smooth courses where I could run the
whole thing. Even when I hiked it was mostly on courses
where I could have run. The few times I did something
really hard (&lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/&quot;&gt;Yosemite&lt;/a&gt;, Mount Diablo),
it slowed me down a lot. The result was that I wasn&#39;t able
to initially take those difficult climbs as easily as I wanted while
still making progress and then later to really push them
without getting exhausted.&lt;/p&gt;
&lt;p&gt;Second, I need to spend more time running technical downhills.
I&#39;ve gotten good enough to go fast when it&#39;s non-technical
but as soon as it got rocky or rooty, a lot of people were
going past me. This was really noticeable in descents
that had a mix of single track and fire road because I&#39;d
get passed on the former and pass on the latter.&lt;/p&gt;
&lt;p&gt;On the other hand, UTMB is probably one of the few races like
this I&#39;m really going to run—just say no to TDS—so
this may not be a piece of specificity I need so much in the future.&lt;/p&gt;
&lt;h3 id=&quot;overall&quot;&gt;Overall &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/utmb/#overall&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Overall, this wasn&#39;t a big success but it could have been a lot
worse. I finished in good order and while I had some bad spots
I never had anything where I really cratered. I didn&#39;t get
injured, I&#39;m back to running a bit already, and I&#39;ve built
up a lot of fitness that I can use later in the season.
Plus, I&#39;ve got some cool UTMB gear to wear to other races.&lt;/p&gt;
&lt;p&gt;Finally, I want to really thank Chris for flying over and
crewing me. It made an enormous difference.&lt;/p&gt;
&lt;div class=&quot;img-flex-equal&quot;&gt;
  &lt;div&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/utmb-before-small.jpg&quot; /&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;https://educatedguesswork.org/img/utmb-after-small.jpg&quot; /&gt;
  &lt;/div&gt;
&lt;/div&gt;
[These photos helpfully taken by strangers with Chris Wood&#39;s phone.]
&lt;p&gt;&lt;strong&gt;Overall&lt;/strong&gt;: 37:49:49, 623/1789 finishers (838 DNF)&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>ELI15: Private Information Retrieval</title>
		<link href="https://educatedguesswork.org/posts/pir/"/>
		<updated>2022-08-30T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pir/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
};
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/pir-overview.jpeg&quot; alt=&quot;PIR Overview Picture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In my &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy&quot;&gt;post on Safe Browsing&lt;/a&gt; I mentioned that one possible
solution to the problem of querying the Safe Browsing database is
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_information_retrieval&amp;amp;oldid=1068898272&quot;&gt;Private Information Retrieval (PIR)&lt;/a&gt; and then waved my hands vigorously about it
being crypto magic. In this post, I&#39;m going to attempt to explain
how PIR works with as simple math as possible. You will, however,
want to read the &lt;a href=&quot;https://educatedguesswork.org/posts/pir/&quot;&gt;Web version&lt;/a&gt; of this post because there is a fair
bit of math and I use LaTeX to render it with MathJax, which looks
bad in the newsletter version.&lt;/p&gt;
&lt;h2 id=&quot;the-pir-problem&quot;&gt;The PIR Problem &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#the-pir-problem&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic version of the PIR problem looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You have a server with some database $&#92;mathbb{D}$ consisting
of a set of $d$ elements $D_1, D_2, D_3, ... D_d.$&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The client wants to retrieve the $i$th element $D_i$
but doesn&#39;t want the server to know which element it retrieved.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is an obvious trivial solution&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
in which the server
sends the client the entire database and the client just looks
up the value it wants to know&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
This provides privacy but at the expense of communication cost
because you have to send the entire database. The challenge,
then, is to build a system which involves sending less
data, has comparable privacy, and which doesn&#39;t chew up too much
computational power.&lt;/p&gt;
&lt;p&gt;There are two main flavors of PIR:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single server schemes&lt;/li&gt;
&lt;li&gt;Multiple server schemes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The single-server schemes are designed under the assumption that the server
is malicious and use cryptographic mechanisms to protect against it.
The multiple server schemes are designed under the assumption that
some subset of the servers is non-malicious and are insecure if
all the servers misbehave. In this post, I&#39;ll be talking solely about
single-server PIR schemes; at some point in the future I might
talk about multi-server.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;a-simple-insecure-solution&quot;&gt;A Simple Insecure Solution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#a-simple-insecure-solution&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first observation to make (due to
&lt;a href=&quot;https://www.cs.bgu.ac.il/~beimel/Papers/BIM.pdf&quot;&gt;Beimel, Ishai, and Malkin&lt;/a&gt;)
is that any single-server system must involve the server computing
some function over every element in the database. Otherwise, the
server could simply look at which elements were touched and learn
something about which were retrieved. This tells us something about
how things need to be constructed.&lt;/p&gt;
&lt;p&gt;Let&#39;s start with a solution that&#39;s insecure but can serve as
the basis for a secure solution. Take the database and arrange
it in a square grid (a &amp;quot;matrix&amp;quot;) like so:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
D_1 &amp;amp; D_2 &amp;amp; D_3 &#92;&#92;
D_4 &amp;amp; D_5 &amp;amp; D_6 &#92;&#92;
D_7 &amp;amp; D_8 &amp;amp; D_9 &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;In order to make a query, the client creates a list of numbers
that consists of all 0s except for a 1 in the position corresponding
to the column in the matrix that it wants to read. For instance,
if it wants to read value $D_6$, which is in the third column, it would send the list below
(I&#39;m writing this vertically for reasons which will become
apparent shortly).&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
0 &#92;&#92;
0 &#92;&#92;
1
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;The server constructs its response as follows. For each row
in the matrix, it goes column by column, multiplying the
value in its database by the value in the corresponding position
of the client&#39;s list (first column by first entry, second column
by second entry, and so on), and adds up the products for that row. This
produces a list that is the same length as the client&#39;s input,
with one sum per row of the matrix. In this
case, we would then get:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
D_1 &#92;cdot 0 + D_2 &#92;cdot 0 + {&#92;color{red}D_3 &#92;cdot 1} &#92;&#92;
D_4 &#92;cdot 0 + D_5 &#92;cdot 0 + {&#92;color{red}D_6 &#92;cdot 1} &#92;&#92;
D_7 &#92;cdot 0 + D_8 &#92;cdot 0 + {&#92;color{red}D_9 &#92;cdot 1} &#92;&#92;
&#92;end{bmatrix} =
&#92;begin{bmatrix}
D_3 &#92;&#92;
D_6 &#92;&#92;
D_9 &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;As you can see, what&#39;s happened here is that the 0s erase the
columns we&#39;re not interested in and we&#39;re just left with a list
of the column of interest (in this case the rightmost one),
shown in red.
The client can then just read out the value of interest by
looking at the appropriate row (here the second, which holds $D_6$).&lt;/p&gt;
&lt;p&gt;Those of you who have taken linear algebra will recognize
this as conventional matrix multiplication, where we multiply
the database times the selection vector. However, you don&#39;t
need to know that in order to understand what&#39;s going on.&lt;/p&gt;
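&lt;p&gt;For those who do want the symbols, here is a rough sketch of the same
computation. The labels are just my shorthand for this illustration:
$M$ is the database arranged as a matrix with $n = &#92;sqrt d$ rows and columns,
$q$ is the client&#39;s list, and $r$ is the server&#39;s response. Entry $j$ of the
response is:&lt;/p&gt;
&lt;p&gt;$$ r_j = &#92;sum_{k=1}^{n} M_{j,k} &#92;cdot q_k $$&lt;/p&gt;
&lt;p&gt;If $q$ has a 1 in position $c$ and 0s everywhere else, every term in
the sum vanishes except the one from column $c$, so $r_j = M_{j,c}$, which is
why the response is just that one column.&lt;/p&gt;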
&lt;p&gt;It&#39;s worthwhile to stop and look at the properties of this
design. In the trivial solution, the server had to send
$d$ values to the client, whereas in this design the client
has to send $&#92;sqrt d$ values and the server sends $&#92;sqrt d$.
With a small database like this one, this is a trivial
improvement, but for a large database $2&#92;sqrt d$ is going
to be much smaller than $d$. The server has to perform $d$
computations, one for each value in the database;
as noted above, this is expected.
Unfortunately, this scheme is also trivially
insecure, in that the server learns the column (though not the
row) that the
client is interested in, so we need something fancier. The
solution lies in a technology called &amp;quot;homomorphic encryption&amp;quot;.&lt;/p&gt;
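&lt;p&gt;Before moving on, it may help to put rough numbers on the communication savings described above. The database sizes here are just illustrative:&lt;/p&gt;
&lt;p&gt;$$
d = 9: &#92;quad 2&#92;sqrt d = 6 &#92;text{ values versus } 9
$$
$$
d = 10^6: &#92;quad 2&#92;sqrt d = 2000 &#92;text{ values versus } 1{,}000{,}000
$$&lt;/p&gt;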
&lt;h2 id=&quot;a-more-secure-solution%3A-homomorphic-encryption&quot;&gt;A More Secure Solution: Homomorphic Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#a-more-secure-solution%3A-homomorphic-encryption&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;partially-homomorphic-encryption&quot;&gt;Partially Homomorphic Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#partially-homomorphic-encryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s been known for a very long time how to do &lt;em&gt;partially&lt;/em&gt; homomorphic encryption.
As a concrete example, consider the case where you encrypt some data by XORing
it with a key, i.e.,&lt;/p&gt;
&lt;p&gt;$$Ciphertext = Plaintext &#92;oplus Key$$&lt;/p&gt;
&lt;p&gt;With this system, you can have the server compute the XOR of two plaintexts
$P_1$ and $P_2$, given only the encrypted form.
The client sends:&lt;/p&gt;
&lt;p&gt;$$ (C_1, C_2) = (P_1 &#92;oplus K_1, P_2 &#92;oplus K_2)$$&lt;/p&gt;
&lt;p&gt;The server returns:&lt;/p&gt;
&lt;p&gt;$$ C_1 &#92;oplus C_2 $$&lt;/p&gt;
&lt;p&gt;Which the client XORs with $K_1 &#92;oplus K_2$, i.e.,&lt;/p&gt;
&lt;p&gt;$$P_1 &#92;oplus K_1 &#92;oplus P_2 &#92;oplus K_2 &#92;oplus K_1 &#92;oplus K_2 $$&lt;/p&gt;
&lt;p&gt;When you cancel out the keys ($A &#92;oplus A = 0$) you get:&lt;/p&gt;
&lt;p&gt;$$ P_1 &#92;oplus P_2$$&lt;/p&gt;
&lt;p&gt;The difference between &lt;em&gt;partially&lt;/em&gt; and &lt;em&gt;fully&lt;/em&gt; homomorphic encryption is that with
a partially homomorphic system you can compute some functions on encrypted data
but not others, whereas with a fully homomorphic system you can compute any function.
The XOR system above is homomorphic with respect to XOR but not (say) to multiplication.
The problem of &lt;em&gt;fully&lt;/em&gt; homomorphic encryption had been open for a long time
until Craig Gentry finally showed how to do it in 2009.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The reason that our simple approach was insecure is that the
server has to know which values in the client&#39;s list are 0
and which are 1, and so can easily determine which column
the client wants. But what if the server could perform this
computation without determining which of the client&#39;s
values was 1? &lt;a href=&quot;https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.162&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;Kushilevitz and Ostrovsky&lt;/a&gt; figured out
how to do this in 1997, using a technique called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Homomorphic_encryption&amp;amp;oldid=1085790826&quot;&gt;homomorphic encryption&lt;/a&gt;. A homomorphic encryption
system is one in which you can operate on encrypted data
without seeing the content of the data (see the &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#partially-homomorphic-encryption&quot;&gt;sidebar&lt;/a&gt;
for some intuition on this).&lt;/p&gt;
&lt;p&gt;Specifically, we want a homomorphic encryption scheme which is
homomorphic with respect to &lt;em&gt;addition&lt;/em&gt;. I.e., if we have
two ciphertexts $E(A)$ and $E(B)$, there is some way to
compute $E(A + B)$ without knowing $A$ or $B$. All we
have to do is have the client encrypt its 1s and 0s
under a homomorphic system to which it knows the key, then
send the encrypted versions to the server. The server
can then perform the same computations as before,
except with the encrypted data.&lt;/p&gt;
&lt;p&gt;The way this works is that the client sends:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
E(0) &#92;&#92;
E(0) &#92;&#92;
E(1)
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;The server would then compute:
$$
&#92;begin{bmatrix}
D_1 &#92;cdot E(0) + D_2 &#92;cdot E(0) + {&#92;color{red}D_3 &#92;cdot E(1)} &#92;&#92;
D_4 &#92;cdot E(0) + D_5 &#92;cdot E(0) + {&#92;color{red}D_6 &#92;cdot E(1)} &#92;&#92;
D_7 &#92;cdot E(0) + D_8 &#92;cdot E(0) + {&#92;color{red}D_9 &#92;cdot E(1)} &#92;&#92;
&#92;end{bmatrix} =
&#92;begin{bmatrix}
E(0) + E(0) + {&#92;color{red}E(D_3)} &#92;&#92;
E(0) + E(0) + {&#92;color{red}E(D_6) }&#92;&#92;
E(0) + E(0) + {&#92;color{red}E(D_9) }&#92;&#92;
&#92;end{bmatrix} =
&#92;begin{bmatrix}
E(D_3) &#92;&#92;
E(D_6) &#92;&#92;
E(D_9) &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;The client receives this value, decrypts, and it&#39;s got
the result. One thing that might be sort of confusing here is that
I&#39;m showing the server both adding and multiplying, as in:&lt;/p&gt;
&lt;p&gt;$$
D_1 &#92;cdot E(0) + D_2 &#92;cdot E(0) + D_3 &#92;cdot E(1)
$$&lt;/p&gt;
&lt;p&gt;However, because the server is multiplying the encrypted
value by a known value, it can do this just by addition,
as in:&lt;/p&gt;
&lt;p&gt;$$
E(2A) = E(A) + E(A)
$$
$$
E(3A) = E(2A) + E(A)
$$&lt;/p&gt;
&lt;p&gt;So, all you need is an addition operation. There are, of course,
tricks to make this faster. For instance, you can compute the power-of-two
multiples ($E(A)$, $E(2A)$, $E(4A)$, $E(8A)$, etc.) by repeated doubling and then just build up the final value from
those.  If you want to multiply &lt;em&gt;two&lt;/em&gt; encrypted values, e.g., $E(A) *
E(B) = E(AB)$ then you need a fancier system, but that&#39;s not required
here.&lt;/p&gt;
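&lt;p&gt;The powers-of-two trick is easier to see with a small example. Suppose, purely for illustration, that the server needs to multiply $E(A)$ by 13. It first doubles three times:&lt;/p&gt;
&lt;p&gt;$$
E(2A) = E(A) + E(A)
$$
$$
E(4A) = E(2A) + E(2A)
$$
$$
E(8A) = E(4A) + E(4A)
$$&lt;/p&gt;
&lt;p&gt;Then, because $13 = 8 + 4 + 1$, it combines the pieces it needs:&lt;/p&gt;
&lt;p&gt;$$
E(13A) = E(8A) + E(4A) + E(A)
$$&lt;/p&gt;
&lt;p&gt;That is 5 additions in total, rather than the 12 you would need by adding $E(A)$ to itself over and over.&lt;/p&gt;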
&lt;p&gt;Of course, finding a suitable homomorphic encryption scheme is
tricky because you want something that is cheap to compute
&lt;em&gt;and&lt;/em&gt; has a small ciphertext. The original K-O scheme used
a fairly inefficient homomorphic encryption system
and much of the work here has been in finding better systems.&lt;/p&gt;
&lt;h3 id=&quot;detail%3A-homomorphic-encryption-using-elgamal&quot;&gt;Detail: Homomorphic Encryption using ElGamal &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#detail%3A-homomorphic-encryption-using-elgamal&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You don&#39;t need to understand how to build a homomorphic encryption
algorithm in order to understand PIR, but it&#39;s sometimes helpful
to see things written out. In this section, I describe a simple
well-known scheme based on the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=ElGamal_encryption&amp;amp;oldid=1106513711&quot;&gt;ElGamal&lt;/a&gt; encryption algorithm.&lt;/p&gt;
&lt;p&gt;In the ElGamal encryption system client and server share a known
value $g$. In order to receive a message, an entity (say the client)
creates a random value $y$ and publishes $g^y$. In order to encrypt a
message $m$ to someone, you generate your own random value $x$
and then send the pair of values:&lt;/p&gt;
&lt;p&gt;$$g^x, g^{xy} &#92;cdot m$$&lt;/p&gt;
&lt;p&gt;The recipient, who recall has $y$, can then do the following computation:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Take $g^x$ from the message and raise it to $y$ to get $(g^x)^y = g^{xy}$&lt;/li&gt;
&lt;li&gt;Divide the second part of the message by $g^{xy}$ to recover $m$&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that in ordinary integer math, given $g^a$ and $g$ it&#39;s easy to compute
$a$ but we&#39;re going to be doing this in a setting
where that computation is hard, namely modulo some prime $p$.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
This is called the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Discrete_logarithm&amp;amp;oldid=1087396332&quot;&gt;discrete logarithm&lt;/a&gt; problem or just &amp;quot;discrete log&amp;quot;.
The intuition is that you can compute $g^{xy}$ either by knowing $g^y$ and $x$ (which the sender does) or $g^x$ and $y$ (which the receiver does) but if you only know $g^y$ and $g^x$ you&#39;re stuck.
Everything else is pretty
much the same as normal math but just remember that part.&lt;/p&gt;
&lt;p&gt;However, it turns out that
this system is homomorphic with respect to &lt;em&gt;multiplication&lt;/em&gt;, not &lt;em&gt;addition&lt;/em&gt;.
Consider the pair of ciphertexts:&lt;/p&gt;
&lt;p&gt;$$E(m_1) = (g^{x_1}, g^{x_1y}m_1)$$
$$E(m_2) = (g^{x_2}, g^{x_2y}m_2)$$&lt;/p&gt;
&lt;p&gt;If we multiply the first parts and the second parts together, we get:&lt;/p&gt;
&lt;p&gt;$$E(m_1 m_2) = E(m_1) &#92;cdot E(m_2) = (g^{x_1 + x_2}, g^{y(x_1 + x_2)}m_1m_2)$$&lt;/p&gt;
&lt;p&gt;You can decrypt this exactly as before to get $m_1m_2$.&lt;/p&gt;
&lt;p&gt;However, I said above, that we wanted something that was homomorphic
with respect to &lt;em&gt;addition&lt;/em&gt; not multiplication. The trick here is that
instead of encrypting message $m$ you instead encrypt $g^m$. Thus,
the result becomes:&lt;/p&gt;
&lt;p&gt;$$g^{m_1} &#92;cdot g^{m_2} = g^{m_1 + m_2}$$&lt;/p&gt;
&lt;p&gt;And you just need to take the discrete log to recover $m_1 + m_2$.
But didn&#39;t I just say that taking discrete logs is hard?
Basically, this works fine as long as
the value to be retrieved is relatively short. So, for instance,
if we restrict ourselves to retrieving a single bit, then
you just need to compare against $g^0$ or $g^1$. The limit
depends a bit on computational power, but it&#39;s fairly
practical to retrieve 32-bit values with the right
algorithms (for smaller values like 8 bits you can just
build a table).&lt;/p&gt;
&lt;p&gt;To use this system in practice, the client is just going to
encrypt to itself by generating a key that it knows, but otherwise
we just use this system as-is.&lt;/p&gt;
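&lt;p&gt;To see the pieces fit together, here is a toy worked example. The numbers
($p = 23$, $g = 5$, $y = 6$, and the random values $x_1 = 3$ and $x_2 = 4$) are
ones I picked to keep the arithmetic small; they are far too small to be secure.
All arithmetic is modulo $p$. The public value is $g^y = 5^6 = 8$. Encrypting
$m_1 = 1$ and $m_2 = 2$ in the exponent gives:&lt;/p&gt;
&lt;p&gt;$$E(m_1) = (g^{x_1}, g^{x_1y}g^{m_1}) = (5^3, 8^3 &#92;cdot 5^1) = (10, 7)$$
$$E(m_2) = (g^{x_2}, g^{x_2y}g^{m_2}) = (5^4, 8^4 &#92;cdot 5^2) = (4, 4)$$&lt;/p&gt;
&lt;p&gt;Multiplying the two ciphertexts component by component gives:&lt;/p&gt;
&lt;p&gt;$$E(m_1) &#92;cdot E(m_2) = (10 &#92;cdot 4, 7 &#92;cdot 4) = (17, 5)$$&lt;/p&gt;
&lt;p&gt;To decrypt, the client raises the first part to $y$, getting $17^6 = 12$, and
divides the second part by it. Since $12 &#92;cdot 2 = 24 = 1$, dividing by 12 is the
same as multiplying by 2, so the result is $5 &#92;cdot 2 = 10$. Comparing against the
small powers of $g$ ($5^1 = 5$, $5^2 = 2$, $5^3 = 10$) tells us that the
sum is $m_1 + m_2 = 3$, as expected.&lt;/p&gt;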
&lt;h3 id=&quot;complexity&quot;&gt;Complexity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#complexity&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This is all pretty cool and it&#39;s better than nothing, but it&#39;s
also not very efficient: in order to retrieve a single value
you need to send $2&#92;sqrt d$ values
($&#92;sqrt d$ values in each direction)
and the values themselves are relatively large (in
basic ElGamal, from 512 bits to 8192 bits&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;). On the other hand, if the database
is large, it&#39;s still more efficient than sending the whole
database.
The database size is $d$ values, so if each value is a
single bit (as in the original K-O scheme), the breakeven point where you
send less data than you would just by sending the database
is around $512^2$ (about 260,000) entries if you
are using an efficient version of ElGamal. The situation
with the original K-O system was even worse.&lt;/p&gt;
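&lt;p&gt;For anyone who wants to redo the arithmetic, here is a rough sketch of where a figure like this comes from. The symbol $C$ for the ciphertext size in bits and the choice $C = 512$ are assumptions taken from the low end of the range above. With one-bit entries, sending the whole database costs $d$ bits, while each direction of the PIR exchange costs $C &#92;sqrt d$ bits. Comparing a single direction against the database gives:&lt;/p&gt;
&lt;p&gt;$$
C &#92;sqrt d = d &#92;implies d = C^2 = 512^2 = 262{,}144
$$&lt;/p&gt;
&lt;p&gt;If you instead count both directions, the same comparison gives a somewhat larger number:&lt;/p&gt;
&lt;p&gt;$$
2 C &#92;sqrt d = d &#92;implies d = 4C^2 = 1{,}048{,}576
$$&lt;/p&gt;
&lt;p&gt;Either way, the database needs to have somewhere between a few hundred thousand and a million one-bit entries before this beats just downloading it.&lt;/p&gt;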
&lt;p&gt;In terms of computational complexity, the server has to
compute over each database entry, so that&#39;s $d$ units of
work—recall that you have to compute over each value
in order to have a PIR system. The client only has to
compute the $&#92;sqrt d$ input values, then decrypt the relevant
returned values and take the discrete log, so that&#39;s fairly cheap.&lt;/p&gt;
&lt;h2 id=&quot;improvements&quot;&gt;Improvements &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#improvements&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As described in the original K-O paper, it&#39;s possible to significantly improve the basic scheme
by being a little clever.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;reusing-the-client&#39;s-vector&quot;&gt;Reusing the Client&#39;s Vector &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#reusing-the-client&#39;s-vector&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The original K-O scheme was even worse than what I have presented
above in that you could only extract single-bit values. This meant that if you
wanted to extract multiple-bit values, you would naively
just repeat the protocol
for each bit, so both the database size and the cost of the PIR protocol
scale linearly with entry size.&lt;/p&gt;
&lt;p&gt;This leads to an obvious improvement: say that I want
to read values from a database where each entry is 8
bits rather than 1. As noted above, the client could
just send 8 input vectors, but why? The client&#39;s vectors
all pick out the same column in the database and they&#39;re
not specific to anything on the server side.&lt;/p&gt;
&lt;p&gt;Instead, the server can just compute its results for each of the 8
bits of the database using the same client input. The server then
sends back $8 &#92;sqrt d$ values, with the first $&#92;sqrt d$ being for the
first bit, the next $&#92;sqrt d$ for the next bit, etc. but all computed
over the same client input. This gives you a total communications
complexity of:&lt;/p&gt;
&lt;p&gt;$$
C (1 + &#92;sqrt d + b &#92;sqrt d)
$$&lt;/p&gt;
&lt;p&gt;Where $b$ is the number of bits to be extracted
and $C$ is the size of the homomorphic encryption ciphertext.
It also means
you don&#39;t need multiple round trips.
Of course, if you have a fancier scheme that lets you extract
values that are greater than one bit, then this trick becomes
less interesting. However, if you need to extract big values
that make discrete log impractical (say 100 bits) then
it becomes useful, because you can extract the value in
pieces, each of which is easy to compute discrete log on.&lt;/p&gt;
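&lt;p&gt;As a rough illustration of the formula above, take $d = 2^{20}$ entries of $b = 8$ bits each and $C = 512$ bits; these particular numbers are just assumptions I picked to make the arithmetic easy. Repeating the whole protocol once per bit costs:&lt;/p&gt;
&lt;p&gt;$$
b &#92;cdot 2 &#92;sqrt d &#92;cdot C = 8 &#92;cdot 2 &#92;cdot 1024 &#92;cdot 512 = 8{,}388{,}608 &#92;text{ bits}
$$&lt;/p&gt;
&lt;p&gt;whereas reusing the client&#39;s vector costs:&lt;/p&gt;
&lt;p&gt;$$
C (1 + &#92;sqrt d + b &#92;sqrt d) = 512 &#92;cdot (1 + 1024 + 8192) = 4{,}719{,}104 &#92;text{ bits}
$$&lt;/p&gt;
&lt;p&gt;For comparison, the database itself is $b &#92;cdot d = 8{,}388{,}608$ bits, so with these numbers the naive approach is no better than downloading the whole database, while the reuse trick cuts the cost to a little over half.&lt;/p&gt;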
&lt;h3 id=&quot;recursion&quot;&gt;Recursion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#recursion&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next optimization requires a little more cleverness. Recall that the
server sends the client values corresponding to each row in the
database but that the client only cares about one of the rows.
Say we have a database that consists of $d$ values and
so our matrix is $&#92;sqrt d$ on each side. The client sends $&#92;sqrt d$
values and the server replies with $&#92;sqrt d$ values. The client
only cares about the $i$th value in the server&#39;s response, but it can&#39;t tell the
server that because that would tell the server which row it
was interested in.&lt;/p&gt;
&lt;p&gt;The key insight here is that this itself is a PIR problem, with the
database consisting of $&#92;sqrt d$ values of length $C$. In the naive
protocol described above, the server sends the entire database
to us, but we only care about $&#92;frac {1}{&#92;sqrt d}$th of it. We can use the
same PIR scheme to request just the pieces we care about, one
at a time.&lt;/p&gt;
&lt;p&gt;But why stop there? We can keep using the same trick!
Imagine we have a really big database of $2^{48}$ entries. Then even
the second level database representing $&#92;sqrt d$ entries in the server&#39;s
response is going to be quite large, which means that the PIR problem
of extracting one element out of that vector is also expensive. But we can do the same
thing again. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Turtles_all_the_way_down&amp;amp;oldid=1102494882&quot;&gt;It&#39;s turtles all the way down!&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;further-improvements&quot;&gt;Further Improvements &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#further-improvements&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even with the optimizations above, we&#39;re still left with a system
which isn&#39;t very efficient, especially for smaller data sets,
where it&#39;s quite a bit worse than just transferring the entire
database (the advantage goes as a factor of $&#92;sqrt d$).
In the 25 years since the original Kushilevitz and Ostrovsky
paper, there has been quite a bit of work in this area.&lt;/p&gt;
&lt;p&gt;This work seems to fall into a small number of buckets.&lt;/p&gt;
&lt;h3 id=&quot;improving-the-inner-loop&quot;&gt;Improving the Inner Loop &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#improving-the-inner-loop&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you step back and look at the basic design of the K-O protocol,
it looks like this (I&#39;m using the linear algebra matrix multiplication
notation here, but really the $&#92;times$ just denotes
that we are doing whatever our core operation is in criss-cross fashion,
as before):&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
V_1 &amp;amp; V_2 &amp;amp; &#92;color{red}{&#92;mathbf{V_3}} &#92;&#92;
V_4 &amp;amp; V_5 &amp;amp; &#92;color{red}{&#92;mathbf{V_6}} &#92;&#92;
V_7 &amp;amp; V_8 &amp;amp; &#92;color{red}{&#92;mathbf{V_9}} &#92;&#92;
&#92;end{bmatrix}
&#92;times
&#92;begin{bmatrix}
0 &#92;&#92;
0 &#92;&#92;
&#92;color{red}{&#92;mathbf{1}} &#92;&#92;
&#92;end{bmatrix}
&#92;rightarrow
&#92;begin{bmatrix}
&#92;color{red}{&#92;mathbf{V_3}} &#92;&#92;
&#92;color{red}{&#92;mathbf{V_6}} &#92;&#92;
&#92;color{red}{&#92;mathbf{V_9}} &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;In other words, the input vector supplied by the client operates
on each row of the database, picking out the column of interest
to the client and ignoring the other values (remember that rows
in the input vector correspond to columns we want to select).
The server sends back each resulting row and the client reads
the row of interest, ignoring the others. This basic structure holds whether the
operation being performed is simple multiplication (as in
our insecure example) or homomorphic encryption.&lt;/p&gt;
&lt;p&gt;This means that the cost of the system is determined by the
basic scaling properties of $2 &#92;sqrt d$ communications
cost and $d$ computational cost, but multiplied by the cost
of the homomorphic encryption system. The more efficient
the homomorphic encryption system is, the more efficient the
whole thing will be. There has been a fair amount of work
invested in finding more efficient homomorphic encryption
algorithms to plug in here.&lt;/p&gt;
&lt;h3 id=&quot;reducing-the-client&#39;s-input-vector&quot;&gt;Reducing the Client&#39;s Input Vector &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#reducing-the-client&#39;s-input-vector&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There is another cute trick we can play, that&#39;s
a natural extension of the techniques we have already
seen. Suppose that we have a homomorphic encryption
scheme that lets me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add as many encrypted values as I want&lt;/li&gt;
&lt;li&gt;Do a single multiplication of two encrypted values&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this case, we can reduce the communication cost further,
as described by &lt;a href=&quot;http://crypto.stanford.edu/~dabo/pubs/abstracts/2dnf.html&quot;&gt;Boneh, Goh, and Nissim&lt;/a&gt;. Instead of sending a single list of encrypted values,
containing a single (encrypted) 1, the client
sends a pair of lists, each containing a single
(encrypted) 1. The server then computes the product
of each pair of values, taking one value from each list, e.g.,&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
1 &#92;&#92;
0 &#92;&#92;
&#92;end{bmatrix}
&#92;begin{bmatrix}
0 &#92;&#92;
1 &#92;&#92;
&#92;end{bmatrix}
&#92;rightarrow
&#92;begin{bmatrix}
0 &amp;amp; 1&#92;&#92;
0 &amp;amp; 0 &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;We can then lay this out in a deterministic order left to right and
top to bottom (though any rule will work) as a single
list, like so:&lt;/p&gt;
&lt;p&gt;$$
&#92;begin{bmatrix}
0 &#92;&#92;
1&#92;&#92;
0 &#92;&#92;
0 &#92;&#92;
&#92;end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;This list can then be used as the input to the standard K-O
protocol, and we&#39;ve just reduced the total number of values
the client sends from $&#92;sqrt d$ to $2 &#92;sqrt[4] d$ (the server
to client communication remains unchanged).
We can actually improve the situation further by changing
the structure of the database to be non-square, instead
having $&#92;sqrt[3] d$ rows and $(&#92;sqrt[3] d)^2$ columns.
In this case, the client sends two input vectors, each of which
is $&#92;sqrt[3] d$ long, and the server maps them onto a
$(&#92;sqrt[3] d)^2$ long vector. It does the same criss-cross
trick as before, producing a result that is $&#92;sqrt[3] d$ long
and sends it to the client, for a total communications cost
of about $3 &#92;sqrt[3] d$.&lt;/p&gt;
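&lt;p&gt;Written out a bit more generally (the names $u$, $v$, $w$, and $n$ are my own labels for this sketch), if the client sends two encrypted lists $u$ and $v$ of length $n$, the server builds a list $w$ of length $n^2$, using its one allowed multiplication for each entry:&lt;/p&gt;
&lt;p&gt;$$
E(w_{(i-1)n + j}) = E(u_i) &#92;cdot E(v_j) = E(u_i v_j)
$$&lt;/p&gt;
&lt;p&gt;Only the entry where both $u_i$ and $v_j$ are 1 encrypts a 1; all the others encrypt 0. To get a feel for the savings, take an illustrative database of $d = 10^6$ values: the basic square layout costs $2 &#92;sqrt d = 2000$ values, while the non-square layout costs about $3 &#92;sqrt[3] d = 300$.&lt;/p&gt;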
&lt;h3 id=&quot;precomputation&quot;&gt;Precomputation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#precomputation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One interesting recent development in PIR is the design of
systems which use precomputation to make the PIR process
cheaper. The basic idea is that with a suitable homomorphic
algorithm the server and client can perform some initial
exchange, presumably involving some computation and the
exchange of some data (a &amp;quot;hint&amp;quot;). Once the hint has been
exchanged, the client can make individual queries much
more cheaply. This makes sense for applications
like &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy&quot;&gt;Safe Browsing&lt;/a&gt; where
the client is likely to make a lot of queries and so you
can amortize the hint.&lt;/p&gt;
&lt;p&gt;The specific precomputation techniques vary. In some designs, the
client and server perform some client-specific precomputation
and in others like &lt;a href=&quot;https://eprint.iacr.org/2022/949&quot;&gt;SimplePIR&lt;/a&gt;,
the server just does the computation itself
and distributes the hint to every client.&lt;/p&gt;
&lt;h3 id=&quot;other-designs&quot;&gt;Other Designs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#other-designs&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve focused here specifically on designs that follow this
K-O model, largely because they are intuitively easy to
explain. There are also designs (for instance &lt;a href=&quot;https://link.springer.com/content/pdf/10.1007%2F3-540-48910-X_28.pdf&quot;&gt;Cachin, Micali, and Stadler&lt;/a&gt;
and &lt;a href=&quot;https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=18521542DE2CAB193F9A10F27C227B14?doi=10.1.1.113.6572&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;Gentry and Ramzan&lt;/a&gt;)
that are based on other
structures and involve sending less data but at increased
computation cost. The math here is a lot harder—I
only somewhat understand it myself—so I&#39;m not going
to try to explain them here.&lt;/p&gt;
&lt;h2 id=&quot;the-big-picture&quot;&gt;The Big Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#the-big-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In conclusion, I&#39;d like to make two points here. First, this
is a really counterintuitive result, at least to me:
we can allow a client to read some fraction of the server&#39;s
data without the server learning anything about which
values the client wants &lt;em&gt;and&lt;/em&gt; in a fashion more efficient
than just sending the client all the data. Hopefully,
this post gives some intuition for why that&#39;s possible,
thus rendering it less counterintuitive if not
precisely &lt;a href=&quot;https://math.stackexchange.com/questions/151782/when-is-something-obvious&quot;&gt;obvious&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Second, PIR is an immensely powerful primitive. There are
a whole pile of problems which would be much easier if
we had efficient PIR, ranging from &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy&quot;&gt;Safe Browsing&lt;/a&gt;,
to &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/&quot;&gt;messaging interoperability&lt;/a&gt;, to
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8816.html&quot;&gt;authentication for phone calls&lt;/a&gt;.
We&#39;re not yet at the point where you can just drop in
PIR the way you would drop in TLS, without really thinking
about the cost, but we &lt;em&gt;are&lt;/em&gt; getting closer to the point
where some of these applications are practical. In
fact we may already be there in some cases.&lt;/p&gt;
&lt;h2 id=&quot;acknowledgement&quot;&gt;Acknowledgement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pir/#acknowledgement&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Thanks to Henry &lt;a href=&quot;https://people.csail.mit.edu/henrycg/&quot;&gt;Corrigan-Gibbs&lt;/a&gt; for assistance with this post. All mistakes are of course mine.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is the standard lead-in to this problem, as seen,
for instance, in the Wikipedia article. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Indeed, the &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#distribute-longer-hashes&quot;&gt;longer hashes&lt;/a&gt;
version of Safe Browsing is precisely this. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, it&#39;s known that it&#39;s not possible to have information
theoretic security with a single server. You have to depend on
some cryptographic assumption. There are information theoretically
secure versions of multi-server PIR, as long as some of the servers
are not malicious. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, yes, or on an elliptic curve or something. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Recall that you have to send two values for each ciphertext. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note:
I am using a somewhat different presentation order
which I think is easier to understand. &lt;a href=&quot;https://educatedguesswork.org/posts/pir/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Can we make Safe Browsing safer?</title>
		<link href="https://educatedguesswork.org/posts/safe-browsing-privacy/"/>
		<updated>2022-08-16T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/safe-browsing-privacy/</id>
		<content type="html">&lt;p&gt;The Web is full of bad stuff and it&#39;s the browser&#39;s job to protect you
from it as best it can.  For certain classes of attack, such as attempts
to subvert your computer, that is a conceptually straightforward matter
of hardening the browser, as described in the &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-web-security-guarantee&quot;&gt;Web
security guarantee&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;users can safely visit arbitrary web sites and execute scripts provided by those sites.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In practice, of course, browsers have vulnerabilities which mean
they don&#39;t always deliver on this guarantee.
However, even if you ignore browser issues, there are other classes of harm, such as phishing or fraud,
that aren&#39;t about attacking the computer but rather about attacking
the user.  Because these threats rely on users incorrectly trusting
the site, hardening the browser doesn&#39;t work; instead we want to warn
the user that they are about to do something unsafe.
The primary tool we have available for protecting against this
class of attack is to have a blocklist of dangerous sites/URLs.
The most widely used such blocklist is Google&#39;s &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Safe Browsing&lt;/a&gt;,
which is used by Chrome, Firefox, Safari, and other browsers
(there are other similar services, but Safe Browsing is the
most popular).&lt;/p&gt;
&lt;h2 id=&quot;the-safe-browsing-database&quot;&gt;The Safe Browsing Database &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#the-safe-browsing-database&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In order to implement Safe Browsing, Google maintains a database of
potentially harmful sites that it collects via some unspecified
mechanism.
The Safe Browsing database&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
consists of a list of blocked strings which consist of:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Domain names or parts of domain names&lt;/li&gt;
&lt;li&gt;Domain and path prefixes, broken at path separators (&lt;code&gt;/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Domain and paths and query parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So, for instance, for the URL &lt;code&gt;https://example.com/a/b/c&lt;/code&gt; the
database might contain &lt;code&gt;example.com&lt;/code&gt; if the whole domain was
dangerous or maybe &lt;code&gt;example.com/a/b&lt;/code&gt; if only some parts of the
domain were dangerous.  In order to check a URL, you break it down into the list of
prefixes and check all of them. If any of them match, then
the URL is dangerous. Here&#39;s the example Google gives for
the URL &lt;code&gt;http://a.b.c/1/2.html?param=1&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a.b.c/1/2.html?param=1
a.b.c/1/2.html
a.b.c/
a.b.c/1/
b.c/1/2.html?param=1
b.c/1/2.html
b.c/
b.c/1/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If any of the substrings match, then the browser shows a warning,
like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/safe-browsing-warning.png&quot; alt=&quot;Safe Browsing Warning for Phishing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Pretty scary, right?&lt;/p&gt;
&lt;h2 id=&quot;querying-the-database&quot;&gt;Querying the Database &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#querying-the-database&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Note: There are a number of versions of Safe Browsing.
This describes the Safe Browsing v4 protocol which is what
is currently implemented in Firefox, which I just call
Safe Browsing for convenience.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Of course, the Safe Browsing database is on Google&#39;s servers, so the
browser needs some way to query it. The obvious thing to do is for the client
to send Google the URLs it is interested in and just get back a yes or
no answer. Safe Browsing does have an API for this,
but of course this has some obvious very serious privacy
problems, in that the server gets to learn everyone&#39;s browsing history,
which is something that many browsers &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/&quot;&gt;try to
stop&lt;/a&gt; in other contexts. AFAIK,
no major safe browsing client currently operates this way by default,
although Chrome offers a feature called &lt;a href=&quot;https://security.googleblog.com/2020/05/enhanced-safe-browsing-protection-now.html&quot;&gt;&amp;quot;enhanced safe browsing&amp;quot;&lt;/a&gt;
in which Chrome queries the Safe Browsing service directly for some URLs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When you switch to Enhanced Safe Browsing, Chrome will share additional security data directly with Google Safe Browsing to enable more accurate threat assessments. For example, Chrome will check uncommon URLs in real time to detect whether the site you are about to visit may be a phishing site. Chrome will also send a small sample of pages and suspicious downloads to help discover new threats against you and other Chrome users.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;However, this is not the default behavior.&lt;/p&gt;
&lt;p&gt;The other obvious design is to just send the entire database
to the client and let it do lookups locally. This is a reasonable
design and one which I&#39;ll consider &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#distribute-longer-hashes&quot;&gt;below&lt;/a&gt;, but it&#39;s not the
way the current system works. Instead Safe Browsing uses a design
which is intended to balance performance, privacy, and timeliness.&lt;/p&gt;
&lt;p&gt;The basic structure of the system works as follows. For each string
&lt;em&gt;S_i&lt;/em&gt; in the database, the server computes a hash &lt;em&gt;H(S_i)&lt;/em&gt;. It then
truncates each hash to 4 bytes (32 bits) and sends the truncated list
to the client, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sb-hashes.png&quot; alt=&quot;Safe browsing hashing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The impact of this process is to compress the set of strings
somewhat, to a total size of &lt;em&gt;4I&lt;/em&gt; bytes where &lt;em&gt;I&lt;/em&gt;
is the total number of strings (there is also a system to
compress the database somewhat).
As shown in this diagram, it&#39;s possible that multiple strings
will map onto the same truncated hash (though different full hashes). As a practical matter,
this is a pretty sparse space: there are only about 2&lt;sup&gt;22&lt;/sup&gt;
(3 million) strings and there are 2&lt;sup&gt;32&lt;/sup&gt; possible truncated hashes, so
there will be approximately as many truncated hashes
as there are input strings; the full hashes are 256 bits long
and so are unique with extremely high probability.&lt;/p&gt;
&lt;p&gt;However, the cost of this
design is &lt;em&gt;false positives&lt;/em&gt;: effectively, the hash function
maps an input string onto a random 32-bit hash, and about 1/1400
of these hashes will correspond to one of the truncated hashes
that the server sends to the client.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Obviously, if the client
were to generate an error every time there was a match
this would create an unacceptable client experience,
as people would regularly encounter scary warnings.
However, this data structure does not have &lt;em&gt;false negatives&lt;/em&gt;: if
the hash prefix isn&#39;t in the list, then the hash won&#39;t be in
the full list either.&lt;/p&gt;
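&lt;p&gt;As a rough check on the 1/1400 figure, here is a minimal Python sketch. It assumes about 3 million blocklisted strings and treats truncated hashes as uniformly random 32-bit values; the function name &lt;code&gt;false_positive_odds&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
def false_positive_odds(num_strings: int, prefix_bits: int = 32) -> float:
    """Return N such that about 1 in N random strings hits the prefix list.

    Assumes prefixes are uniformly distributed and ignores the rare
    collisions between blocklisted strings themselves.
    """
    return (2 ** prefix_bits) / num_strings


# With about 3 million entries and 4-byte prefixes, roughly one in
# every 1400 unrelated strings matches some prefix purely by chance.
odds = false_positive_odds(3_000_000)
print(f"about 1 in {odds:.0f}")
```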
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;order-of-operations&quot;&gt;Order of Operations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#order-of-operations&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In Firefox, Safe Browsing checks proceed partly in parallel
to retrieving the URL; because the primary risk is the
user inappropriately acting on the returned Web page, it&#39;s
fine to contact the server as long as you don&#39;t display
the result. This parallelism allows for better performance.&lt;/p&gt;
&lt;p&gt;However, Firefox uses a similar mechanism
for its Tracking Protection feature, and the purpose
of that feature is (partly) to prevent trackers from
using IP address-based tracking, so it&#39;s not even
safe to send a request to the server before checking
the blocklist. Fortunately, Tracking Protection
downloads a list of full hashes and so doesn&#39;t
need to wait for the server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Instead of generating an error, the client double-checks
the match by asking the server to send the full hashes
corresponding to the truncated hash.
In order to check a string, the client proceeds as follows.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the full hash&lt;/li&gt;
&lt;li&gt;If the 32-bit hash prefix is not in the downloaded list,
then the string is OK and continue to retrieve
the URL.&lt;/li&gt;
&lt;li&gt;Otherwise, send the hash prefix to the server and ask
the server to provide the list of corresponding full
hashes with that prefix (typically just a single result).&lt;/li&gt;
&lt;li&gt;If the full hash is on the list of returned hashes, then
generate an error.&lt;/li&gt;
&lt;li&gt;Otherwise, continue to retrieve the URL.&lt;/li&gt;
&lt;/ol&gt;
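&lt;p&gt;The steps above can be sketched in Python roughly as follows. This is a simplified illustration rather than Firefox&#39;s implementation: it skips URL canonicalization and the multiple strings checked per URL, and &lt;code&gt;fetch_full_hashes&lt;/code&gt; is a hypothetical stand-in for the request to the server.&lt;/p&gt;

```python
import hashlib
from typing import Callable, Iterable, Set

PREFIX_LEN = 4  # bytes, i.e. the 32-bit truncated hash


def full_hash(s: str) -> bytes:
    """SHA-256 of the string to be checked."""
    return hashlib.sha256(s.encode("utf-8")).digest()


def is_blocked(
    s: str,
    local_prefixes: Set[bytes],
    fetch_full_hashes: Callable[[bytes], Iterable[bytes]],
) -> bool:
    """Return True if the client should show a warning for this string."""
    h = full_hash(s)                          # step 1
    prefix = h[:PREFIX_LEN]
    if prefix not in local_prefixes:          # step 2: no server contact
        return False
    candidates = fetch_full_hashes(prefix)    # step 3: ask the server
    return h in set(candidates)               # steps 4 and 5


# Toy stand-in for the server: a dict from prefix to full hashes.
bad = full_hash("phish.example/login")
server_db = {bad[:PREFIX_LEN]: [bad]}
local_prefixes = set(server_db)


def toy_server(prefix: bytes) -> Iterable[bytes]:
    return server_db.get(prefix, [])


print(is_blocked("phish.example/login", local_prefixes, toy_server))
print(is_blocked("educatedguesswork.org/", local_prefixes, toy_server))
```

&lt;p&gt;Note that the server only ever sees the 4-byte prefix, and only for strings whose prefix is already in the local list.&lt;/p&gt;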
&lt;p&gt;This design has a number of advantages. First,
it means that the server doesn&#39;t need to send the client
the entire database, which is about four times larger
than the truncated database because the hashes are four
times larger (though more on this later).&lt;/p&gt;
&lt;p&gt;Second, it allows the server to quickly &lt;em&gt;retract&lt;/em&gt; inappropriately
blocklisted sites. Suppose that the server had blocklisted
a URL with hash &lt;strong&gt;XY&lt;/strong&gt; where &lt;strong&gt;X&lt;/strong&gt; is the 32-bit prefix and &lt;strong&gt;Y&lt;/strong&gt; is the
rest of the hash. The client retrieves &lt;strong&gt;X&lt;/strong&gt; as part of downloading
the database and then when it gets a match, asks for all the
hashes starting with &lt;strong&gt;X&lt;/strong&gt;. However, in the meantime, the server
has decided that &lt;strong&gt;XY&lt;/strong&gt; is OK. In this case, it can just
return an empty list and the client will continue without error.&lt;/p&gt;
&lt;p&gt;Conversely, however, the server cannot easily add new
values between client-side database updates. Because
the client never contacts the server if the prefix isn&#39;t
in the database, the server won&#39;t have an opportunity
to add new entries unless they happen to correspond to a
prefix which is already in the database, which, as noted above,
is quite unlikely. This is somewhat unfortunate because
a lot of phishing attacks operate on the time scale
of minutes to tens of minutes and so you would need
the client to update its database impractically frequently
in order to catch them (hence the reason for
&amp;quot;enhanced safe browsing&amp;quot;).&lt;/p&gt;
&lt;p&gt;Finally, because most of the potential hash prefixes
don&#39;t appear on the prefix list, the client mostly
doesn&#39;t need to contact the server. This improves
performance (because most URLs can be retrieved
immediately) and privacy (because the server doesn&#39;t
learn anything for most URLs). In addition, the client
can cache any full hashes it has retrieved for a given prefix for
some time, so it won&#39;t need to recontact the server
during the cache lifetime.&lt;/p&gt;
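&lt;p&gt;A minimal sketch of such a cache, assuming a single fixed lifetime chosen by the client. The class name and parameters are hypothetical; a real client may instead take the lifetime from the server&#39;s response.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class FullHashCache:
    """Remembers the full hashes returned for a prefix for a limited time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[bytes, Tuple[float, List[bytes]]] = {}

    def put(self, prefix: bytes, full_hashes: Iterable[bytes]) -> None:
        # An empty list is worth caching too: it means the server had
        # nothing bad under this prefix.
        expires_at = self._clock() + self._ttl
        self._entries[prefix] = (expires_at, list(full_hashes))

    def get(self, prefix: bytes) -> Optional[List[bytes]]:
        """Return cached full hashes, or None if the server must be asked."""
        entry = self._entries.get(prefix)
        if entry is None:
            return None
        expires_at, hashes = entry
        if self._clock() >= expires_at:
            del self._entries[prefix]
            return None
        return hashes


# Fake clock so the example does not need to sleep.
now = [0.0]
cache = FullHashCache(ttl_seconds=300, clock=lambda: now[0])
cache.put(b"\x12\x34\x56\x78", [])
print(cache.get(b"\x12\x34\x56\x78"))  # still within the cache lifetime
now[0] = 301.0
print(cache.get(b"\x12\x34\x56\x78"))  # expired, so recontact the server
```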
&lt;h2 id=&quot;privacy-implications&quot;&gt;Privacy Implications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#privacy-implications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic privacy problem with Safe Browsing is that
even though clients don&#39;t connect to the server for
&lt;em&gt;most&lt;/em&gt; URLs, they do connect for &lt;em&gt;some&lt;/em&gt; URLs. Naively,
you would expect the server to get queries for about 1/1400 of
the user&#39;s browsing history keyed by the IP address (obviously
the browser shouldn&#39;t send cookies!)
but actually this underestimates the situation in two
important ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;As described above, the browser checks multiple strings
for the same URL, with the exact number depending on
the URL. Each of these might result in a query to
the server. If we assume that there are 5-10 strings
to check per URL, we&#39;re looking at more like 1/140 to 1/280
URLs.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
The situation is even worse if you visit multiple URLs
on the same site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This calculation assumes that the server isn&#39;t malicious.
Consider a server which wants to know whenever you
go to Facebook: it just needs to compute the hash
prefix for &lt;code&gt;facebook.com&lt;/code&gt; and publish that. When
the client queries for that prefix, it returns
a random hash (thus ensuring there is no blocking),
but the server gets to learn that the client might be going
to Facebook.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
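&lt;p&gt;To make the second point concrete, here is a hedged Python sketch of what such a server could do. All names here (&lt;code&gt;WATCHED&lt;/code&gt;, &lt;code&gt;handle_full_hash_request&lt;/code&gt;) are invented for illustration, and the string hashed is the bare &lt;code&gt;facebook.com&lt;/code&gt; from the text rather than a properly canonicalized one.&lt;/p&gt;

```python
import hashlib
import os
from typing import Dict, List, Tuple


def prefix_for(expression: str, nbytes: int = 4) -> bytes:
    """Truncated SHA-256 of a string, as it would appear in the prefix list."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:nbytes]


# Hypothetical watch list: prefixes published purely to observe visits,
# mapped back to the site each one was derived from.
WATCHED: Dict[bytes, str] = {prefix_for("facebook.com"): "facebook.com"}


def handle_full_hash_request(
    prefix: bytes, client_ip: str, log: List[Tuple[str, str]]
) -> List[bytes]:
    """Server-side handler for a client asking about one prefix."""
    if prefix in WATCHED:
        # The server only learns that the client *might* be on this site,
        # because other strings share the same 32-bit prefix.
        log.append((client_ip, WATCHED[prefix]))
        # Random full hash with the right prefix: it will almost certainly
        # differ from the client's real full hash, so nothing is blocked.
        return [prefix + os.urandom(32 - len(prefix))]
    return []


log: List[Tuple[str, str]] = []
reply = handle_full_hash_request(prefix_for("facebook.com"), "192.0.2.1", log)
print(log, len(reply[0]))
```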
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;checking-passwords&quot;&gt;Checking Passwords &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#checking-passwords&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Another general problem in this space is checking compromised
passwords. The general setting here is that there is a server
which has a list of passwords that have been in breaches,
such as &lt;a href=&quot;https://haveibeenpwned.com/&quot;&gt;HaveIBeenPwned&lt;/a&gt;, and
the client wants to determine if the user&#39;s password is on the
list. Naively, you can use the same protocol for this application
as for Safe Browsing, but there are two complicating factors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The server may want to keep the list of password
hashes secret to prevent people from learning the list
of passwords.&lt;/li&gt;
&lt;li&gt;Because some passwords are much more common than others,
the client may want to prevent the server from learning
that it has one of these passwords by sending the
corresponding hash prefix.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;An example of a technique tuned specifically for password
checking is provided in a &lt;a href=&quot;https://arxiv.org/abs/1905.13737&quot;&gt;paper&lt;/a&gt;
by Li, Pal, Ali, Sullivan, Chatterjee, and Ristenpart
which uses a combination of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_set_intersection&amp;amp;oldid=1081416156&quot;&gt;private set intersection&lt;/a&gt;
to prevent the client from learning the hashes
and &amp;quot;frequency smoothed&amp;quot; hash bucketing to prevent the hash
from leaking information about the client&#39;s password.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Of course, the server doesn&#39;t actually learn which URLs the
client is visiting because (1) it learns hashes and (2) the
hashes are truncated, so that there are many strings
with the same truncated hash.
Note that it&#39;s very important
that the client only request hash &lt;em&gt;prefixes&lt;/em&gt; because
if the client were to ask for the full hash, it would
be relatively straightforward for the server to determine
most of the input strings just by computing the hashes
for known URLs.&lt;/p&gt;
&lt;p&gt;However, even though there are many strings with
the same hash prefix, some of those
strings (e.g., &lt;code&gt;facebook.com&lt;/code&gt;) are more likely to
be visited by users than others (e.g., &lt;code&gt;86c0cb28d2ae2b872eb52.example&lt;/code&gt;).
An additional consideration is that a client might
need to query multiple strings associated with the same
site. For instance, if the client queries the hash
prefix for &lt;code&gt;educatedguesswork.org&lt;/code&gt; (hash=&lt;em&gt;A&lt;/em&gt;) and &lt;code&gt;educatedguesswork.org/posts/safe-browsing-privacy/&lt;/code&gt; (hash=&lt;em&gt;B&lt;/em&gt;) then it&#39;s more likely that the user is visiting
this site than a pair of unrelated sites that
happen to have hashes &lt;em&gt;A&lt;/em&gt; and &lt;em&gt;B&lt;/em&gt;. Providing a complete
analysis of the level of privacy leakage from Safe Browsing
is fairly complicated and depends on the distribution of visits
to various sites and your prior expectations of which sites
a user is likely to visit, but suffice to say that there
is clearly some privacy leakage. Ideally, we would have
no leakage.&lt;/p&gt;
&lt;h2 id=&quot;improving-privacy&quot;&gt;Improving Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#improving-privacy&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Improving Safe Browsing, and in particular addressing
these privacy issues, is an active area of work and
something that Google and Mozilla have collaborated on for
quite some time.
There are three primary known approaches to improving the
privacy of this kind of system:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Proxying&lt;/li&gt;
&lt;li&gt;Use full hashes&lt;/li&gt;
&lt;li&gt;Crypto!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I&#39;ll discuss each of these below.&lt;/p&gt;
&lt;h3 id=&quot;proxying&quot;&gt;Proxying &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#proxying&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The most obvious technique is just to
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#anonymizing-proxies&quot;&gt;proxy&lt;/a&gt; the queries to the
server. This conceals the IP address, which prevents the
server from directly linking queries to the user.
As I understand it, Apple &lt;a href=&quot;https://www.zdnet.com/article/apple-will-proxy-safe-browsing-traffic-on-ios-14-5-to-hide-user-ips-from-google/&quot;&gt;already&lt;/a&gt; proxies Safe Browsing traffic, at least for iOS.
Proxying is a nicely general technique which is simple to implement
and reason about. Indeed, one might think that we could
simplify the system by skipping the prefix list and just
having the client query the server for every string
(or more likely every full hash). This would provide
better timeliness, including the ability to quickly
add new entries, though of course at some performance cost.&lt;/p&gt;
&lt;p&gt;There are a number of subtle points, however.
First, it&#39;s important that the queries be unlinkable
from the perspective of the server. Consider what happens
if the client makes a long-term connection to the server
(through the proxy) and then proceeds to make all its
queries through that single connection. In that case,
the server might be able to use the pattern of requests
to infer the user&#39;s identity and then to connect that
to the rest of their browsing activity. For instance,
suppose user &lt;strong&gt;A&lt;/strong&gt; retrieves the following URLs:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;github.com/fuzzydunlopp fuzzydunlopp.example/edit www.instagram.com/marlo.stanfield/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s a fair inference that the user in question is
&lt;code&gt;fuzzydunlopp&lt;/code&gt; and that they also are visiting
Marlo Stanfield&#39;s Instagram.&lt;/p&gt;
&lt;p&gt;This suggests that connection proxying systems like
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-masque-connect-udp&quot;&gt;MASQUE&lt;/a&gt;
are bad fits for this application and instead we
would be better served by message proxying systems
like &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-ohai-ohttp&quot;&gt;Oblivious HTTP&lt;/a&gt;.
In O-HTTP, each request is separately encrypted to the
server, but requests from multiple clients
are multiplexed on the same connection from the
proxy, thus making it difficult to link them
up. Even so, however, you need to worry about
timing analysis (e.g., when potentially related
requests come in close succession).&lt;/p&gt;
&lt;p&gt;A related problem is that some servers are concerned
about abuse (e.g., excessive requests). It&#39;s common
to use &lt;a href=&quot;https://raw.githubusercontent.com/IRTF-PEARG/wg-materials/master/interim-21-01/Anti-abuse_applications_of_IP.pdf&quot;&gt;IP addresses&lt;/a&gt;
for this purpose, for instance by looking for excessive
traffic for a given IP address. It&#39;s not actually
clear to me that abuse is that big a consideration
in this case because serving the query is actually
very cheap, as it&#39;s just a very small data value,
but in any case having a proxy which conceals the client&#39;s
address prevents it from being used for this purpose.
This potentially makes it harder to manage misbehaving
clients while providing service to legitimate clients.&lt;/p&gt;
&lt;p&gt;There are a variety of techniques which might be usable
for this application (e.g., &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-privacypass-architecture-06.html&quot;&gt;PrivacyPass&lt;/a&gt;),
but it&#39;s not clear how well they work in this case because
you need to design a system which provides anti-abuse without linkability
but which is also cheap enough to verify that it&#39;s not
easier to just serve the request. For instance, if you
have the choice between verifying a digital signature
and then serving the request or just serving all the requests,
it&#39;s probably better to just serve the requests: the vast
majority of requests will be valid, and for those you
need to both verify the signature &lt;em&gt;and&lt;/em&gt; serve the requests
so you have to pay both costs. Moreover, in many cases even a failed verification will be
more expensive than just serving the request.
In addition, if the proxy and the server have a relationship,
then the proxy can do some of the work of suppressing
abuse, as &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-ohai-ohttp-03.html#name-differential-treatment&quot;&gt;described&lt;/a&gt;
in the O-HTTP spec.&lt;/p&gt;
&lt;h3 id=&quot;distribute-longer-hashes&quot;&gt;Distribute Longer Hashes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#distribute-longer-hashes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Another alternative design is to send the client longer hashes.
The false positive rate is dictated by the fraction of hashes
which correspond to blocked strings, and so just making the
hash longer makes the false positive rate lower. If you use
a sufficiently long hash, then you can make the false positive
rate acceptably low and there is no need to double check with the
server at all. This produces a much simpler system which
is both faster (because you never need to contact the server)
and more private (because the client never makes any queries
to the server which depend on your browsing history).&lt;/p&gt;
&lt;p&gt;How long a hash do you need? Safe Browsing uses 256-bit
hashes (SHA-256), but you almost certainly need less.
If you use a &lt;em&gt;b&lt;/em&gt;-bit hash and there are 2&lt;sup&gt;20&lt;/sup&gt; blocked
strings, then the chance that a randomly chosen non-blocked
string will be reported as blocked is 2&lt;sup&gt;-(b-20)&lt;/sup&gt;.
If we used an 80-bit hash, then the natural rate of
false positives would be 2&lt;sup&gt;-60&lt;/sup&gt;, which seems acceptably
low. However, this leaves open an attack in which an
attacker &lt;em&gt;deliberately&lt;/em&gt; creates a collision in order
to make a site unreachable.&lt;/p&gt;
&lt;p&gt;Consider the case where the attacker wants to block
&lt;code&gt;example.com&lt;/code&gt;. They make their own malware site and
search the space of URLs until they find one which
has the same hash as &lt;code&gt;example.com&lt;/code&gt;. They then put their
site up at that URL and wait for the server to detect
it. Once it does, the server publishes the hash and suddenly
no client can go to &lt;code&gt;example.com&lt;/code&gt;. This attack doesn&#39;t
work with the current Safe Browsing design because the
client contacts the server, which uses a full hash
(though the attacker can force the client to contact
the server for &lt;code&gt;example.com&lt;/code&gt;), but it works if you
remove the double checking step.
The natural defense against this attack is to just
make the hash longer. For instance, if we were to use
a 128-bit hash, then the attacker would need to
do more like 2&lt;sup&gt;100&lt;/sup&gt; work in order to create a collision, which
is probably acceptably large.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
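&lt;p&gt;For the natural (non-adversarial) false positive rate discussed above, the arithmetic is easy to tabulate. The sketch below assumes 2&lt;sup&gt;20&lt;/sup&gt; blocked strings and uniformly distributed hashes; it says nothing about the cost of a deliberate collision, and the function name is illustrative only.&lt;/p&gt;

```python
import math


def natural_false_positive_log2(hash_bits: int, num_blocked: int) -> float:
    """log2 of the chance that a random non-blocked string matches an entry.

    With num_blocked entries spread over 2**hash_bits possible values,
    the rate is num_blocked / 2**hash_bits, i.e. 2**-(b - 20) when there
    are 2**20 entries.
    """
    return math.log2(num_blocked) - hash_bits


for bits in (32, 80, 128):
    exponent = natural_false_positive_log2(bits, 2 ** 20)
    print(f"{bits}-bit hash: false positive rate about 2^{exponent:.0f}")
```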
&lt;p&gt;It&#39;s important to note that the privacy guarantees of this
system are better than those of the proxy system: with the
proxy, privacy depends on the proxy and the server not
colluding, whereas with longer hashes the privacy of the
system does not require trusting anyone.&lt;/p&gt;
&lt;p&gt;Of course, sending longer hashes means more communication cost:
if we use 128-bit hashes, it will probably cost about 4 times
as much to update the client. However, this is an upper
bound: in the current Safe Browsing design, the client needs
to make connections to the server in order to double check
(this is even more expensive with proxying)
and these are not necessary with longer hashes. Moreover,
those connections are in the critical path for downloading
URLs, whereas updating the hashes can be done in the background.&lt;/p&gt;
&lt;h3 id=&quot;crypto!&quot;&gt;Crypto! &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#crypto!&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Finally, we could use cryptography. This is closely
related to a well-known problem called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_information_retrieval&amp;amp;oldid=1068898272&quot;&gt;Private Information Retrieval&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
in which the client wants to query a database
without the server learning which database entry it is
querying. Naively, PIR is precisely what we want here,
in that it would give good privacy and yet full timeliness
(we might still want to distribute the partial hashes
to reduce the number of queries required for performance
reasons)
but the problem is that it&#39;s really hard to build a PIR
scheme that has good enough performance to be in the critical
path for a browser. For instance, in 2021,
Kogan and Corrigan-Gibbs published a system called
&lt;a href=&quot;https://people.csail.mit.edu/henrycg/pubs/checklist/&quot;&gt;Checklist&lt;/a&gt;
specifically designed for Safe Browsing, but it comes at real
costs, as described in the Checklist abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This paper presents Checklist, a system for private blocklist lookups. In Checklist, a client can determine whether a particular string appears on a server-held blocklist of strings, without leaking its string to the server. Checklist is the first blocklist-lookup system that (1) leaks no information about the client’s string to the server, (2) does not require the client to store the blocklist in its entirety, and (3) allows the server to respond to the client’s query in time sublinear in the blocklist size. To make this possible, we construct a new two-server private-information-retrieval protocol that is both asymptotically and concretely faster, in terms of server-side time, than those of prior work. We evaluate Checklist in the context of Google’s “Safe Browsing” blocklist, which all major browsers use to prevent web clients from visiting malware-hosting URLs. Today, lookups to this blocklist leak partial hashes of a subset of clients’ visited URLs to Google’s servers. We have modified Firefox to perform Safe-Browsing blocklist lookups via Checklist servers, which eliminates the leakage of partial URL hashes from the Firefox client to the blocklist servers. This privacy gain comes at the cost of increasing communication by a factor of 3.3×, and the server-side compute costs by 9.8×. Checklist reduces end-to-end server-side costs by 6.7×, compared to what would be possible with prior state-of-the-art two-server private information retrieval.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Of course, PIR schemes continue to improve (for instance,
Henzinger, Hong, Corrigan-Gibbs, Meiklejohn, and Vaikuntanathan
just published a new system called &lt;a href=&quot;https://eprint.iacr.org/2022/949&quot;&gt;SimplePIR&lt;/a&gt;),
so at some point it may just be possible to swap in a PIR
system for all of this custom machinery. This has the potential
to provide the best combination of security,
timeliness, and privacy.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Safe Browsing and similar services are a key part of protecting
users on the Internet, but the current state of technology
requires us to make some compromises between
effectiveness, privacy, and timeliness. It&#39;s not clear
to me that the current design has the optimal set of
tradeoffs, but with better technology, it may also be
possible to build a system which is superior on every
dimension.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are actually several databases for different categories of
blockage. &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Effectively, this is a single hash
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bloom_filter&amp;amp;oldid=1102259722p&quot;&gt;Bloom Filter&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I thought I remembered that Firefox sent some
random hash prefixes to the server to create
some additional deniability, but a quick skim
of the code doesn&#39;t show anything. Will update
if I learn more.
Updated 2022-08-17: &lt;a href=&quot;https://searchfox.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#406-416&quot;&gt;here&lt;/a&gt;.
Thanks to Thorin for the link. &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Another potential defense would be to have the server generate
the hash with a secret &lt;em&gt;salt&lt;/em&gt; value, thus making collisions
hard to find. However, this makes incremental updates hard
because the attacker then learns the salt. The server
could also make the problem somewhat harder by using
a large number of public salts, but this just increases
the work factor by the number of salts. &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We
don&#39;t need private set intersection here because it&#39;s not a problem for the client
to learn the server&#39;s data even if there is not a match. &lt;a href=&quot;https://educatedguesswork.org/posts/safe-browsing-privacy/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Discovery Mechanisms for Messaging and Calling Interoperability</title>
		<link href="https://educatedguesswork.org/posts/messaging-discovery/"/>
		<updated>2022-08-04T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/messaging-discovery/</id>
		<content type="html">&lt;p&gt;As I discussed in an &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e&quot;&gt;earlier post&lt;/a&gt;, it looks like the EU [&lt;em&gt;corrected an embarrassing typo that had this as UK&lt;/em&gt; -- EKR]
Digital Markets Act (DMA) is going to require
&lt;a href=&quot;https://www.ianbrown.tech/wp-content/uploads/2022/03/Final-DMA-interoperability-text.pdf&quot;&gt;interoperability&lt;/a&gt;
between messaging systems. That previous post focused on how to
establish end-to-end encryption between messaging systems.
In this post I want to talk about the problem of discovering
which messaging system someone is on.&lt;/p&gt;
&lt;h2 id=&quot;identifier-portability&quot;&gt;Identifier Portability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#identifier-portability&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Many messaging systems bootstrap
off of existing identifiers in the form of phone numbers
(jargon: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=E.164&amp;amp;oldid=1092734805&quot;&gt;&amp;quot;E.164 number&amp;quot;&lt;/a&gt;).
Phone numbers are &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#telephone-addressing&quot;&gt;structured&lt;/a&gt;,
which means that when you place a call over the
&lt;em&gt;Public Switched Telephone Network (PSTN)&lt;/em&gt;
it incrementally routes the call via the country,
area code, etc., but from the perspective of a messaging system, they are &lt;em&gt;opaque&lt;/em&gt;
and &lt;em&gt;unstructured&lt;/em&gt;, which is to say that
the identifier &lt;code&gt;+1.415.555.0123&lt;/code&gt; might be for a user who is
on iMessage, WhatsApp, or even both. If all I have is someone&#39;s
phone number, how do I know which service to reach them on?&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;phone-numbers-as-a-shared-namespace&quot;&gt;Phone numbers as a shared namespace &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#phone-numbers-as-a-shared-namespace&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Phone numbers weren&#39;t originally designed to be a single
namespace that was shared between carriers, but rather
as a single namespace to be used by a single carrier,
the Bell System (motto: &amp;quot;One Policy, One System, Universal Service&amp;quot;).
Even then, numbers were structured, but the structure
represented the topology of the system so that you
could incrementally route calls. For instance, you could use the
area code to direct traffic to the right region followed by the
local office code to direct it to the right switch, and then
down to the right subscriber line.&lt;/p&gt;
&lt;p&gt;When the Bell System was &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Breakup_of_the_Bell_System&amp;amp;oldid=1098874467&quot;&gt;broken up&lt;/a&gt;,
the breakup was done along geographic lines into
what were called &lt;em&gt;Regional Bell Operating Companies (RBOCs)&lt;/em&gt;. Because
the topology of the system was also roughly geographic—unlike,
say, the Internet, where number prefixes
do not really correspond to geographic regions—you
could at least roughly align the RBOC boundaries with the
number structure.
However, subsequently jurisdictions started to require
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Local_number_portability&amp;amp;oldid=1077125532&quot;&gt;Local Number Portability&lt;/a&gt;, which allowed you to take your number from carrier to carrier.
Thus, even if you were originally assigned a number out of Verizon&#39;s
block, you could &amp;quot;port&amp;quot; it to T-Mobile, with the result that you
have a shared namespace.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One possibility would be to simply sidestep this question
by having identifiers be scoped, either by having people say
&amp;quot;connect with me on WhatsApp at &lt;code&gt;1.415.555.0123&lt;/code&gt;&amp;quot; or by
just adding an explicit scoping parameter, so your address
is &lt;code&gt;1.415.555.0123@whatsapp.com&lt;/code&gt; (see &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#identity&quot;&gt;here&lt;/a&gt; for
more on this).
This is how e-mail works and
isn&#39;t the worst thing in the world, but does make it more
complicated to contact someone else if all you have is their
number, as well as making things confusing if they change
their preferred app.
By contrast, phone numbers are &lt;em&gt;portable&lt;/em&gt; across carriers,
which is to say that if you move from T-Mobile to Verizon
you get to keep your phone number, and I don&#39;t need to
know what carrier you have in order to call you: I just enter
the phone number. This is implemented by having a giant—well, not really
that giant, as the entire US number space is less than 10 billion numbers and so
basically fits on a USB stick&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;—&lt;a href=&quot;https://www.nationalnanpa.com/&quot;&gt;database&lt;/a&gt;
that knows which carrier is responsible for each number.
When you want to call someone, your carrier checks this
database (technical term: &amp;quot;dip&amp;quot;) to see where to route
the call.&lt;/p&gt;
&lt;p&gt;So, what if you want to have this same property for instant
messaging or video calling systems? This actually turns
out to be surprisingly complicated.&lt;/p&gt;
&lt;h2 id=&quot;phone-number-based-addressing-for-single-applications&quot;&gt;Phone Number-Based Addressing for Single Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#phone-number-based-addressing-for-single-applications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before trying to solve the problem of routing between
applications that use phone number-based addresses, it&#39;s
useful to look at the simpler problem of a single application
that uses phone numbers as addresses (e.g., WhatsApp).
Instead of using the number portability database,
which doesn&#39;t really have the information you need
here, these devices bootstrap authentication off of
SMS.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;how-does-the-pstn-authenticate-you%3F&quot;&gt;How does the PSTN authenticate you? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#how-does-the-pstn-authenticate-you%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;You might be wondering how the PSTN knows which
number is associated with a given device. Back in the
days of landline phones, the answer was simple:
each subscriber had their own literal line. I.e.,
there was a separate pair of copper wires that went
from the central office to the subscriber&#39;s house and
the switch knew which pair of wires went with each
number.&lt;/p&gt;
&lt;p&gt;Obviously this doesn&#39;t work with mobile phones. Instead,
each phone has its own cryptographic key which it uses
to authenticate to the network. When your number is
assigned to you, that key is then associated with the
number in the carrier&#39;s database. In modern phones,
that key is generally stored in a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SIM_card&amp;amp;oldid=1100219575&quot;&gt;Subscriber Identity Module (SIM)&lt;/a&gt;,
which is a small chip embedded in a plastic card:&lt;/p&gt;
&lt;img width=&quot;200&quot; src=&quot;https://educatedguesswork.org/img/sim-card.jpg&quot; alt=&quot;SIM card&quot; /&gt;
&lt;p&gt;[From Wikipedia]&lt;/p&gt;
&lt;p&gt;The SIM card is actually what gives your phone its identity,
and if you swap SIM cards between devices, you will also
swap their numbers.&lt;/p&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The app prompts you for a password and your phone number.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The service then sends you an SMS message with
a random code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You enter that code into the app&#39;s user interface.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This demonstrates that you can receive messages at the indicated
phone number.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
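&lt;p&gt;As a rough illustration, the service side of this exchange might look something like the sketch below. The class name, the &lt;code&gt;send_sms&lt;/code&gt; hook, and the six-digit code format are all assumptions made for the example, not a description of how any particular app does it.&lt;/p&gt;

```python
import hmac
import secrets


class VerificationService:
    """Toy model of the service side of SMS phone number verification."""

    def __init__(self, send_sms):
        # send_sms(number, text) is a hypothetical hook into an SMS gateway.
        self._send_sms = send_sms
        self._pending = {}     # phone number -> outstanding code
        self.verified = set()  # numbers that completed the challenge

    def start(self, number):
        # The service sends an unpredictable code out of band, via the PSTN.
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[number] = code
        self._send_sms(number, f"Your verification code is {code}")

    def finish(self, number, code):
        # The user types the code into the app, which relays it here.
        # Each code is single use: a wrong guess discards it.
        expected = self._pending.pop(number, None)
        if expected is None:
            return False
        ok = hmac.compare_digest(expected, code)
        if ok:
            self.verified.add(number)
        return ok
```

&lt;p&gt;The important property is that &lt;code&gt;finish&lt;/code&gt; only succeeds for someone who actually received the SMS; nothing the app claims about itself is taken on trust.&lt;/p&gt;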
&lt;p&gt;This authentication mechanism relies on the assumption that
the PSTN correctly routes messages to the right
location and that nobody else can read them. When you
think about it, this is actually a bit of an odd assumption to make
at the time you are installing a messaging application that
offers stronger security than SMS, but that&#39;s actually
a surprisingly common scenario: certificate issuance on the
Web relies on the weak security properties provided by
unencrypted DNS to bootstrap up to TLS, after which the
DNS no longer needs to be trusted.&lt;/p&gt;
&lt;p&gt;The general concept
here is that you only trust the weaker system once
to form the initial association
and from then on you have strong continuity of
authentication (in some systems,
this is known as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Trust_on_first_use&amp;amp;oldid=1085537067&quot;&gt;Trust On First Use (TOFU)&lt;/a&gt;).
In both cases, you
can build supplementary mechanisms like
&lt;a href=&quot;https://certificate.transparency.dev/&quot;&gt;Certificate Transparency&lt;/a&gt;
or &lt;a href=&quot;https://transparency.dev/application/strengthen-discovery-of-encryption-keys/&quot;&gt;Key Transparency&lt;/a&gt; to detect misissuance.&lt;/p&gt;
&lt;p&gt;One natural question to ask is why the app can&#39;t just ask
the device, which, after all, knows its own phone number.
The problem is that the device can&#39;t be trusted. Remember
that what we are trying to do is to convince the &lt;em&gt;service&lt;/em&gt;
that a given device is associated with this number, and
even though the service wrote the app in question, it&#39;s very
&lt;a href=&quot;https://educatedguesswork.org/verifying-software/&quot;&gt;difficult for them to determine&lt;/a&gt;
that an attacker hasn&#39;t modified the app to lie about its number.
The SMS verification mechanism doesn&#39;t have this problem;
because it actually checks that you can receive messages,
it works even if the device and the code running on it are
totally untrusted.&lt;/p&gt;
&lt;p&gt;It&#39;s easier to see the trust relationships if we look at
what&#39;s really happening, as shown in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/phone-number-auth.png&quot; alt=&quot;Phone number verification via SMS&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In the first phase, the user is interacting with the
application, which is what collects the password and the
phone number and sends them to the server. The server
then sends the code through the phone network to
the &lt;em&gt;device&lt;/em&gt;. The device shows it to the user, who then
gives it to the app. The app then sends it back to the
server, which is then able to confirm the code and
verify the account. Importantly, even though the server
is sending the code to the app (via the user),
the SMS channel to the phone is &lt;em&gt;out of band&lt;/em&gt; from the app&#39;s connection to
the server. In fact, they may even be using different
technology; for instance, if you are on WiFi, then the
connection to the server will use that radio even though
the SMS comes in over the mobile telephony network.
Even if all the data is going over the mobile channel,
the IP communications from the app aren&#39;t strongly
bound to your phone number.&lt;/p&gt;
&lt;p&gt;Note that even if you don&#39;t trust the answer, if you could
ask the device for its number, you could still
skip prompting the user. However, the number
may not be available. Apple&#39;s
security and privacy policies &lt;a href=&quot;https://stackoverflow.com/questions/193182/programmatically-get-own-phone-number-in-ios&quot;&gt;forbid&lt;/a&gt; this (presumably for privacy reasons) though it appears to be
&lt;a href=&quot;https://stackoverflow.com/questions/2480288/programmatically-obtain-the-phone-number-of-the-android-phone&quot;&gt;possible&lt;/a&gt; on Android. For similar security reasons,
the app can&#39;t just reach into your SMSes—which are received
by the operating system—and grab the confirmation code,
as this would allow it to read any SMS.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
The exception here is iMessage, which uses similar techniques
to verify the phone number, but because it ships as part
of the operating system is able to do so silently, even
though Apple doesn&#39;t permit other apps to do so.&lt;/p&gt;
&lt;p&gt;Once the service has associated the user&#39;s account with their
phone number, the rest of the system is fairly straightforward:
the app connects and authenticates as the user and the service
just routes messages/calls to the user; no further interaction
with the PSTN is required. It is worth noting, however, that
this has some funny results if the phone number is ever
reassigned because the service won&#39;t be notified. The result
can be that Alice has an account on some service for a
number that has been reassigned to Bob. It&#39;s hard to avoid
this situation with this kind of loose service coupling,
but of course it&#39;s not unique to the Internet: I still
get paper mail addressed to the people who lived in my house
over 20 years ago.&lt;/p&gt;
&lt;h2 id=&quot;phone-number-based-addressing-for-multiple-applications&quot;&gt;Phone Number-Based Addressing for Multiple Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#phone-number-based-addressing-for-multiple-applications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic situation isn&#39;t that different when different users
use different apps, except that you not only need to determine which
device is associated with a given user but also which app they
are using. As a simplification, let&#39;s assume that everyone
just uses a single app (analogous to the situation with
mobile phones where each subscriber just has a single carrier);
we&#39;ll look at the multi-app situation &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#multiple-apps-per-user&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Consider the following three users:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;User&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;App&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Number&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;strong&gt;A&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1.650.555.0011&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;strong&gt;B&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1.415.555.0022&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;strong&gt;A&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1.510.555.0033&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;What happens if Alice gets Bob&#39;s number and wants to contact him in
App &lt;strong&gt;A&lt;/strong&gt;? The obvious thing would be for Alice to just SMS
Bob and ask &amp;quot;which app are you using?&amp;quot; She could then tell
&lt;strong&gt;A&lt;/strong&gt; to contact &amp;quot;1.415.555.0022 via app &lt;strong&gt;B&lt;/strong&gt;&amp;quot;
(assuming that &lt;strong&gt;A&lt;/strong&gt; and &lt;strong&gt;B&lt;/strong&gt; can already talk to each
other as discussed in my &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e&quot;&gt;earlier post&lt;/a&gt;).
This will work but it&#39;s clumsy and inconvenient; what you want
is for Alice to put Bob&#39;s number into app &lt;strong&gt;A&lt;/strong&gt; and for &lt;strong&gt;A&lt;/strong&gt; to figure
things out. Unfortunately, this doesn&#39;t appear to be something that &lt;strong&gt;A&lt;/strong&gt; can do
on its own; rather, we need some additional infrastructure.&lt;/p&gt;
&lt;p&gt;I&#39;m aware of two major designs here. In the first design, you have
a directory service which knows which number is associated with
which app. In the second design, each user, or rather their
app, has to figure it out for itself.&lt;/p&gt;
&lt;h3 id=&quot;directory-services&quot;&gt;Directory Services &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#directory-services&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The obvious way to approach this is just to use the same approach as
for number portability, i.e., to have some sort of global directory
service that tells you which app to use for each number.&lt;/p&gt;
&lt;p&gt;It&#39;s possible you could directly integrate it with the
existing PSTN databases, but that&#39;s probably going to be a lot of work and it&#39;s
probably easier to just use the same kind of SMS verification we
discussed in the previous section. For instance, suppose you had a
single global directory service. When you installed the app you would
prove possession of your number to the directory service which
would then create a record mapping your number to the app you
were using. This directory can then be queried by other people,
as shown in the diagram below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/phone-number-service.png&quot; alt=&quot;A simple phone number service&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Update: fixed diagram -- 2022-08-04]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this example, Alice installs app &lt;strong&gt;A&lt;/strong&gt;, which automatically
contacts the directory and proves possession of her number. The
directory then creates a record mapping her number to app &lt;strong&gt;A&lt;/strong&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
When Bob wants to contact Alice, he puts her number into
app &lt;strong&gt;B&lt;/strong&gt;, which contacts the directory and finds out that
Alice uses &lt;strong&gt;A&lt;/strong&gt;. &lt;strong&gt;B&lt;/strong&gt; then uses whatever interoperability
mechanism it has with &lt;strong&gt;A&lt;/strong&gt; to establish communication.&lt;/p&gt;
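&lt;p&gt;To make the data flow concrete, here is a minimal sketch of such a directory. Everything in it is hypothetical: the class and method names are invented for illustration, and the &lt;code&gt;proves_possession&lt;/code&gt; callback simply stands in for the SMS challenge described earlier.&lt;/p&gt;

```python
class Directory:
    """Toy global directory mapping phone numbers to apps."""

    def __init__(self, proves_possession):
        # proves_possession(number) stands in for the SMS challenge;
        # it is a hypothetical callback, not a real API.
        self._proves_possession = proves_possession
        self._records = {}

    def register(self, number, app):
        # Only create a record once control of the number is shown.
        if not self._proves_possession(number):
            raise PermissionError("could not verify control of " + number)
        self._records[number] = app

    def lookup(self, number):
        # Returns None when the number has no registered app.
        return self._records.get(number)


# Alice registers through app A; Bob's app B later looks her up.
directory = Directory(lambda number: True)
directory.register("16505550011", "A")
alice_app = directory.lookup("16505550011")
```

&lt;p&gt;Note that in this naive form anyone can call &lt;code&gt;lookup&lt;/code&gt; for any number, and the directory sees every record and every query.&lt;/p&gt;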
&lt;p&gt;This system is obviously massively oversimplified. If we wanted
to build something real, we&#39;d need to address some important design
questions and fix some—as-yet-unsolved—privacy
issues.&lt;/p&gt;
&lt;h4 id=&quot;authentication&quot;&gt;Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#authentication&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The first question we&#39;d need to address is the authentication
structure. In the design I sketched above, the directory service is
solely responsible for knowing which app a given number is associated
with, but &lt;em&gt;not&lt;/em&gt; for authenticating the user. For instance, if Alice
and Charlie both use app &lt;strong&gt;A&lt;/strong&gt; then when Bob tries to call Alice,
&lt;strong&gt;A&lt;/strong&gt; can redirect the call to Charlie. Of course, &lt;strong&gt;A&lt;/strong&gt; might run
some kind of certificate/key transparency type of system to prevent
this kind of attack, but that requires every app to engage with that.&lt;/p&gt;
&lt;p&gt;Note that the reverse is also true: when Bob calls Alice, Alice is
relying on &lt;strong&gt;B&lt;/strong&gt;&#39;s representation that it&#39;s really Bob, and &lt;strong&gt;B&lt;/strong&gt; can
lie. Moreover, it&#39;s important for Alice to check the directory to make
sure that Bob&#39;s number is actually associated with &lt;strong&gt;B&lt;/strong&gt;. Otherwise,
service &lt;strong&gt;C&lt;/strong&gt; could just claim to be speaking for Bob even if he&#39;s not
a user of app &lt;strong&gt;C&lt;/strong&gt; at all.&lt;/p&gt;
&lt;p&gt;An alternate approach would be to have a global authentication
system in which the directory issues a credential to each user
binding their number to whatever cryptographic credentials their
app uses (effectively, this is a certificate authority for
phone numbers). In this case, it wouldn&#39;t be possible for
an app to lie about a user, though of course we now
have to trust the directory. The advantage of this design
would be that you only have to trust one thing and maybe
you could have better auditing and transparency
for a global service.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible to run both kinds
of systems simultaneously, where each app uses its own
authentication system internally but also is able to make
use of a global credential system. This allows for innovation
inside an app but also provides interoperability.&lt;/p&gt;
&lt;h4 id=&quot;centralization&quot;&gt;Centralization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#centralization&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Another problem with this design is that it seems to require
a centralized directory service, or at best a small number of
such services. The basic invariant here is that you need
a procedure that takes in a number and outputs the app it&#39;s
associated with. The easiest way to do that is to have a
single service. Perhaps if there were only a small number
of apps you could check them individually but if there
are tens or hundreds it&#39;s a real scalability problem (and may also be
a privacy problem, as discussed below).&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;enum&quot;&gt;ENUM &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#enum&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For the real nerds here, there is actually an RFC documenting
a less centralized design rooted in the DNS called &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6116&quot;&gt;ENUM&lt;/a&gt;.
The idea was that you would store records in the DNS under your phone number
(hilariously, reversed, because phone numbers read left to right and DNS addresses
read right to left), so you might have &lt;code&gt;8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa.&lt;/code&gt;.
This never took off for a host of reasons, and I don&#39;t think it&#39;s
really a viable option here because it requires DNS delegations
to match the phone number structure, which seems like a lot of
work for everyone involved.&lt;/p&gt;
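&lt;p&gt;For concreteness, the name construction can be sketched in a few lines. The function name is made up for this example; the input is the number that corresponds to the domain name shown above.&lt;/p&gt;

```python
def enum_domain(e164_number, suffix="e164.arpa."):
    """Build the ENUM DNS name for an E.164 phone number."""
    # Keep only the digits, dropping "+", spaces, and dashes.
    digits = [ch for ch in e164_number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits in phone number")
    # Reverse the digits so the most significant one is closest to the root.
    return ".".join(reversed(digits)) + "." + suffix


name = enum_domain("+44-207-946-0148")
# name == "8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa."
```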
&lt;/div&gt;
&lt;p&gt;There are really two objections here: one about deployability
and one about network architecture. The deployability objection
is that someone has to run the service and that has to be paid
for, so who is going to do that? I tend to think that this isn&#39;t
that big an issue: this really isn&#39;t that big a service by modern
standards, and we have a reference point for what it costs to
run something similar in the form of Let&#39;s Encrypt, which
has a budget of around &lt;a href=&quot;https://projects.propublica.org/nonprofits/organizations/463344200&quot;&gt;6 million dollars&lt;/a&gt;,
with the costs scaling sublinearly. The whole premise of the
situation is that companies like Apple and Facebook will
be required to interoperate, and against that background,
this isn&#39;t really that much money.&lt;/p&gt;
&lt;p&gt;I take the network architecture objection more seriously:
yet another centralized service isn&#39;t great for the Internet.
I think there are some ways to make it somewhat less
centralized, for instance by having each app maintain
its own mirror of the database, but at the end of the day
there&#39;s a tradeoff here between the good of interoperability—assuming
you think it is good—and the bad of centralization.
I tend to think that the balance is in favor of interoperability
but it&#39;s not a slam dunk, especially if you think that there
are other architectures that would do a better job (see &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#spin&quot;&gt;below&lt;/a&gt;).&lt;/p&gt;
&lt;h4 id=&quot;privacy&quot;&gt;Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#privacy&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Probably the biggest issue with this design is that it has
some fairly unfortunate privacy properties. Specifically
in the naive version of this design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The directory service gets to see which app(s) a given
phone number is associated with.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s possible for ordinary users to scrape the directory
service and learn which app(s) a given user is associated
with.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The directory server gets to see every lookup
and so is able to learn who is trying to connect with whom.
(This is even worse if the user has to try every possible app.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s probably possible to address some of these issues, though it&#39;s not immediately
obvious that they can be completely fixed. The rest of this section
contains some handwaving in the direction of potential solutions.
I just came up with these recently, so don&#39;t blame me if they
are horrifically broken.&lt;/p&gt;
&lt;p&gt;The last one is probably the easiest, as there are a number of
reasonably efficient &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_information_retrieval&amp;amp;oldid=1068898272&quot;&gt;private information
retrieval (PIR)&lt;/a&gt;
schemes for allowing a client to retrieve a single value from a server
without disclosing to the server which value was retrieved. So, if we just
require those values to be retrieved over PIR (or even over
a proxy!), we can probably provide some kind of privacy
for who is connecting to who.&lt;/p&gt;
&lt;p&gt;Similarly, I think it&#39;s probably possible to prevent large-scale
scraping of user data by clients. This is a pretty typical
rate limiting problem and it&#39;s already a problem existing apps have
to face, so we could probably apply similar techniques here.
This doesn&#39;t do much to prevent learning about a single individual,
though: for instance, I might just want to know if someone is on
WhatsApp. There seems to be an inherent tension here between allowing
seamless discovery and connection and providing privacy in this
case, so I&#39;m not sure if it&#39;s really soluble at the end of the
day.&lt;/p&gt;
&lt;p&gt;The best idea I have for the directory service getting to
see which apps a given number is associated with is to split
up the data between two servers. The idea would be that you would have two directory
servers operated by unaffiliated entities. The client would then
prove its identity to both servers (as above) and this would
give it a credential that it could use to authenticate to that
server. It would then encrypt its app identity and send the
key to one server and the encrypted value to the other. Then
when someone wanted to contact you, they would contact both
servers and reconstruct the original value, as shown below.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/phone-number-split.png&quot; alt=&quot;Split storage for records&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Update: fixed diagram --2022-08-04]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This stops the servers from being able to access the entire database,
though you still need to worry about scraping attacks, either
against both servers or by one against the other, so it&#39;s not
perfect.&lt;/p&gt;
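&lt;p&gt;One very simple way to realize the split, assuming a one-time pad as the encryption (the text above does not commit to any particular cipher), is sketched below. The function names are hypothetical, and a real design would also need to pad the identity so its length doesn&#39;t leak.&lt;/p&gt;

```python
import secrets


def split_record(app_identity):
    """Split an app identity into a key share and an encrypted share."""
    plaintext = app_identity.encode("utf-8")
    # Fresh random key as long as the message: a one-time pad.
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext  # key goes to one server, ciphertext to the other


def combine_record(key, ciphertext):
    """Recombine the two shares fetched from the two servers."""
    if len(key) != len(ciphertext):
        raise ValueError("shares do not match")
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


key_share, encrypted_share = split_record("app-A")
recovered = combine_record(key_share, encrypted_share)
```

&lt;p&gt;Either share alone is just random-looking bytes; only a client that queries both servers can recover the app identity.&lt;/p&gt;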
&lt;h3 id=&quot;spin&quot;&gt;SPIN &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#spin&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Recently,
&lt;a href=&quot;https://www.jdrosen.com/&quot;&gt;Jonathan Rosenberg&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/cullen/&quot;&gt;Cullen Jennings&lt;/a&gt;,
&lt;a href=&quot;https://alissacooper.com/&quot;&gt;Alissa Cooper&lt;/a&gt;,
and &lt;a href=&quot;http://playingattheworld.blogspot.com/&quot;&gt;Jon Peterson&lt;/a&gt;—a group of
heavy hitters in real time communications if there ever was one—published
an &lt;a href=&quot;https://www.ietf.org/archive/id/draft-rosenberg-dispatch-spin-00.html&quot;&gt;alternative design called SPIN&lt;/a&gt;
for this problem. The idea is to replace the centralized server by having
each client do its own phone number mapping via SMS. I.e., when Alice
wants to contact Bob, her device sends an SMS to Bob&#39;s device (again,
with some unpredictable random value). Bob&#39;s device responds with the app(s)
that Bob supports and perhaps with his identities on those apps.
The reasoning here is the same as with the directory service: only
someone who could receive SMS at Bob&#39;s number could complete the
challenge, so you must be talking to Bob.&lt;/p&gt;
&lt;p&gt;Of course, this leaves us with the problem of Bob knowing who is calling,
because Alice just asserts her number. One way to address this would
be for Bob to issue a challenge in the opposite direction,
but this isn&#39;t actually what SPIN does. Instead it assumes that Alice
has obtained a credential—presumably using a similar
issuance process to the one I indicated above—that she
uses to sign her message to Bob, but that&#39;s a design choice.
If you wanted to entirely eliminate centralized infrastructure
you could certainly do that, and that&#39;s an obvious selling
point of SPIN. Even with this kind of hybrid design, the
directory service doesn&#39;t need to be available for query
and so you don&#39;t have the privacy problems I discussed above
(it also isn&#39;t in the critical path for calls, but availability
of this kind of server system seems like a mostly solved
problem at this point).&lt;/p&gt;
&lt;p&gt;Of course, the SPIN design has a number of drawbacks (in fact,
I originally started thinking about this problem because I read
the draft and I wanted to try to fix them).&lt;/p&gt;
&lt;h4 id=&quot;offline-access&quot;&gt;Offline Access &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#offline-access&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;With SPIN, you can&#39;t really do discovery of anyone who isn&#39;t online at the
same time as you (more precisely, it just stalls until they are
online and you can get the return message).
This isn&#39;t necessarily &lt;em&gt;that&lt;/em&gt; big an issue for
real-time calls because if someone isn&#39;t online then you&#39;re not
going to be able to call them anyway (though there&#39;s voicemail) but
it&#39;s a big issue for instant messaging, which is inherently
asynchronous. Jonathan Rosenberg
&lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/dispatch/CFFoX9vNXthejIPGpmLf_pmY01Y/&quot;&gt;argues&lt;/a&gt;
that mobile devices are basically always connected.  I&#39;m not sure
that this is really true, but if you want to extend to systems
which have e-mail style identifiers, then those may be on desktop
not mobile devices, so this is a drawback.
This isn&#39;t an issue for the directory service design: once
a user has registered with the directory service then anyone
can do a lookup whether or not that user is online.&lt;/p&gt;
&lt;p&gt;One partial mitigation for this might be for the operator of
each app to record (cache) phone number validations as they
happen, so that they gradually learn some of the mappings
and can resolve them immediately. For instance, once
Alice (on service &lt;strong&gt;A&lt;/strong&gt;) has discovered that Bob is on service &lt;strong&gt;B&lt;/strong&gt;,
Charlie (also on service &lt;strong&gt;A&lt;/strong&gt;) can learn this information
from &lt;strong&gt;A&lt;/strong&gt; without a new verification stage.
This has the advantage that it&#39;s &amp;quot;soft state&amp;quot; in that things work without it,
but the disadvantage that some things work and some don&#39;t.&lt;/p&gt;
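&lt;p&gt;A minimal sketch of that caching behavior, under the assumption that the operator has some &lt;code&gt;discover&lt;/code&gt; hook that runs the SMS exchange (the hook and the class name are invented for illustration):&lt;/p&gt;

```python
class MappingCache:
    """Soft-state cache of number-to-app mappings kept by one operator."""

    def __init__(self, discover):
        # discover(number) is a hypothetical hook that runs the SMS
        # exchange and returns the peer's app, or None if no reply yet.
        self._discover = discover
        self._known = {}

    def resolve(self, number):
        # Serve from the cache when some earlier user already verified it.
        if number in self._known:
            return self._known[number]
        app = self._discover(number)
        if app is not None:
            self._known[number] = app
        return app
```

&lt;p&gt;Here Alice&#39;s first call to &lt;code&gt;resolve&lt;/code&gt; for Bob&#39;s number triggers the SMS exchange, and Charlie&#39;s later call for the same number is answered from the cache, even if Bob is offline by then.&lt;/p&gt;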
&lt;h4 id=&quot;it-(mostly)-requires-changing-the-operating-system&quot;&gt;It (mostly) requires changing the operating system &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#it-(mostly)-requires-changing-the-operating-system&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Because the SPIN design involves every client doing its own phone
number verification, people are going to get a lot of SMS messages
requiring them to verify, which is annoying. SPIN expects to
address this by having the device operating system absorb the
messages and respond for you so the user doesn&#39;t see them.
This isn&#39;t necessarily a bad idea, but it&#39;s kind of ugly and
means that people with older operating systems will have a bad
experience.&lt;/p&gt;
&lt;p&gt;Again, this isn&#39;t an issue with the directory service version
because apps can just register themselves. That version &lt;em&gt;does&lt;/em&gt;
work better if the operating system helps out with SMS verification,
but even in the worst case the user is just bothered once for
each app they use, not for each person who wants to call them.&lt;/p&gt;
&lt;h4 id=&quot;attack-resistance&quot;&gt;Attack Resistance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#attack-resistance&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As noted above, SMS routing in the PSTN isn&#39;t really that
secure, and so you have to worry about misissuance. One way
to mitigate this is to have the results of verification
published in a transparency log. This allows everyone to see
which credentials have been assigned to each number and
potentially detect misissuance. This works fine in a directory
service type system but in a system where each user does their
own verification, you might run into a scenario where an
attacker hijacked just the connection between Alice and Bob
but not between Charlie and Bob. This would need some fancier
mechanisms to detect, though we could probably design
something.&lt;/p&gt;
&lt;h4 id=&quot;privacy-2&quot;&gt;Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#privacy-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As noted above, the privacy situation is largely better without
a centralized server, but there&#39;s still an issue around probing
for individual user information. I.e., Alice wants to know
which app(s) Bob has and so sends an SMS and looks at the results.
One way to address this is for Bob to have some logic that runs
on the device that determines whether to answer the query—perhaps
depending on whether Alice&#39;s number is in the contact list—though
it&#39;s not clear how easy that is to configure.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;multiple-apps-per-user&quot;&gt;Multiple Apps Per User &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#multiple-apps-per-user&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Multiple apps are a pretty straightforward extension to either of these
systems. In both cases, you can basically think of the system as
publishing a &amp;quot;record&amp;quot; attached to the phone number. I&#39;ve implicitly
assumed that the record would contain a single app, but there&#39;s no
technical reason why it can&#39;t contain a list of apps (this is slightly
more complicated in the directory service version for cryptographic
reasons, but not really that hard).&lt;/p&gt;
&lt;p&gt;The situation for the initiator is somewhat more complicated: I&#39;m
using app &lt;strong&gt;A&lt;/strong&gt; and I want to call someone and learn that they
have apps &lt;strong&gt;B&lt;/strong&gt; and &lt;strong&gt;C&lt;/strong&gt;. What now? Presumably each app is going
to have a priority list of apps it would prefer to interoperate
with (favoring itself!) and will just pick the top one. But this
can lead to some obvious problems, such as: will you get the same
app in each direction? What happens if someone installs a new
app that is more preferred? These aren&#39;t strictly discovery problems
but are definitely ergonomics issues that apps will need to work out
somehow.&lt;/p&gt;
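&lt;p&gt;Here is a minimal Python sketch of the &amp;quot;pick the top one&amp;quot; rule and of how it can produce a different app in each direction. Everything in it is hypothetical: the &lt;code&gt;pick_app&lt;/code&gt; helper, the app names, and the priority orderings are made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def pick_app(my_priority, their_apps):
    """Return the first app in my_priority that the peer has, or None."""
    for app in my_priority:
        if app in their_apps:
            return app
    return None

# Hypothetical setup: Alice has apps A and B, Bob has apps B and C.
alice_apps = {"A", "B"}
bob_apps = {"B", "C"}

# Each initiating app ranks itself first; the rest of the ordering is made up.
app_a_priority = ["A", "C", "B"]  # Alice calls Bob from app A
app_b_priority = ["B", "A", "C"]  # Bob calls Alice from app B

alice_to_bob = pick_app(app_a_priority, bob_apps)    # C
bob_to_alice = pick_app(app_b_priority, alice_apps)  # B
print(alice_to_bob, bob_to_alice)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these orderings Alice reaches Bob on &lt;strong&gt;C&lt;/strong&gt; while Bob reaches Alice on &lt;strong&gt;B&lt;/strong&gt;, which is exactly the kind of asymmetry the apps would have to sort out.&lt;/p&gt;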
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Obviously this is a difficult problem without a single great solution.
I &lt;em&gt;do&lt;/em&gt; think it&#39;s possible to come up with something reasonably good here, especially
if we&#39;re willing to make some technical compromises. That&#39;s a
lot more likely if there really will be a requirement to interoperate;
while there are real technical problems, many of the problems
are around incentives (e.g., why should I run a server so some people
can talk to my users?) and regulation provides those incentives.&lt;/p&gt;
&lt;p&gt;This problem would be vastly easier
if the addresses people were using had been structured from the very beginning: as an
example, e-mail addresses already consist of a user portion and a domain
portion, and so it&#39;s easy to know where to route any given message.
But because instant messaging addresses are largely opaque, you&#39;re
stuck with clumsier solutions. On the other hand, most e-mail addresses
aren&#39;t portable—you can&#39;t take &lt;code&gt;example@gmail.com&lt;/code&gt; over to Hotmail—so
if you ever wanted that you&#39;d be back in the soup. To the best of my knowledge
there&#39;s no real way to have address portability without some kind of
routing database, either an explicit one like the DNS or my directory service,
or an implicit one like the PSTN fabric that powers SMS verification.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You can now buy 128 GB flash
drives, so this gives us 12 bytes per record. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that it &lt;em&gt;does not&lt;/em&gt; demonstrate that this device is
associated with that number. For instance, you could
have two devices, one of which is associated with that
number and one of which you are installing the app on. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, it&#39;s possible to design a system that doesn&#39;t
require full SMS access, but that&#39;s not how these
APIs work. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=555_(telephone_number)&amp;amp;oldid=1101658698&quot;&gt;here&lt;/a&gt; for why I am using 555 numbers. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In a real system, we&#39;d probably want to prevent malicious
apps on Alice&#39;s phone from registering for another app,
in what&#39;s called an &amp;quot;identity misbinding&amp;quot; attack, but
I&#39;m ignoring that here. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Update 2022-08-04:
You could also use secret sharing, but encryption has
the advantage that if the record you want to store is
large then the total size is smaller. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It might be possible to replicate this functionality in the
directory service model. Naively, Bob could just upload the
algorithm for which numbers to answer for, but this has its
own privacy problems because it leaks Bob&#39;s contact list to the service.
There may be some fancy cryptographic solution that addresses
all these privacy problems at once, but I don&#39;t have it
in my pocket. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-discovery/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Pacifica Foothills Race Report</title>
		<link href="https://educatedguesswork.org/posts/pacifica-foothills/"/>
		<updated>2022-07-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pacifica-foothills/</id>
		<content type="html">&lt;p&gt;On July 17th, I raced the &lt;a href=&quot;https://insidetrail.com/calendar/pacifica-foothills-trail-run/&quot;&gt;Pacifica Foothills 30K&lt;/a&gt;. This wasn&#39;t really on my training calendar, but a colleague
decided to run it and I offered to drive her, figuring I could fit in
a catered 18 mile training run. And then at the last minute my friend
&lt;a href=&quot;https://brbrunning.com/&quot;&gt;Lisa&lt;/a&gt; decided to run the 21K, so it was
a bit of a group thing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/pacifica-foothills-map.png&quot; alt=&quot;Race Map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/pacifica-foothills-elevation.png&quot; alt=&quot;Elevation Profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Photos from Runalyze]&lt;/p&gt;
&lt;p&gt;Because this was just wedged in and not really part of my build for
UTMB, my coach &lt;a href=&quot;https://sundogrunning.com/coaches-ian-torrence-emily-harrison-eric-senseman-ron-hammett-will-baldwin-jim-sweeney/&quot;&gt;Emily
Torrence&lt;/a&gt;
and I decided to just train through it without really tapering at all,
and just use it as a training event.
The course is laid out as a (mostly) out-and-back followed by two loops, with
a single aid station at the start/finish.  The
out-and-back is advertised as 7.5 miles and each loop as 5.7, for a total
of 18.9, with a total of around 4000 ft of climbing; and the plan was
to run the out and back and first loop at typical long run pace and
then if I was feeling good I would run the last loop at marathon to
50K effort (what&#39;s called a &amp;quot;fast finish&amp;quot; run).&lt;/p&gt;
&lt;p&gt;Usually a small local race like this would be pretty laid back
but I had to fly to London right after and then from there
to Philadelphia and eventually to Utah for &lt;a href=&quot;https://ultrasignup.com/register.aspx?did=88003&quot;&gt;Tushars 70K&lt;/a&gt;,
so I had to pack all that stuff up beforehand, and then
rush home afterwards to get to the airport, making the logistics
a bit complicated.&lt;/p&gt;
&lt;p&gt;The race was small and things were pretty chill at the start, and I managed to get
into the bathroom right before the gun went off. It helped
that they actually started a few minutes late, so even though
I got out of the bathroom at about 8:29 I had a few minutes
to get set. The race starts out climbing and I&#39;m usually a pretty fast climber
so I decided to start out pretty close to the front.&lt;/p&gt;
&lt;h2 id=&quot;first-out-and-back-%5B7.26-mi%2C-%2B1%2C709%2F-1%2C693-ft%2C-1%3A15%3A16%5D&quot;&gt;First Out and Back [7.26 mi, +1,709/-1,693 ft, 1:15:16] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacifica-foothills/#first-out-and-back-%5B7.26-mi%2C-%2B1%2C709%2F-1%2C693-ft%2C-1%3A15%3A16%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From about mile 1 it was surprisingly
rocky and technical—not ridiculous but not your typical
buttery California single-track—so I wasn&#39;t going that fast.
Even so, I was quickly passing people. I wasn&#39;t quite sure where
I was but figured I wasn&#39;t too far off the front.&lt;/p&gt;
&lt;p&gt;Eventually I settled in behind a pair of women who (spoiler alert)
turned out to be the first and second women. They were moving
just a bit slower than me but I decided to just camp out for
a little bit and keep things in the easy zone. Eventually
I started to feel like it was too slow, though, so I passed
the second woman and then the first, but she quickly re-passed
me on a downhill. Around 2ish miles the trail opened up
onto some fire road which went all the way to the top.&lt;/p&gt;
&lt;p&gt;I hadn&#39;t read the elevation profile that carefully and was expecting the top
of the hill to be halfway through the segment, but it came up quite
quickly. There&#39;s just a set of flags and some rubber bands and the idea
is you grab a rubber band that proves you went to the top (very secure!)
and then head back down. One nice thing about this out-and-back
structure is that it lets you see where you are and I counted two men
and one woman in front of me, which put me in fourth overall
and &amp;quot;on the podium&amp;quot; as they say (though there&#39;s no actual
podium in these small races).&lt;/p&gt;
&lt;p&gt;This was supposed to be a training run not a race so I was trying
to be pretty careful on the way down, which also meant I wasn&#39;t
going as fast as others. The second woman tore by me pretty quickly
and about half-way down two other men passed me as well,
putting me in the 5th male position. Even with being careful,
the terrain was a bit tricky and I rolled my left ankle
pretty far. Fortunately it was just short of real injury, so
it hurt for a minute or two but I was able to shake it off.
I didn&#39;t lose any more places on the way down.&lt;/p&gt;
&lt;p&gt;The way back was quite a bit longer than the way up, due to
a somewhat different route after the out and back segment.
It also had a few small climbs, which wouldn&#39;t ordinarily
be that big a deal but I was looking forward to the aid station.
Anyway, I rolled in in good order, refilled my bottle with Tailwind
and headed back out.&lt;/p&gt;
&lt;h2 id=&quot;loop-1-%5B5.39-mi%2C-%2B1%2C168%2F-1%2C211-ft%2C-55%3A34%5D&quot;&gt;Loop 1 [5.39 mi, +1,168/-1,211 ft, 55:34] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacifica-foothills/#loop-1-%5B5.39-mi%2C-%2B1%2C168%2F-1%2C211-ft%2C-55%3A34%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The loop portion of the course is arranged as a .8 mi/400 ft
climb followed by a 1.7 mi/700 ft climb. Ordinarily these
wouldn&#39;t be that hard but at this point it was starting to
heat up and there wasn&#39;t much shade. Fortunately, this was
nice smooth trail so it was just a matter of grinding it
out. There are a lot of switchbacks and false summits on the
second climb so it&#39;s a bit hard to know when it really ends.&lt;/p&gt;
&lt;p&gt;I was starting to get a bit confused about my place because
some of the 21K people had started to pass us (the start
was 15 min later). At this point, though, there were three
people nearby. I know this because they were moving slowly
on the climb—including walking—where I was
climbing pretty well, so I&#39;d close in on them or even
pass them on the way up and then they&#39;d pass me on the way down.
I wasn&#39;t really racing this but it was a little difficult not
to feel competitive at this point, so I had to make some
effort to hold back. I did get a chance to see what race
people were running and the answer was &amp;quot;two 30K, one 21K&amp;quot;.&lt;/p&gt;
&lt;p&gt;At this point I was back at 5th overall/3rd male, and things were
going well but I must have been starting to feel tired because right
around the top of the loop I caught my toe and went flying. I sat on
the ground for a few seconds to check myself out, concluding that I
was uninjured and just bleeding a bit, so got back up and headed down
the hill.&lt;/p&gt;
&lt;p&gt;I would have descended reasonably cautiously anyway, but after
this I was even more cautious and so the same two guys caught
me again on the downhill. I tried not to worry about it
and just cruised into the aid station. I got some more
Tailwind, grabbed a gel, told them I wasn&#39;t badly hurt,
and headed back out. I looked over and saw that the clock
was reading 2:12, which is ahead of where I expected
to be (and actually turns out to be long because
they started the clock at 8:30 and not at the actual start time).&lt;/p&gt;
&lt;h2 id=&quot;loop-2-%5B5.45-mi%2C-%2B1%2C207%2F-1%2C207-ft%2C-54%3A48%5D&quot;&gt;Loop 2 [5.45 mi, +1,207/-1,207 ft, 54:48] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacifica-foothills/#loop-2-%5B5.45-mi%2C-%2B1%2C207%2F-1%2C207-ft%2C-54%3A48%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I was feeling pretty OK at this point and I already knew what
the course was like from here, so I decided it was time to
pick up my pace for the fast finish section. Based on my
lap times I wasn&#39;t actually going that much faster on
the climbs, but everyone else was really slowing down,
so the effect is still to have you going quite
a bit faster than the people around you.&lt;/p&gt;
&lt;p&gt;On the first climb I quickly passed the 3rd (who turned
out to be &lt;a href=&quot;https://www.trailandkale.com/author/alidixon/&quot;&gt;Alastair&lt;/a&gt;
from &lt;a href=&quot;https://www.trailandkale.com/&quot;&gt;Trail and Kale&lt;/a&gt;) and 4th man and then the
woman who had been first when I saw her but had slowed down and was now the
second woman.
In the past, both men had caught me on the first descent
so I was a bit worried that might happen again, but
I let myself open up some and so managed to stay ahead.
I spent the next climb trying to put some more time on them—and
wishing I&#39;d paid more attention to the exact profile so I knew when it would be over—and then the descent simultaneously
trying to keep my pace up and waiting for footsteps behind
me, but never heard any.&lt;/p&gt;
&lt;p&gt;I rolled into the finish with the clock reading about 3:07.
My watch read 3:05:38, and the official finish reads 3:04:51.
Lisa had already finished and she told me that I&#39;d finished
3rd, and sure enough when I talked to the guy running the
finish I was, so I picked up my 3rd place trophy and 2nd male
coaster, took a selfie, and headed home.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/pacifica-finish.png&quot; alt=&quot;Finish photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Lisa and me at the finish]&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacifica-foothills/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I would call this race a success. I executed on the plan
more or less exactly and came away feeling tired but not dead.
It&#39;s just a local race but I think this is the highest place
I&#39;ve ever had. That&#39;s a pretty solid outcome with no taper and running most of it
at my usual long distance run pace. Plus, I was even able to switch
gears a bit at the end, especially in comparison to my
peers: the next finisher came in at 3:09:01, so that&#39;s putting
a 4 minute lead on them over a 55 minute loop, which is a pretty big gap.
I think if I had tapered and tried to race the whole thing
I would have got in under 3:00, though I would have needed
to drop almost 8 minutes (a bit under 5%) in order to have made men&#39;s second,
which might just barely be possible.&lt;/p&gt;
&lt;p&gt;My nutrition went fine: I figure I took in about 700 calories (2.5
bottles of Tailwind plus 2 gels), which is a bit light for my
usual target of 300 cal/hr, but it&#39;s OK to go into the hole a bit on something this
short.&lt;/p&gt;
&lt;p&gt;As usual, footing remains a problem, especially on technical
stuff. Neither my ankle nor the fall turned into that serious
a problem but either could have been and with UTMB coming up
I really don&#39;t want to get injured; I&#39;ve raced on an injured
rib and it&#39;s no fun. It all worked out OK though.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Verifiably selecting taxpayers for random audit</title>
		<link href="https://educatedguesswork.org/posts/random-audits/"/>
		<updated>2022-07-11T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/random-audits/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: this post contains a bunch of LaTeX math notation rendered
in MathJax, but it doesn&#39;t show up right in the newsletter
version. Check out the &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits&quot;&gt;Web version&lt;/a&gt;
where it renders correctly.&lt;/p&gt;
&lt;p&gt;The New York Times
&lt;a href=&quot;https://www.nytimes.com/2022/07/06/us/politics/comey-mccabe-irs-audits.html&quot;&gt;reports&lt;/a&gt;
that both James Comey and Andrew McCabe were selected for a rare kind
of IRS audit (odds of being selected about 1/20000-1/30000).
These audits are supposed to be random and the Times article focuses on the suggestion that
there was political influence on the selection:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Was it sheer coincidence that two close associates would randomly come under the scrutiny of the same audit program within two years of each other? Did something in their returns increase the chances of their being selected? Could the audits have been connected to criminal investigations pursued by the Trump Justice Department against both men, neither of whom was ever charged?&lt;/p&gt;
&lt;p&gt;Or did someone in the federal government or at the I.R.S. — an agency that at times, like under the Nixon administration, was used for political purposes but says it has imposed a range of internal controls intended to thwart anyone from improperly using its powers — corrupt the process?&lt;/p&gt;
&lt;p&gt;“Lightning strikes, and that’s unusual, and that’s what it’s like being picked for one of these audits,” said John A. Koskinen, the I.R.S. commissioner from 2013 to 2017. “The question is: Does lightning then strike again in the same area? Does it happen? Some people may see that in their lives, but most will not — so you don’t need to be an anti-Trumper to look at this and think it’s suspicious.”&lt;/p&gt;
&lt;p&gt;How taxpayers get selected for the program of intensive audits — known as the National Research Program — is closely held. The I.R.S. is prohibited by law from discussing specific cases, further walling off from scrutiny the type of audit Mr. Comey and Mr. McCabe faced.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&#39;t have any special insight into this particular case. Obviously, the chances
of these particular people both being selected are very small, but there are
a lot of people that former President Trump didn&#39;t like and so the chances
that some of them will be selected for audits are reasonably high, so it&#39;s
a bit difficult to develop the right probabilistic intuition for this,
though TheUpshot gives it a &lt;a href=&quot;https://www.nytimes.com/2022/07/07/upshot/comey-mccabe-tax-audits.html&quot;&gt;valiant try&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From my perspective, however, the underlying problem is that because the
process is opaque, we don&#39;t have confidence that the selection is random.
What we&#39;d really like to have is a system that is provably random. Sounds like
a job for cryptography! This post is an attempt to think through the problem,
both as an interesting exercise in itself and as an example of how to
think through the requirements for this kind of system and then build it up
in pieces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important disclaimer&lt;/strong&gt;: I just wrote this up and it hasn&#39;t
been analyzed by anyone else—or really by me—so it quite
possibly has grievous flaws that I have not identified.&lt;/p&gt;
&lt;h2 id=&quot;verifiable-random-selection&quot;&gt;Verifiable Random Selection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#verifiable-random-selection&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s start with a simpler version of this problem: we have a public
list of names $&#92;mathbb{N}$ of size $n$ consisting of names $N_1, N_2,
N_3... N_n$. We want to select a random subset $&#92;mathbb{A}$ (i.e.,
$&#92;mathbb{A} &#92;subset &#92;mathbb{N}$) for auditing. What we need to be able to do
is prove that that subset was randomly selected.&lt;/p&gt;
&lt;p&gt;Here&#39;s the basic approach. First, you publish the following information:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The list in a specified order so that each name is associated
with an index from $0$ to $n-1$.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A random number generation algorithm $R(s)$
that takes seed $s$ and generates values in the range $[0..n)$
with equal probability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A method for computing the seed that is (a) verifiable (b) unpredictable
at the present time and (c) not under the control of any
plausible set of people who might cheat.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first two of these are straightforward, but the last is more
complicated. We need some mechanism that&#39;s easy to explain but also
verifiably fair. One simple solution, used by the IETF for
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3797&quot;&gt;selecting volunteers for its nominating committee&lt;/a&gt;,
is to use preexisting random numbers like lottery results or
the low order digits of stock prices. I cover some other
approaches &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#generating-random-seeds&quot;&gt;later.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once you have the random seed $s$, things are pretty straightforward:
you run $R$ iteratively to generate numbers in the appropriate
range. Each number corresponds to a selected list entry. Typically,
you sample &lt;em&gt;without replacement&lt;/em&gt;, so if you select an entry that&#39;s already
been selected, you just generate a new number and try again.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
The code looks something like:&lt;/p&gt;
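&lt;pre&gt;&lt;code&gt;// k: the number of entries to select (must not exceed n)
R.seed(s);
selected = [];
remaining = k;

while (remaining &amp;gt; 0) {
  // Redraw until we get an index that has not been picked yet
  do {
    candidate = R.next();
  } while (candidate in selected);

  selected.append(candidate);
  remaining -= 1;
}
&lt;/code&gt;&lt;/pre&gt;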
&lt;p&gt;It&#39;s worth taking a moment to see why this works. Steps 1-3 are all
fair, which is to say that if you assume a random $s$, then all
sets of selected values are equally likely. This means that unless
you know $s$ in advance, it&#39;s not possible to predict who will
be selected. It&#39;s also not possible to modify the order of
the list or the detailed structure of the random number generator&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
in order to select one set of people over another.
It&#39;s critically important that steps 1-3 be run &lt;em&gt;before&lt;/em&gt; $s$ is known,
otherwise you could tamper with the list order or $R$ in order to get
the effect you want.
The jargon here is that you &lt;em&gt;commit&lt;/em&gt; to them
in advance, and that they can&#39;t be changed afterwards.
This also gives the public the opportunity to verify
that the list of names is correct and that random number generator
$R$ meets the correct requirements. For the same reason that they
need to be committed to in advance, if you allow changes—even to correct errors—after
$s$ is known it&#39;s too late because someone who detected an error
might choose to strategically disclose it or not once they saw the
outcome of the selection.&lt;/p&gt;
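&lt;p&gt;To make the verification step concrete, here is a minimal runnable Python sketch of seeded selection without replacement. The construction of the generator (SHA-256 over the seed plus a counter, with rejection sampling to avoid modulo bias) is just one illustrative choice, and the names &lt;code&gt;rng&lt;/code&gt; and &lt;code&gt;select&lt;/code&gt; are made up for this example. The point is only that anyone who knows the seed can rerun it and get the same answer.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

def rng(seed, n):
    """Yield unbiased values in [0, n) from SHA-256(seed || counter)."""
    # Largest multiple of n that fits in 64 bits; raw values at or above
    # it are rejected so that every residue is equally likely.
    limit = (2**64 // n) * n
    counter = 0
    while True:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
        value = int.from_bytes(block[:8], "big")
        if value in range(limit):  # i.e. value is below limit
            yield value % n

def select(seed, n, k):
    """Pick k distinct indices in [0, n), sampling without replacement."""
    if k not in range(n + 1):
        raise ValueError("k must be between 0 and n")
    selected = []
    stream = rng(seed, n)
    while len(selected) != k:
        candidate = next(stream)
        if candidate not in selected:
            selected.append(candidate)
    return selected

seed = b"placeholder seed"  # in practice, derived from the public seed method
print(select(seed, n=1000, k=5))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the output depends only on the seed, the list size, and the published algorithm, a verifier just reruns &lt;code&gt;select&lt;/code&gt; with the same inputs and compares the result to the announced selection.&lt;/p&gt;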
&lt;p&gt;This simple design has several deficiencies which make it
less than ideal for sampling taxpayers. First, it just bootstraps off an
existing source of randomness, but why do you think you can trust
that? That&#39;s relatively easy to repair, as discussed below. More
importantly, it involves publishing the identities of every
taxpayer who might potentially be audited. This is already
not ideal, but gets even worse if you want to oversample some
set of taxpayers (e.g., those who have higher net incomes),
or exclude some people (e.g., those who didn&#39;t pay income tax).
For obvious reasons, this shouldn&#39;t be public information.
Moreover, because a lot of people have the same name, you need
to identify them somehow, and that probably means their
&lt;em&gt;social security number&lt;/em&gt; (SSN).
SSNs are a terrible identifier, but they&#39;re also very widely
used as a form of authentication—how often have you
been asked for the last 4 digits of your social as an authenticator?—so
having them be public is bad news.&lt;/p&gt;
&lt;p&gt;One possibility would be to have the list consist &lt;em&gt;just&lt;/em&gt; of SSNs. This
would ordinarily be a bad idea because, as noted above, SSNs are
sensitive, but in practice a very large fraction of 9 digit numbers
are actually valid&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;: there are $10^9$ (1 billion) possible 9 digit
numbers and about 330 million people in the US, so any given random 9
digit number has about a 1/3 chance of being a valid SSN for someone
currently alive, so having a list of valid numbers isn&#39;t that
informative; it&#39;s the binding between SSNs and people&#39;s names
that is sensitive. However, it&#39;s still a problem if you want to do
weighting by income because you don&#39;t want someone who knows your
SSN to be able to infer your income bracket.&lt;/p&gt;
&lt;p&gt;Moreover, any cleartext list has the problem that the public can
determine who was audited, which seems suboptimal. We&#39;d like
a solution that allowed you to verify that selection was fair
but not who was audited. More precisely, anyone should be able
to verify that the selection was fair and people who are selected
should be able to verify that they—but nobody else—were selected.&lt;/p&gt;
&lt;h2 id=&quot;hashing-taxpayer-identities&quot;&gt;Hashing Taxpayer Identities &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#hashing-taxpayer-identities&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The obvious solution is just to hash
the identities. So, we start with a list of taxpayer identities
(e.g., SSNs or the pair of name and SSN) and hash each entry to
make a new list, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/taxpayer-hash.png&quot; alt=&quot;Hashing taxpayer identities&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You then just select out of the original list using the method
I described above. The IRS has the original list and can therefore
easily determine who is to be audited. Anybody can verify that
that computation was done correctly, and the people who are
selected can verify their selection by hashing their identity
and seeing that it matches one of the selected hashes.&lt;/p&gt;
&lt;p&gt;Note that I&#39;ve also reordered the hashes by sorting them in
numeric order. This destroys any initial structure in the list.
If we don&#39;t do this, then people could look at which hashed
list entries had been selected and potentially learn information
about who had been selected for the audit. For instance, if
there are 150 million taxpayers and the first one audited has
index 500,000, it&#39;s unlikely it&#39;s Aaron A. Aaronson. Because
the hashes are effectively random with respect to their inputs,
just sorting the hashes numerically produces a list whose
order is unrelated to the original order.&lt;/p&gt;
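&lt;p&gt;As a rough Python sketch of the hashed list: the taxpayers below are invented, the SSNs are deliberately invalid (area number 000), and the use of SHA-256 with a pipe separator between name and SSN is an assumption for illustration, not a specified format.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

def hash_identity(name, ssn):
    """SHA-256 over a name/SSN pair; the pipe separator is an arbitrary choice."""
    return hashlib.sha256(f"{name}|{ssn}".encode()).hexdigest()

# Made-up taxpayers with deliberately invalid SSNs (area number 000).
taxpayers = [
    ("Aaron A. Aaronson", "000-00-0001"),
    ("Alice Atlanta", "000-00-0002"),
    ("Bob Boston", "000-00-0003"),
]

# Published list: hashes only, sorted so the order reveals nothing
# about the original (for example alphabetical) order.
published = sorted(hash_identity(name, ssn) for name, ssn in taxpayers)

# Suppose the seeded selection picked index 1 of the published list.
selected = {published[1]}

def was_selected(name, ssn):
    """A taxpayer checks their own status by re-hashing their identity."""
    return hash_identity(name, ssn) in selected

print([name for name, ssn in taxpayers if was_selected(name, ssn)])
&lt;/code&gt;&lt;/pre&gt;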
&lt;p&gt;This is a simple and obvious solution, but unfortunately it&#39;s
also wrong. The problem here is that the input identity
values are low entropy and the hash is public. Because
there are only $10^9$ SSNs, it&#39;s easy to compute the hashes
for any name and all possible SSNs and just compare them
against a given hash. This costs roughly $2^{30}$ computations
per name, which is quite cheap. There are at most $2^{29}$
distinct names in the US (there are fewer than that many people and
of course many people have duplicate names),
so computing every possible name/SSN pair costs less than
$2^{60}$ computations, which is a lot but not at all out
of the realm of a dedicated attacker.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Moreover this computation
just needs to be done once and then you have the whole table.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
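&lt;p&gt;The attack is easy to demonstrate at toy scale. The sketch below shrinks the ID space to 4 digits so it runs instantly; the hash format, the helper names, and the example identity are all assumptions for illustration. The loop is the same one an attacker would run over all $10^9$ SSNs.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import hashlib

def hash_identity(name, ident):
    """Public hash of a name/ID pair (illustrative format)."""
    return hashlib.sha256(f"{name}|{ident}".encode()).hexdigest()

def recover_id(name, target_hash, digits=4):
    """Hash every possible ID for a known name and return the one that matches."""
    for i in range(10**digits):
        guess = str(i).zfill(digits)
        if hash_identity(name, guess) == target_hash:
            return guess
    return None

# Toy space of 10**4 IDs so this runs instantly; real SSNs have 10**9.
leaked = hash_identity("Alice Atlanta", "0421")
print(recover_id("Alice Atlanta", leaked))  # 0421

# Cost estimate from the text: about 2**30 SSNs times at most 2**29 names.
print(2**30 * 2**29 == 2**59)  # True, comfortably under 2**60
&lt;/code&gt;&lt;/pre&gt;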
&lt;h2 id=&quot;commitments&quot;&gt;Commitments &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#commitments&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I said above that part of the problem here was that the hash was public,
so what if we make it &lt;em&gt;private&lt;/em&gt; instead? One way to do this is with
what&#39;s called a &lt;em&gt;commitment&lt;/em&gt;. A commitment is like a hash, except
that it depends on an unknown secret value, so that it&#39;s not
possible to compute the commitment without knowing it. I.e.,&lt;/p&gt;
&lt;p&gt;$$
Commitment = C(secret, Message)
$$&lt;/p&gt;
&lt;p&gt;The way you use a commitment is that you publish the output of the commitment
but &lt;em&gt;not&lt;/em&gt; the secret value. Then you can prove that the commitment
matches a given message by revealing the secret value, at which point
anyone can compute the commitment for themselves. Constructing
a secure commitment scheme is somewhat complicated, but you can
think of it as hashing the concatenation of the secret and the message,
e.g.,&lt;/p&gt;
&lt;p&gt;$$
C(secret, Message) = H(secret + Message)
$$&lt;/p&gt;
&lt;p&gt;A commitment-based scheme works more or less the same as a hash-based
scheme, except that the IRS generates a new secret for each user
and stores it with the input table. It then can generate the
table of commitments, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/taxpayer-commitment.png&quot; alt=&quot;Taxpayer identity commitments&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Selection of the taxpayers to be audited proceeds exactly as
with hashes. The result is that anyone can verify that the
list of hashed (committed) identifiers to be audited was
generated correctly. In order to convince a given taxpayer
that they were selected you show them their associated
secret. They can then compute the commitment themselves
and verify that it&#39;s on the selected list.&lt;/p&gt;
&lt;p&gt;This solves the problem of keeping the selected list secret,
but at the cost of verifiability. Yes, you can verify that
the right commitments were selected and a given taxpayer
can verify that they correspond to a specific commitment,
but you can&#39;t verify that the original commitments match
the right list of taxpayers. For instance, suppose that
the IRS wants to make sure it always audits Alice Atlanta.
All it has to do is make a list that mostly consists of
commitments for Alice Atlanta, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fake-commitments.png&quot; alt=&quot;Bogus commitments&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Obviously, this greatly increases the chance that Alice will be
selected. Because all the commitments use different secrets, they are
all distinct even though they are for the same identifier,
and so it&#39;s not possible for anyone other than the IRS to
see that there are duplicate inputs. When Alice is selected,
the IRS can just reveal the secret for the relevant commitment and she
doesn&#39;t know that there were other commitments for her.&lt;/p&gt;
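&lt;p&gt;The duplication attack is easy to reproduce with a toy commitment. This sketch assumes the simplified SHA-256 construction and a made-up four-entry table; it only shows that the padded list looks unremarkable from the outside.&lt;/p&gt;

```python
import hashlib
import secrets


def commit(secret, message):
    """Toy commitment: SHA-256 over a 32-byte secret followed by the message."""
    return hashlib.sha256(secret + message).hexdigest()


# Hypothetical padded input table: three entries for Alice, one for Bob.
identities = [b"Alice Atlanta", b"Alice Atlanta", b"Alice Atlanta", b"Bob Boston"]
table = [(secrets.token_bytes(32), identity) for identity in identities]
published = [commit(secret, identity) for secret, identity in table]

# What the public sees: every commitment is distinct.
all_distinct = len(set(published)) == len(published)

# What only the IRS can see: the share of entries that belong to Alice.
alice_share = sum(identity == b"Alice Atlanta" for _, identity in table) / len(table)
```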
&lt;p&gt;One interesting thing that can happen is that Alice might be selected
&lt;em&gt;twice&lt;/em&gt; (this can just happen randomly). This isn&#39;t something that
ordinary people can detect: non-selected people just see the total
number of selectees and the IRS can just pick one of the commitments
it selected for Alice and show her and discard the other one.
Obviously, you could have an internal check that verified that
the right number of people were audited, but that&#39;s not publicly
verifiable.&lt;/p&gt;
&lt;h2 id=&quot;verifiable-random-functions&quot;&gt;Verifiable Random Functions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#verifiable-random-functions&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The source of the non-verifiability in the commitment approach
is that because each commitment uses a fresh secret, there
isn&#39;t a unique mapping from identities to commitments.
Fortunately, there is a function which has the properties
we need, namely:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;There is a unique mapping from identities to commitments&lt;/li&gt;
&lt;li&gt;The mapping can&#39;t be computed by third parties&lt;/li&gt;
&lt;li&gt;The mapping can be &lt;em&gt;verified&lt;/em&gt; by third parties&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What we need is what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Verifiable_random_function&amp;amp;oldid=1046728155&quot;&gt;&lt;em&gt;verifiable random function&lt;/em&gt; (VRF)&lt;/a&gt;. A VRF works by having a secret key
$K_s$, a public key $K_p$, and a pair of functions $VRF()$ and $Verify()$. The
function $VRF$ outputs two values, the output value
and a proof of correctness of the output value, like so&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;$$
(Output, Proof) = VRF(K_s, Message)
$$&lt;/p&gt;
&lt;p&gt;The $Proof$ can be used as an input to the function
$Verify(K_p, Output, Proof, Message)$, which returns $True$
if and only if $Output$ and $Proof$ match the $Message$.
The result is that you can only compute the VRF if you
know $K_s$ but anyone can verify the VRF given the
triplet $(Output, Proof, Message)$. The details of
how to construct a VRF are out of scope for this
post, but see &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-cfrg-vrf-13.html&quot;&gt;Goldberg, Reyzin, Papadopoulos, and Vcelak&lt;/a&gt;
for a specification describing several VRFs.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;In this case, we would use the $Output$ as the value in
the &amp;quot;hashed&amp;quot; list used for the selection and keep the $Proof$
secret. Because the VRF is deterministic, any input value
can only correspond to one output, thus preventing the
kind of duplication attack we saw with commitments.
As before, anybody can verify that the selection
algorithm was run correctly, and you prove to the
selectee that they were on the list by giving them the
corresponding proof, which they can verify for themselves.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
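&lt;p&gt;To make the $VRF$/$Verify$ interface concrete, here is a deliberately toy sketch built from the hash-of-a-deterministic-signature intuition. It assumes textbook RSA over two small Mersenne primes and SHA-256; the key layout and function names are invented for illustration, and this is neither secure nor the construction from the specification.&lt;/p&gt;

```python
import hashlib

# Toy key material: two Mersenne primes. Far too small and structured for real use.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # modular inverse; needs Python 3.8 or later

K_S = (N, D)   # secret key
K_P = (N, E)   # public key


def _hash_to_int(message, modulus):
    """Hash the message and reduce it into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def _output_from_proof(proof):
    """The VRF output is just a hash of the proof (N fits in 32 bytes here)."""
    return hashlib.sha256(proof.to_bytes(32, "big")).hexdigest()


def vrf(secret_key, message):
    """Return (output, proof); the proof is a deterministic RSA signature."""
    modulus, d = secret_key
    proof = pow(_hash_to_int(message, modulus), d, modulus)
    return _output_from_proof(proof), proof


def verify(public_key, output, proof, message):
    """True only if the proof is a valid signature on message and output matches it."""
    modulus, e = public_key
    if proof not in range(modulus):
        return False
    signature_ok = pow(proof, e, modulus) == _hash_to_int(message, modulus)
    return signature_ok and _output_from_proof(proof) == output
```

&lt;p&gt;Because the signature is deterministic, calling &lt;code&gt;vrf&lt;/code&gt; twice on the same identity always yields the same output, which is the property that rules out the duplicate entries seen with commitments.&lt;/p&gt;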
&lt;h2 id=&quot;oversampling&quot;&gt;Oversampling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#oversampling&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Having multiple entries for a given taxpayer can be used as an attack
but is also potentially useful, for instance if you want to have
higher-income taxpayers be more likely to be audited.  One possibility
here is just to have multiple lists with different selection
probabilities, but this gets clumsy if you have a lot of different
selection levels and also reveals the distribution of the number
of taxpayers in each cohort.&lt;/p&gt;
&lt;p&gt;An alternative design is to have multiple entries. For instance,
suppose that we have two groups, &lt;strong&gt;Rich&lt;/strong&gt; and &lt;strong&gt;Poor&lt;/strong&gt; and we
want &lt;strong&gt;Rich&lt;/strong&gt; people to be selected twice as often as &lt;strong&gt;Poor&lt;/strong&gt;
people. This can easily be achieved by simply having two
entries for each &lt;strong&gt;Rich&lt;/strong&gt; person. We can&#39;t do this directly
with a VRF, but we can just have the input to the VRF be
the taxpayer&#39;s identity plus a counter. E.g., for Alice
we could have &lt;code&gt;Alice Atlanta: 1&lt;/code&gt; and &lt;code&gt;Alice Atlanta: 2&lt;/code&gt;.
This doubles the chance of selection, and when either of
these identities is selected you just have to prove to
Alice that she was selected and that the counter in question
is within the appropriate range (in this case, either 1 or 2).
Nothing stops the IRS from creating an entry for
&lt;code&gt;Alice Atlanta: 3&lt;/code&gt; but if they show it to Alice, she can
contest it because her maximum index should be &lt;code&gt;2&lt;/code&gt;, so it&#39;s
not really different from having &lt;code&gt;Alice Schmatlanta&lt;/code&gt;;
it&#39;s just an entry that doesn&#39;t correspond to anyone.
The same strategy can be applied for any set of ratios,
though things get a bit messy if you want to (say) have
one set of taxpayers be audited 1% more than another set,
because you need them to have 100 and 101 entries respectively.&lt;/p&gt;
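&lt;p&gt;A small sketch of the counter idea, assuming a hypothetical table that maps each taxpayer to the number of entries they should get. The strings it produces would be the VRF inputs, and the range check is the one Alice would use to contest an out-of-range counter.&lt;/p&gt;

```python
def expand_entries(weights):
    """Turn {identity: entry_count} into VRF inputs of the form 'identity: counter'."""
    return [
        f"{identity}: {counter}"
        for identity, count in weights.items()
        for counter in range(1, count + 1)
    ]


def entry_is_valid(entry, weights):
    """An entry is legitimate only if its counter is within the taxpayer's allotment."""
    identity, _, counter = entry.rpartition(": ")
    return identity in weights and int(counter) in range(1, weights[identity] + 1)


# Hypothetical cohorts: Alice is in the Rich group (2 entries), Bob in Poor (1 entry).
weights = {"Alice Atlanta": 2, "Bob Boston": 1}
entries = expand_entries(weights)
```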
&lt;p&gt;One difficulty with this strategy is that it doesn&#39;t properly
handle multiple selections. For instance, we might select
&lt;em&gt;both&lt;/em&gt; &lt;code&gt;Alice Atlanta: 1&lt;/code&gt; and &lt;code&gt;Alice Atlanta: 2&lt;/code&gt;. In practice,
these audits are very rare, so this is pretty unlikely and
so it&#39;s probably easiest to just do one less audit, but
I &lt;em&gt;think&lt;/em&gt; you can solve this problem with another layer of
hashing.  Specifically, to compute the list entries you would compute&lt;/p&gt;
&lt;p&gt;$$
H(VRF(K_s, Identity) + Counter)
$$&lt;/p&gt;
&lt;p&gt;If you get a duplicate during the sampling process,
you reveal the inner VRF output
and prove that the two selected entries correspond
to two hashes with different counters. This doesn&#39;t
reveal any information about the rest of the structure
of the list. Note that the attacker can&#39;t just iterate
through hash inputs because the VRF output is high
entropy even if the identities are not.&lt;/p&gt;
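&lt;p&gt;Here is a sketch of how the extra hashing layer might look. It uses HMAC-SHA256 purely as a stand-in for the inner VRF output (HMAC is keyed and deterministic but not publicly verifiable), and the 4-byte counter encoding and helper names are assumptions made for this example.&lt;/p&gt;

```python
import hashlib
import hmac


def vrf_output_stand_in(key, identity):
    """Stand-in for VRF(K_s, Identity): keyed and deterministic, NOT a real VRF."""
    return hmac.new(key, identity, hashlib.sha256).digest()


def list_entry(inner, counter):
    """H(VRF output + Counter), with the counter encoded as 4 big-endian bytes."""
    return hashlib.sha256(inner + counter.to_bytes(4, "big")).hexdigest()


def same_taxpayer(inner, entry_a, entry_b, max_counter):
    """Given the revealed inner value, check both entries derive from it with different counters."""
    by_entry = {list_entry(inner, c): c for c in range(1, max_counter + 1)}
    if entry_a not in by_entry or entry_b not in by_entry:
        return False
    return by_entry[entry_a] != by_entry[entry_b]


key = b"hypothetical IRS key"
inner = vrf_output_stand_in(key, b"Alice Atlanta")
first, second = list_entry(inner, 1), list_entry(inner, 2)
```

&lt;p&gt;Revealing &lt;code&gt;inner&lt;/code&gt; for one taxpayer lets an auditor link that taxpayer&#39;s entries without learning anything about the other rows.&lt;/p&gt;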
&lt;h2 id=&quot;generating-random-seeds&quot;&gt;Generating Random Seeds &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#generating-random-seeds&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Above I sort of handwaved the random seed generation problem.  For a
number of options it&#39;s fine to depend on some sort of untrusted
source. However, you don&#39;t need an external randomness source.  The
basic idea is that you have a set of parties who get to contribute
randomness to the seed. Each party $i$ generates a random share $R_i$
and you concatenate them in some pre-determined order and
use that as the random seed.&lt;/p&gt;
&lt;p&gt;If each party generates their value independently, then as long
as at least one of the values is random, the whole output will
be random. The problem here is the word &lt;em&gt;independently&lt;/em&gt;. Suppose
that there are $n$ parties and parties $1..n-1$ all publish
their seeds. Party $n$ can then iterate through a bunch of
seeds until it finds one that produces the set of random numbers
it wants. Fortunately, we have a pre-existing tool for fixing
this, the commitment. Effectively we have a two-round protocol:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Round 1: everyone publishes their commitments to their shares $R_i$&lt;/li&gt;
&lt;li&gt;Round 2: everyone reveals $R_i$ and shows that it matches the commitment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This protocol will work as long as there is at least one party
who (1) generates a random value and (2) doesn&#39;t collude with the
others by revealing their value before the commitments are published.&lt;/p&gt;
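&lt;p&gt;The two-round protocol can be sketched as follows. This assumes the toy SHA-256 commitment, 32-byte shares, and alphabetical party order as the pre-determined ordering; the party names are placeholders, and the sketch does not address what to do when someone refuses to reveal.&lt;/p&gt;

```python
import hashlib
import secrets


def commit(nonce, share):
    """Toy commitment to a share: SHA-256 over a 32-byte nonce followed by the share."""
    return hashlib.sha256(nonce + share).hexdigest()


def combine(reveals, commitments):
    """Round 2: check each reveal against its round-1 commitment, then concatenate the shares."""
    seed = b""
    for party in sorted(commitments):   # pre-determined order: sorted party names
        nonce, share = reveals[party]
        if commit(nonce, share) != commitments[party]:
            raise ValueError(f"{party} revealed a share that does not match its commitment")
        seed += share
    return seed


parties = ["Party A", "Party B", "Party C"]   # hypothetical participants
# Each party privately picks a (nonce, share) pair...
reveals = {p: (secrets.token_bytes(32), secrets.token_bytes(32)) for p in parties}
# ...and in round 1 publishes only the commitment.
commitments = {p: commit(*reveals[p]) for p in parties}
seed = combine(reveals, commitments)
```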
&lt;p&gt;There are, of course, a few logistical problems here: who are
the parties? What happens if they publish their commitments but
then decide not to reveal $R_i$ (for instance because they don&#39;t
like the resulting output)? These are real problems for some
instantiations of this kind of scheme, but in practice it&#39;s
probably fine to just have a small number of trusted parties
(e.g., the US Government, some of the Big 5 Accounting Firms, etc.)
who would suffer severe reputational damage if they were to cheat
or refuse to reveal their share.&lt;/p&gt;
&lt;p&gt;Another approach that people sometimes use for public
verifiability is to have people roll dice.
Cordero, Wagner, and Dill describe procedures for this in
a classic paper called &lt;a href=&quot;https://people.eecs.berkeley.edu/~daw/papers/dice-wote06.pdf&quot;&gt;The Role of Dice in Election Audits&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Note that you can use all of these systems together: you
just run them all, glue the data together (e.g., by
concatenating it in a predetermined order), and feed it
into the random number generator as the seed.&lt;/p&gt;
&lt;h2 id=&quot;drawbacks&quot;&gt;Drawbacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#drawbacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is not a perfect system. First, like many cryptographic
systems, it&#39;s fairly complicated and the math required to convince
yourself that it behaves as advertised is way beyond most people.
On the other hand, people regularly trust their credit cards,
passwords, instant messages, and importantly tax returns to systems no more complicated than
this and that are based on pretty similar cryptographic primitives.
Moreover, the current system is totally unverifiable, so almost
anything is an improvement.&lt;/p&gt;
&lt;p&gt;From the technical side, I&#39;m aware of at least one notable
deficiency: while this system prevents the IRS from inappropriately
auditing someone, it doesn&#39;t prevent them from making sure
someone &lt;em&gt;doesn&#39;t&lt;/em&gt; get audited; all they have to do is omit
them from the list. In our original design, this was easily
detectable, but once we mask taxpayer identities with the
VRF, it&#39;s no longer possible. I&#39;m not aware of any simple way
to fix this, because you would need a list of the valid identities
to compare to, which is something I&#39;m trying to avoid. With
that said, I&#39;m not sure how serious this is: if the IRS
wants to cheat it can just not audit someone that gets selected.
You need some (non-transparent) internal procedures to detect
this case, so maybe you can use them to ensure the list
is complete as well.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/random-audits/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Taking a step back from this particular case, there are a lot
of types of data processing that have real impact on our
lives but where we just have to trust that the entities—whether
governments or corporations—are handling it correctly.
This includes a broad range of applications from voting to medical records
to income taxes to your search history. In each of
these cases, mishandling of the data could lead to real harm;
even if you trust the current entity to behave correctly
there is no guarantee that they will do so in the future or
that their systems will not be compromised.&lt;/p&gt;
&lt;p&gt;The good news is that we are starting to have the technologies
to allow the public to verify that these processes are conducted
correctly. A good example from another field is
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Risk-limiting_audit&amp;amp;oldid=1087546062&quot;&gt;risk-limiting audits&lt;/a&gt;
for election verification, as pioneered by &lt;a href=&quot;https://www.stat.berkeley.edu/~stark/&quot;&gt;Philip Stark&lt;/a&gt; (which also require some method of verifiable sampling, albeit a simpler one),
which are actually starting to be used in real elections.
In general, this is a good development: it&#39;s important to have good
policies and trustworthy institutions, but even better if we
don&#39;t have to trust them, especially in cases like this where
correct behavior is important for democratic governance.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this algorithm isn&#39;t efficient if you want to
select a subset that&#39;s close to the size of the original
list. One alternative is to instead select the list of &lt;em&gt;excluded&lt;/em&gt;
entries. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is not intended as a formal statement of the requirements for the RNG, but
roughly speaking you want every possible sequence to be
equiprobable over the ensemble of $s$ values. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ironically, this makes use of the property that SSNs are such
a terrible identifier. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that if SSNs were just a lot longer, then this
system would be mostly OK. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Hashing is often used in an attempt to conceal e-mail addresses,
and doesn&#39;t work &lt;a href=&quot;https://freedom-to-tinker.com/2018/04/09/four-cents-to-deanonymize-companies-reverse-hashed-email-addresses/&quot;&gt;any better&lt;/a&gt; there. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is not the conventional presentation of VRFs
but I believe it&#39;s a little easier for non-cryptographers
to follow than the presentation in, for instance
the CFRG &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-cfrg-vrf-13.html&quot;&gt;VRF specification&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Intuitively, you can construct a VRF by applying
a hash to a deterministic digital signature function.
The hash becomes the output and the full signature
is the proof. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I borrowed this general technique from
&lt;a href=&quot;https://eprint.iacr.org/2014/1004.pdf&quot;&gt;CONIKS&lt;/a&gt;,
which describes a more complicated system for assuring
unique bindings between identities and cryptographic
keys. &lt;a href=&quot;https://educatedguesswork.org/posts/random-audits/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Tenaya Loop Adventure Run 2: Redemption</title>
		<link href="https://educatedguesswork.org/posts/tenaya-loop2/"/>
		<updated>2022-07-08T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/tenaya-loop2/</id>
		<content type="html">&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tenaya-loop-map.png&quot; alt=&quot;Tenaya Map&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tenaya-loop-profile.png&quot; alt=&quot;Tenaya Profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Map and profile via &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;Last year, my training partner &lt;a href=&quot;https://heapingbits.net/&quot;&gt;Chris Wood&lt;/a&gt;
and I &lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop&quot;&gt;ran&lt;/a&gt; the &lt;a href=&quot;https://pantilat.wordpress.com/2013/06/03/tenaya-rim-loop/&quot;&gt;Tenaya Loop
route&lt;/a&gt;
around Yosemite. This route was pioneered by former ultrarunning and current FKT star &lt;a href=&quot;https://pantilat.wordpress.com/&quot;&gt;Leor
Pantilat&lt;/a&gt;. It turned out to be harder than we
expected, and we ended up bailing out partway through.&lt;/p&gt;
&lt;p&gt;This year I was scheduled to do &lt;a href=&quot;https://www.alpinerunning.co/old-cascadia&quot;&gt;Old Cascadia
50&lt;/a&gt; on June 18
as a warmup for &lt;a href=&quot;https://utmbmontblanc.com/en/&quot;&gt;Ultra-Trail du Mont-Blanc (UTMB)&lt;/a&gt;, but that got
rescheduled to October because of too much snow and so I decided to
take another crack at Tenaya. In the event, I had to make a last
minute trip to Brussels on the 19th, so I had to reschedule Tenaya to
Saturday June 25. Flying in from Europe on Wednesday evening and
then driving to Yosemite on Friday doesn&#39;t give you ideal
performance, but it&#39;s what we had, and I guess good prep for
how tired I expect to feel the second half of UTMB.&lt;/p&gt;
&lt;h2 id=&quot;logistics&quot;&gt;Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Last year Yosemite reservations only let you in after 5. This
year the rules are that you need a reservation if you come in
between 6 AM and 4 PM but not if you enter earlier or later,
which was convenient for me because I wanted to be on the trail before 6
to maximize light. I decided to stay at
&lt;a href=&quot;https://yosemiteriversideinn.com/&quot;&gt;Yosemite Riverside Inn&lt;/a&gt;,
which is on Highway 120 right en route to the Tenaya Lake trailhead.
It&#39;s not luxury, but it&#39;s fine.&lt;/p&gt;
&lt;p&gt;I went to bed around 8:30, got up at 3:00 and was at the trailhead by 5:00. Last year
the whole parking lot was under construction so you had to park
at the side of the road and there were no bear lockers or bathrooms,
but now it&#39;s been totally renovated and there are some reasonably
new/clean pit toilets and a whole rack of bear lockers.
This is a much better experience as you get to use the
bathroom before you start. And while you&#39;re technically
only forbidden to leave food in your car overnight—and I&#39;d
brought a &lt;a href=&quot;https://bearvault.com/product/bv500/&quot;&gt;BearVault BV500&lt;/a&gt;—it&#39;s
a lot more reassuring to have it in the lockers. The bear canister
only stops the bears from taking your food, not breaking into
your car to get it.&lt;/p&gt;
&lt;p&gt;As I mentioned above, you don&#39;t need a reservation or a permit,
but you&#39;re still supposed to pay the entry fee. However,
there weren&#39;t any rangers around at 4ish when I got in or 10ish
when I left, so I still owe the National Park Service money. Call me!&lt;/p&gt;
&lt;h2 id=&quot;start-to-nevada-fall-%5B12.8-mi%2C-%2B2211%2F-4364%2C-3%3A32%5D&quot;&gt;Start to Nevada Fall [12.8 mi, +2211/-4364, 3:32] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#start-to-nevada-fall-%5B12.8-mi%2C-%2B2211%2F-4364%2C-3%3A32%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first stretch quickly climbs from the trailhead up to the top of
the whole route at just under 10,000 ft. It&#39;s flat at the very
beginning, but I only made it about a mile or two before it headed upward
and I unpacked my
poles, which I ended up using for almost the whole rest of the day.&lt;/p&gt;
&lt;p&gt;In theory this route then takes you by Cloud&#39;s Rest, but for some
reason I can&#39;t seem to read the map properly and so I
missed Cloud&#39;s Rest for the second time in a row. I think
the confusion here is that the top of the climb is right where you
turn, so I just got focused on going straight down. I did stop to put
my poles away, which was already kind of a mistake because that&#39;s when
a bunch of mosquitos decided it was time to swarm me.  This set the
pattern for the rest of the day: most times when I stopped I would
get a bunch of mosquitos on me. I had brought sunscreen but not insect
repellent, and just kept hoping that it would go away, so I ended up
alternately ignoring it and desperately trying to swipe them away
as I did whatever I stopped to do.&lt;/p&gt;
&lt;p&gt;The descent from here is pretty nice and reasonably smooth,
eventually linking up to JMT. I didn&#39;t feel as fresh for this
part as I was hoping to or as I did last year, but it didn&#39;t
go that badly. Things start to get a lot more crowded after
the JMT merge, I suspect because of people doing &lt;a href=&quot;https://www.nps.gov/yose/planyourvisit/halfdome.htm&quot;&gt;Half Dome&lt;/a&gt;,
but people are typically good about getting out of the way when they see you running down.
Pro Tip: there are some bathrooms at the &lt;a href=&quot;https://www.nps.gov/yose/planyourvisit/lyv.htm&quot;&gt;Little Yosemite Valley&lt;/a&gt;
campground.&lt;/p&gt;
&lt;h2 id=&quot;nevada-falls-past-glacier-point-and-to-the-valley-%5B23.9-mi%2C-%2B11.1-mi%2C-%2B2041%2F-3993%2C-3%3A06%5D&quot;&gt;Nevada Falls past Glacier Point and to the Valley [23.9 mi, +11.1 mi, +2041/-3993, 3:06] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#nevada-falls-past-glacier-point-and-to-the-valley-%5B23.9-mi%2C-%2B11.1-mi%2C-%2B2041%2F-3993%2C-3%3A06%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Nevada Falls junction on this route is a bit confusing because
there is a short trail down to a vista point that you don&#39;t
take, but you &lt;em&gt;do&lt;/em&gt; go partway down JMT to another vista point
and then turn around and head up to Glacier Point. Last time
we went down way too far, but this time I just went down to
the vista point and turned around. This section is on hard
rock with a cliff face on the uphill side and there was
quite a bit of water run-off and general spray, so it was
hard to stay dry. This actually would have been nice later
in the day, but not so much at 9:30. On the other hand it was
reassuring to know there was plenty of water.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/nevada_falls1.jpg&quot; alt=&quot;The view of Nevada Falls&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/nevada_falls2.jpg&quot; alt=&quot;More of Nevada Falls&quot; /&gt;&lt;/p&gt;
&lt;p&gt;From this vista point you just turn around and head back to the
trail junction and then up the Glacier Point trail. This is
a longish uphill grind, so I got the poles back out and
headed up. As you start out on the trail, there
are a bunch of signs warning about how there is no way
to get up and back from Glacier Point except walking, there
are no rangers, no water, etc. This was slightly worrisome:
I had a filter so I didn&#39;t need water taps but the higher you get
the less there tends to be surface water, and I had already
drunk about a liter out of the two liters I started with.&lt;/p&gt;
&lt;p&gt;When I got to Glacier Point there were still a fair
number of people there, which isn&#39;t surprising, as it&#39;s really
only about 4 miles (though about 3000 ft) from the Valley,
and the trail is reasonably good. As advertised, there weren&#39;t
any services, so a few photos and I headed down.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/glacier_point.jpg&quot; alt=&quot;A view from Glacier Point&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;going-down-is-the-easy-part&quot;&gt;Going down is the easy part &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#going-down-is-the-easy-part&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;One thing I&#39;ve noticed trail running at big tourist
locations like Yosemite or the Canyon is that people
are super impressed when you tear by them going downhill
(I know because they say something).
This has always felt a little odd to me because the
hard part of these events is the climbing and I&#39;m
not going down &lt;em&gt;that&lt;/em&gt; fast (&lt;a href=&quot;https://www.youtube.com/watch?v=PMMmzymPZ9U&quot;&gt;this&lt;/a&gt;
is what fast looks like). OTOH, you&#39;re mostly hiking up these
grades—or at least I am—so I guess it
doesn&#39;t look that impressive, even though
it&#39;s more effort.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Because the trail drops ~3000 feet in 4 miles, you—or
at least I—have to be pretty cautious, so I was going to run
down, but not just bomb down it. I wanted to practice
descending with poles so I kept them out. I think on balance
this made things easier: every time there&#39;s something a little
technical or sketchy you can plant your poles and use them
to stabilize. They&#39;re also useful for helping get over any rocks
or whatever you might need to jump over. I don&#39;t think I put
them away at all for the whole rest of the run.&lt;/p&gt;
&lt;p&gt;By this time, the trail was starting to get reasonably hot
and I was starting to worry about fluid. Fortunately,
about halfway down I found a little stream
that let me drink a half liter or
so and then refill my water bottle. I don&#39;t remember this being
there last year and it gave me a little boost as
I cruised down into the Valley feeling pretty good.&lt;/p&gt;
&lt;h2 id=&quot;valley-to-yosemite-point-%5B29.45-mi%2C-%2B5.55mi%2C-%2B3461%2F-417%2C-2%3A53%5D&quot;&gt;Valley to Yosemite Point [29.45 mi, +5.55mi, +3461/-417, 2:53] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#valley-to-yosemite-point-%5B29.45-mi%2C-%2B5.55mi%2C-%2B3461%2F-417%2C-2%3A53%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Last year we didn&#39;t know any better and went over to Yosemite Lodge
to get water, but the route goes right through Camp Four which has
bathrooms and running water, so I headed there instead.  Had a
slightly bad moment when I stopped at the Information booth and asked
if there was any water in Yosemite Falls and she said &amp;quot;no&amp;quot;, but then
said &amp;quot;there&#39;s water in the falls but no tap&amp;quot; which is the answer I
actually cared about.  Anyway, I filled up all four of my bottles with
water and Tailwind (in the process discovering that I think I lost two
of my Tailwind sleeves on the trail, sorry about that!).&lt;/p&gt;
&lt;p&gt;Threw away
my trash in the nearby garbage, threw on my headphones&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and headed up the climb to Yosemite Point.
This is by far the hardest part of the route, gaining over 3000
feet in under 6 miles, with 2700 feet coming in the first 3 miles.
The trail is a lot of stair steps and stair-step like stuff, so you&#39;re
using your poles a lot.
I find the trick here is just to try to
maintain a constant pace and back off a little if you get tired, but
not actually stop. I mostly managed this, except for 5 minutes or so
when I stopped in the shade and did some pack management, swapping out
my bottles, grabbing food, putting on sunscreen, etc.  Other than that, it&#39;s just a matter
of slogging your way to the top.
Fortunately, it&#39;s not &lt;em&gt;too&lt;/em&gt; exposed, so while
it&#39;s hot, you&#39;re not just baking. On this day, it actually started
to drizzle a bit and I started to wonder if I was going to need
my rain gear, but it never really did much.&lt;/p&gt;
&lt;p&gt;The trail gets a bit faint between Yosemite Falls and Yosemite Point
and last year we got a bit lost here, which is part of what
led to bailing out. This year it was a bit easier, partly because
I had seen it before, partly because I had more time left, and
partly because I was less tired. In any case, I made it to Yosemite
Point just fine.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite_point.jpg&quot; alt=&quot;From Yosemite Point&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;yosemite-point-to-finish-%5B32.7-mi%2C-%2B3.25-mi%2C-%2B994%2F-469%2C-1%3A12%3A52%5D&quot;&gt;Yosemite Point to Finish [32.7 mi, +3.25 mi, +994/-469, 1:12:52] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#yosemite-point-to-finish-%5B32.7-mi%2C-%2B3.25-mi%2C-%2B994%2F-469%2C-1%3A12%3A52%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The next segment is flattish, taking you to the North Dome trail.
It was a relief to be on something runnable after all that climbing.
I&#39;d seen the first 2ish miles before, up to the intersection
to Porcupine Flat, where we bailed last year, and after that it
was uncharted territory.&lt;/p&gt;
&lt;p&gt;Eventually I got to the intersection with the trail to North Dome.
This is another out and back, though it seems pretty flat. I
must have been getting a little low on calories or something because
I stared at the map for a while and then managed to head out precisely
in the wrong direction, which is to say onward to the end, rather than
to the out and back to North Dome. I only realized this after I&#39;d climbed
about a mile or so and was wondering where the heck the top was; at that
point I wasn&#39;t heading back, so I just missed that view, I guess.&lt;/p&gt;
&lt;p&gt;Somewhere on this leg, but not quite sure where, I saw a bear cub cross the trail, followed
by a somewhat larger bear, potentially its mother. They sort of
ran around for a while with one on each side of the trail, and for obvious
reasons, I wasn&#39;t excited about getting in between them. I tried making
a lot of noise, singing, etc. which I wouldn&#39;t say was super successful.
Eventually I just kind of stood there until they wandered
off (sorry, no pictures!). Once they were out of sight I headed on
past trying to sing a bit to let them know I was there; this is surprisingly
hard to do at 8000 ft, and I definitely felt the altitude.&lt;/p&gt;
&lt;p&gt;From here it&#39;s about four miles of easy gradual downhill and then
another long gradual climb of about 1400 ft towards a vista point at about 41 miles.
This is the last big climb, so I relaxed a little bit and enjoyed the view.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/mt_watkins_vista.jpg&quot; alt=&quot;View from a vista point&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Things are pretty straightforward from here. There&#39;s a pretty gentle climb
up to near Tioga Road and a final vista point where I ran into a couple
of guys all set up for stargazing with chairs, tripods, and a bottle of wine
(you can see some of that in the foreground of the picture below).
We talked for a few minutes, I got one final shot of the sunset and then
headed back down.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/yosemite-sunset.jpg&quot; alt=&quot;Sunset in Yosemite&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This last part was actually the worst; the trail was a bit rocky and faint in places
and then the last mile or so is in a sort of marshy area, which meant
mosquitos. This became especially obvious when I stopped to fish through
my bag for my headlamp and they were immediately all over me in the
minute or two I spent just getting it on. From here on, though, it was
flat and easy, so I just cruised it in.&lt;/p&gt;
&lt;h2 id=&quot;nutrition&quot;&gt;Nutrition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#nutrition&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Brought&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Consumed&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Calories&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Tailwind&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;10 + 4 in bottles&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;12?&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;2400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Gels&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Powerbars&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;M&amp;amp;Ms&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bag&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Total&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;3200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This seems a little on the light side: just a bit over 200 calories
an hour and I usually aim for 300. I didn&#39;t keep as careful track
as I would like but my sense is that I was on track in the beginning
but then started to fall behind once my initial Tailwind ran out
and I was just drinking out of the filter. It&#39;s not that bad to
filter into bottles, but then it&#39;s a bit of a pain to put the
Tailwind in and so you end up mostly drinking straight water
and falling behind on your calories. Refilling my bottles
with Tailwind at Camp 4 seems to have helped here,
and the atp makes that easy.&lt;/p&gt;
&lt;p&gt;Around 8-10 salt tabs at 215 mg each plus two caffeine
pills in mid-afternoon and then the early evening. The
caffeine definitely helped as the day went on.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This went well on balance. Although it&#39;s a little hard to compare
because of more detours last year, I was as fast if not faster this
time through the same parts (6:47 versus 7:16 to the Valley Floor and
9:31 versus 10:09 to Yosemite Point) and only slightly slower overall
despite the divergent sections being much harder this time, and I felt
a lot less dead when I got to Yosemite Point and at the end. This
despite being alone, not having tapered, and in fact having flown in
from Europe 3 days before.&lt;/p&gt;
&lt;p&gt;I&#39;m increasingly getting my equipment dialed in. I was already good
with the poles on the uphill and I&#39;m starting to get the hang of using
them on the downhill, where I felt more stable than before. The trick
seems to be to just run with them in your hands and then lightly plant
them ahead of you most of the time, but then when something is tricky
you&#39;re prepared to lean on them more. This helps stabilize you if you
have to make an odd foot plant or if you slip a bit. I don&#39;t know if
it was the poles or not, but I managed to do the entire route without
falling.&lt;/p&gt;
&lt;p&gt;I&#39;m still sorting out the shoe situation. I&#39;ve been doing my long
runs in the &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/s-lab-ultra-3.html#color=37168&quot;&gt;Salomon S/LAB Ultra 3&lt;/a&gt;,
which has pretty good support. I&#39;ve mostly been racing in
the slightly lighter &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/sense-4-pro.html#color=48784&quot;&gt;Salomon Sense Pro 4&lt;/a&gt;,
which is a bit more aggressive and less supportive. I used the Sense Pro 4s
last time and my ankles were pretty sore after, and I&#39;d been hoping
to convert to Salomon&#39;s new &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/pulsar-trail-lg7962.html#color=67159&quot;&gt;Pulsar Trail&lt;/a&gt;,
which I like but which feels just a hair too wide, so my feet slip around a bit (not
good for technical terrain). I&#39;m going to experiment with cinching them down
more, but hopefully the new Pulsar Trail Pro will be out soon and I
can try that. Otherwise, I think it&#39;s the Ultra 3s for UTMB.&lt;/p&gt;
&lt;p&gt;This was my first long run with my new &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/sense-pro-10.html#color=68245&quot;&gt;Salomon Sense Pro 10&lt;/a&gt;
pack. Generally, it&#39;s nice and roomy and fits well. I&#39;m still
experimenting with the pole placement: you can bungee them to
the front, which works well, but there are two positions: interior
of the bottles (on your chest) or exterior (by your arms). I used
the exterior position this time but I&#39;m thinking the interior might
be better. This event was right at the limit of what I&#39;d want to
carry: after 45 miles my shoulders were kind of sore.
My loadout here was probably slightly more than I need for UTMB:
I was carrying more or less the required kit, but also all of my
food, the filter, an emergency beacon, my heavy light (Lupine Piko)
and a spare battery, so there&#39;s probably some room to save a couple of
pounds, especially if I&#39;m willing to have a slightly less bright
light.&lt;/p&gt;
&lt;p&gt;Timing was a lot better: starting at 6 rather than 7, with it
getting dark at 8:30ish, meant I was able to do the whole thing in
daylight—or at least twilight. I did pull out my headlamp
towards the end but mostly just because the footing was a bit dodgy
in the twilight, and I could have finished without it.&lt;/p&gt;
&lt;p&gt;The mosquito thing was not good: every time I stopped at all I
got swarmed and after I finished I had to really rush to get
changed and on my way. I spent the next few days slathering myself
with hydrocortisone and scratching. A lesson for next time.&lt;/p&gt;
&lt;p&gt;Overall, though, this seems like it was well executed. I kept moving
well and never really had any doubt I could finish. I dragged a bit
towards the middle due to what I think is nutrition but was feeling
good again at the end. I walked all the climbs but was able to run most of the
flats and the downhills. There were quite a few downhill sections
that were technical where a jog/walk was needed but I mostly
felt like I was moving well within the limits of the terrain,
which is what I was looking for.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall:&lt;/strong&gt; 45.5 mi, 11237 ft, 15:04:57&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I don&#39;t usually use music for this kind of thing, in part
because it compromises your awareness, but it&#39;s
good for this kind of slow grind, especially when you&#39;re solo. &lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>An overview of browser privacy features</title>
		<link href="https://educatedguesswork.org/posts/private-browsing/"/>
		<updated>2022-07-04T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/private-browsing/</id>
		<content type="html">&lt;p&gt;Recently I was interviewed by for an
&lt;a href=&quot;https://www.washingtonpost.com/technology/2022/06/26/abortion-online-privacy/&quot;&gt;article&lt;/a&gt;
about how to privately search for reproductive health services. During
the discussion I found myself explaining the different privacy
features available to Web users and wishing that I had something
written to point to. Hence this post.&lt;/p&gt;
&lt;h2 id=&quot;types-of-tracking&quot;&gt;Types of Tracking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#types-of-tracking&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, it&#39;s important to be clear about what we are trying to accomplish.
When we talk about Web tracking, there are two different kinds of tracking we are concerned about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Cross-site tracking&lt;/em&gt; of  your activity &lt;em&gt;across&lt;/em&gt; web sites (e.g., I went to
Nike and Adidas).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Same-site tracking&lt;/em&gt; of your activity at different times on the same site
(e.g., I searched on Google for &amp;quot;shoes&amp;quot; and then later
for &amp;quot;tofu&amp;quot;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mostly, when people talk about &amp;quot;Web tracking&amp;quot; they are talking about
cross-site tracking. This is clearly something that people didn&#39;t
really sign up for and doesn&#39;t really provide much direct user
benefit (we can argue about whether personalized ads are a user benefit,
but if so they&#39;re not a very large one). For this reason,
a number of browsers have started to build privacy features
designed to block cross-site tracking by default.&lt;/p&gt;
&lt;p&gt;By contrast, a lot of important Web functionality depends on the
ability to link up one visit to another (for instance, this is how you
stay logged in to your accounts between visits). Even in cases
where users don&#39;t explicitly log in, sites use information about
previous visits to personalize your experience (for instance,
to make content recommendations). This isn&#39;t to say
that all such tracking is desirable, but merely that we can&#39;t
just turn it off because users would notice and be unhappy.
This means that we need to find some way of providing privacy
in cases where users want it and not when they don&#39;t.&lt;/p&gt;
&lt;h2 id=&quot;attacker-models&quot;&gt;Attacker Models &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#attacker-models&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most browser privacy work focuses on what&#39;s called a &lt;em&gt;Web
attacker&lt;/em&gt;, which is to say an attacker who controls some set of Web
sites.  This is distinct from a lot of Internet security work which
assumes a &lt;em&gt;network attacker&lt;/em&gt; (see
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-web-vs.-internet-threat-models&quot;&gt;here&lt;/a&gt;
for more on this) who can observe all of your traffic. The main reason
for this is that it&#39;s a lot harder to defend against a network
attacker—defending against a Web attacker is hard
enough—and as we&#39;ll see &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#preventing-ip-based-tracking&quot;&gt;below&lt;/a&gt;,
we don&#39;t know how to do so cheaply.&lt;/p&gt;
&lt;h2 id=&quot;tracking-your-browsing-history&quot;&gt;Tracking Your Browsing History &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#tracking-your-browsing-history&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the browsing history shown
in the diagram below, in which the user visits the sites
&lt;code&gt;a.example&lt;/code&gt;, &lt;code&gt;b.example&lt;/code&gt;, and &lt;code&gt;c.example&lt;/code&gt;. If a tracker
is present on each of those sites (this is not uncommon!) it will
be able to get an accurate picture of your browsing history, learning
which sites you visit and in which order.&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/private-browsing-regular.png&quot; width=&quot;120&quot; alt=&quot;Browsing history tracking&quot; /&gt;
&lt;h3 id=&quot;cookies&quot;&gt;Cookies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#cookies&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The main mechanism that sites use to track your behavior is the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies&quot;&gt;cookie&lt;/a&gt;.
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/&quot;&gt;Recall&lt;/a&gt; that a cookie is just a piece of state that a site can set in your browser
and that gets sent back to that site whenever you visit it. Because
the same third party can be embedded on multiple sites, its cookies
allow it to link up your visits to those sites,
as I described &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/&quot;&gt;previously&lt;/a&gt;,
gradually building up a more complete profile of your browsing history.
This is obviously extremely bad for user privacy.&lt;/p&gt;
&lt;p&gt;As noted above, a number of browsers (notably Firefox and
Safari) have started building in anti-tracking mechanisms to
reduce this privacy leakage. These mechanisms are concerned with
reducing &lt;em&gt;cross-site&lt;/em&gt; tracking and operate primarily by restricting
the use of third-party cookies and other cross-site state
mechanisms. The idea is that instead of allowing trackers to link up
behavior on multiple sites, they just get to see behavior on
individual sites. The state of the art here is what&#39;s called
&lt;em&gt;first-party isolation&lt;/em&gt; (FPI) (or &amp;quot;double keying&amp;quot;) which means that the browser stores cookies
separately for each top-level site (the one that appears in the URL
bar). In Firefox, this feature is called &lt;a href=&quot;https://blog.mozilla.org/security/2021/02/23/total-cookie-protection/&quot;&gt;Total Cookie Protection (TCP)&lt;/a&gt;,
and in Safari, I think it&#39;s just part of their &lt;a href=&quot;https://webkit.org/blog/category/privacy/&quot;&gt;Intelligent
Tracking Protection&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
suite.&lt;/p&gt;
&lt;p&gt;With FPI, if tracker &lt;strong&gt;T&lt;/strong&gt; appears on sites &lt;strong&gt;A&lt;/strong&gt; and
&lt;strong&gt;B&lt;/strong&gt; it will get a different set of cookies on each site.
The diagram below shows the usual situation without FPI. The client first
visits &lt;code&gt;a.example&lt;/code&gt; which incorporates an ad. Because this is the
first time the client has encountered this ad server, the client
has no cookies for it. When the server serves the ad, it also sets
cookie &lt;code&gt;1234&lt;/code&gt;. When the client later visits &lt;code&gt;b.example&lt;/code&gt;, which
uses the same ad server, the client sends the cookie &lt;code&gt;1234&lt;/code&gt; which
lets the server link up the two visits. Finally, the client
goes back to &lt;code&gt;a.example&lt;/code&gt;, which again serves an ad, and the
client sends the same cookie.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/no-fpi.png&quot; alt=&quot;Cookies without FPI&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The next diagram shows the same browsing pattern but with FPI
on. The first interaction is the same, but then when the
client goes to visit &lt;code&gt;b.example&lt;/code&gt; and loads an ad from the
ad server, it doesn&#39;t have a cookie, because cookies
for visits to &lt;code&gt;a.example&lt;/code&gt; are stored separately from
those for visits to &lt;code&gt;b.example&lt;/code&gt; (no matter which origin
the cookie is for!). Instead, the client makes the request
without a cookie and the ad server sends a new cookie
&lt;code&gt;5678&lt;/code&gt;. However, when the client goes back to &lt;code&gt;a.example&lt;/code&gt;
it sends the original &lt;code&gt;1234&lt;/code&gt; cookie. This preserves some
important functionality, such as when a web site uses multiple
domains associated with the same company (e.g.,
the site is served off of &lt;code&gt;service.example&lt;/code&gt; but has an
API on a CDN such as &lt;code&gt;service.cdn.example&lt;/code&gt;),&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
as opposed to blocking third party cookies, which would
break this kind of use case.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fpi.png&quot; alt=&quot;Cookies with FPI&quot; /&gt;&lt;/p&gt;
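&lt;p&gt;To make the two diagrams concrete, here is a minimal Python sketch of a cookie jar with and without FPI. The &lt;code&gt;CookieJar&lt;/code&gt; class, the &lt;code&gt;browse&lt;/code&gt; helper, and the &lt;code&gt;ads.example&lt;/code&gt; origin are hypothetical names for illustration, not any browser&#39;s actual implementation; the only idea taken from the text is that FPI keys cookies on the top-level site as well as on the cookie&#39;s own origin.&lt;/p&gt;

```python
from typing import Optional


class CookieJar:
    """Toy cookie store; all names here are illustrative, not a browser API."""

    def __init__(self, first_party_isolation: bool) -> None:
        self.fpi = first_party_isolation
        self.cookies: dict[tuple, str] = {}

    def _key(self, top_level_site: str, cookie_origin: str) -> tuple:
        # With FPI the jar is "double keyed" on (top-level site, cookie origin);
        # without it, only the cookie's own origin matters.
        if self.fpi:
            return (top_level_site, cookie_origin)
        return (cookie_origin,)

    def get(self, top_level_site: str, cookie_origin: str) -> Optional[str]:
        return self.cookies.get(self._key(top_level_site, cookie_origin))

    def set(self, top_level_site: str, cookie_origin: str, value: str) -> None:
        self.cookies[self._key(top_level_site, cookie_origin)] = value


def browse(jar: CookieJar, visits: list[str]) -> list[str]:
    """Return the cookie the (hypothetical) ad server sees on each visit."""
    fresh = iter(["1234", "5678", "9abc"])  # values the ad server hands out
    seen = []
    for site in visits:
        cookie = jar.get(site, "ads.example")
        if cookie is None:
            # No cookie was sent, so the ad server sets a new one.
            cookie = next(fresh)
            jar.set(site, "ads.example", cookie)
        seen.append(cookie)
    return seen


visits = ["a.example", "b.example", "a.example"]
print(browse(CookieJar(first_party_isolation=False), visits))
print(browse(CookieJar(first_party_isolation=True), visits))
```

&lt;p&gt;Under these assumptions the first call reports the same cookie for all three visits, while the second reports &lt;code&gt;1234&lt;/code&gt;, &lt;code&gt;5678&lt;/code&gt;, &lt;code&gt;1234&lt;/code&gt;, matching the FPI diagram.&lt;/p&gt;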
&lt;p&gt;Because FPI allows trackers to link up two visits to the same
site, but not to different sites, our original
user&#39;s browsing history would appear to the tracker
as three separate traces, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/private-browsing-antitracking.png&quot; alt=&quot;Browser history with anti-tracking&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Ideally, the tracker has no way of knowing whether these traces are
all from the same browser or from different browsers. How much privacy
this provides depends on how much time you spend on each site. For instance,
because people spend a lot of time on Google and Facebook,
they get a pretty good idea of your activity and interests,
and, depending on that activity, they may be able to tie
it to your personal identity.
On the other hand, if you go to a site once, then that site doesn&#39;t
learn a lot about you.&lt;/p&gt;
&lt;h3 id=&quot;other-tracking-mechanisms&quot;&gt;Other Tracking Mechanisms &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#other-tracking-mechanisms&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unfortunately, cookies are not the only way to track users. There
are two mechanisms that are much harder to block:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The IP address&lt;/li&gt;
&lt;li&gt;Fingerprinting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The IP address is largely tied to a given device, so it serves as a
pretty strong and stable long-term identifier, though devices (especially
mobile devices) can change their IP address. Because the IP address is
necessary for communicating with the server, there&#39;s not a whole
lot that browsers can do about it directly without relaying traffic
through some other node (more on this below).&lt;/p&gt;
&lt;p&gt;The other major non-state mechanism for tracking users is
&lt;a href=&quot;https://www.mozilla.org/en-US/firefox/features/block-fingerprinting/&quot;&gt;fingerprinting&lt;/a&gt;.
Fingerprinting exploits natural variation in the hardware
and software that users run. The Web provides a number of
APIs that allow sites to learn information about a user&#39;s
machine, such as what browser and version they are running,
what operating system it is on, what language it is set to,
and even the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/web/api/navigator/hardwareconcurrency&quot;&gt;number of logical processor cores it has&lt;/a&gt;.
Any individual value like this isn&#39;t particularly identifying,
but when you add them up they provide a significant amount
of information about user identity. Estimates of precisely
how much vary widely, but everyone agrees it&#39;s nonzero
and probably at least enough to reduce the set of possible
users by a factor of 1000 or more, depending on how unusual
a given user&#39;s configuration is.&lt;/p&gt;
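&lt;p&gt;As a rough illustration of how individually weak attributes add up, here is a small Python calculation. The per-attribute shares are made-up numbers rather than measurements, and the calculation assumes the attributes are independent, which real fingerprinting attributes generally are not, so treat the result as an optimistic bound for the tracker rather than an estimate.&lt;/p&gt;

```python
import math

# Hypothetical fraction of users who share each attribute value with you.
# These numbers are invented for illustration, not measurements.
attribute_share = {
    "browser and version": 0.10,
    "operating system": 0.25,
    "language": 0.20,
    "logical cores": 0.20,
}


def anonymity_reduction(shares):
    """Factor by which the candidate set shrinks, assuming independence."""
    factor = 1.0
    for share in shares.values():
        factor /= share
    return factor


factor = anonymity_reduction(attribute_share)
bits = math.log2(factor)  # the same quantity expressed as bits of information
print(f"candidate set shrinks by about {factor:.0f}x ({bits:.1f} bits)")
```

&lt;p&gt;With these invented shares, four attributes that each match between a tenth and a quarter of users combine to a factor of about 1000, or roughly 10 bits.&lt;/p&gt;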
&lt;p&gt;Countering fingerprinting is a difficult problem, and requires
compromising between providing maximal privacy and breaking
functionality. For instance, a number of Web APIs can be—and
&lt;a href=&quot;https://webtransparency.cs.princeton.edu/webcensus/index.html&quot;&gt;are&lt;/a&gt;—used
for fingerprinting, but they are also widely used for
non-fingerprinting purposes, so restricting their use is
difficult.&lt;/p&gt;
&lt;h2 id=&quot;private-browsing-modes&quot;&gt;Private Browsing Modes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#private-browsing-modes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most browsers include some kind of mode (&amp;quot;Private Browsing&amp;quot; on &lt;a href=&quot;https://support.mozilla.org/en-US/kb/private-browsing-use-firefox-without-history&quot;&gt;Firefox&lt;/a&gt;
and &lt;a href=&quot;https://support.apple.com/guide/safari/browse-privately-ibrw1069/mac&quot;&gt;Safari&lt;/a&gt;,
&amp;quot;Incognito&amp;quot; on &lt;a href=&quot;https://support.google.com/chrome/answer/95464&quot;&gt;Chrome&lt;/a&gt;) that is designed to provide a
somewhat more private experience. Historically, these modes were mostly
designed not to prevent &lt;em&gt;web tracking&lt;/em&gt; but rather to protect against
local attack. The idea here is largely that you might have some
kind of shared computer and you don&#39;t want whoever you share it with
to know what sites you are going to.
The official motivating use case for private
browsing is often phrased as buying presents for someone,
with the unofficial use case being pornography.&lt;/p&gt;
&lt;p&gt;At a high level, private browsing modes work by not storing browsing state
past the lifetime of the browsing session (though the definition of session
varies somewhat). Here&#39;s Firefox&#39;s list of what it doesn&#39;t store:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Visited pages (history)&lt;/li&gt;
&lt;li&gt;Form and search bar entries&lt;/li&gt;
&lt;li&gt;Download list entries&lt;/li&gt;
&lt;li&gt;Cookies&lt;/li&gt;
&lt;li&gt;Cached Web content and Offline web content and user data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this is all working correctly, then someone who uses your computer
after you have closed the browser should not be able to learn what
sites you have gone to.&lt;/p&gt;
&lt;p&gt;Because cookies and cached content are deleted, private browsing also
inherently provides some protection against tracking by Web sites in
both the first and third party contexts. This protection operates
at the level of preventing linkage &lt;em&gt;between&lt;/em&gt; sessions.
In particular, it should
prevent the use of these mechanisms for tracking between private and
non-private contexts, such as when you visit a site in private
browsing and then go back to it in regular browsing. It also
prevents sites from using these mechanisms to track you between
multiple private browsing sessions. If our example browsing
activity above had used private browsing along with FPI, then what trackers
would see is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/private-browsing-pbm.png&quot; alt=&quot;Browser history with private browsing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The thing to notice here is that the browsing activity before
and after the browser restart are disconnected, so the trackers
(in theory) can&#39;t link them up.&lt;/p&gt;
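&lt;p&gt;The same effect can be sketched in a few lines of Python. This is a toy model under stated assumptions: &lt;code&gt;PrivateSession&lt;/code&gt; is a hypothetical class, cookies are the only state, they are keyed per top-level site as with FPI, and the tracker hands out a fresh identifier whenever it receives no cookie.&lt;/p&gt;

```python
import itertools

_ids = itertools.count(1)  # stand-in for fresh cookie values from a tracker


class PrivateSession:
    """Toy private browsing session: state lives only until close()."""

    def __init__(self):
        self._cookies = {}  # keyed by top-level site, as with FPI

    def visit(self, site):
        # The tracker sets a cookie the first time it is seen on this site
        # within this session, and gets it back on later visits.
        if site not in self._cookies:
            self._cookies[site] = f"id-{next(_ids)}"
        return self._cookies[site]

    def close(self):
        self._cookies.clear()  # nothing survives the session


session = PrivateSession()
before = [session.visit(s) for s in ["a.example", "b.example", "a.example"]]
session.close()  # the "browser restart" in the diagram
after = [session.visit(s) for s in ["a.example", "b.example"]]
print(before)
print(after)
```

&lt;p&gt;Within the first session the two visits to &lt;code&gt;a.example&lt;/code&gt; share an identifier, but no identifier from before the restart reappears afterwards, so cookies alone give the tracker nothing to join the two halves on.&lt;/p&gt;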
&lt;p&gt;Of course this also means that you don&#39;t stay logged in to sites
that you logged into in private browsing, which is obviously
a pain. And if you &lt;em&gt;do&lt;/em&gt; log in, then of course the site is
able to link up your behavior before and after, obviating
the value of using private browsing for those sites.
This makes
private browsing mode of limited usefulness for
a lot of browsing activities (e.g., shopping).&lt;/p&gt;
&lt;p&gt;In addition to clearing state, browsers have started to add more
explicit anti-tracking mechanisms to private browsing mode.  For
instance, Firefox Private Browsing mode automatically enables
&lt;a href=&quot;https://www.mozilla.org/en-US/firefox/features/adblocker/&quot;&gt;Enhanced Tracking Protection Strict
Mode&lt;/a&gt; (not
the world&#39;s least confusing name), which stops the browser from even
&lt;em&gt;connecting&lt;/em&gt; to many known third party trackers, thus preventing them
from tracking you by IP address or via fingerprinting (see below). The theory here
is that users who have selected private browsing have shown they care
more about privacy than breakage compared to the usual person so the
browser can take a more aggressive posture in terms of enabling
privacy features.  Thus, private browsing modes may provide some
additional protection against cross-site tracking within a session as
well as between sessions. This is something that varies
a lot between browsers.&lt;/p&gt;
&lt;h2 id=&quot;beyond-private-browsing-mode&quot;&gt;Beyond Private Browsing Mode &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#beyond-private-browsing-mode&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For the reasons described above, private browsing only provides
partial protection against tracking, either by first parties or
across sites. In order to get more complete protection, you need to do something
about IP-based tracking and probably about fingerprinting.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;how-stable-are-ip-addresses%3F&quot;&gt;How Stable Are IP Addresses? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#how-stable-are-ip-addresses%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Most devices use IP addresses that are assigned by their local
network, for instance using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Dynamic_Host_Configuration_Protocol&amp;amp;oldid=1095103928&quot;&gt;DHCP&lt;/a&gt;. In principle, the
network can change these addresses frequently, but
as a practical matter they appear to change &lt;a href=&quot;https://eltoro.com/how-long-does-an-ip-address-stay-attached-to-a-home-or-business/&quot;&gt;infrequently&lt;/a&gt;. Note that this does not
mean that IP addresses are uniquely identifying: it&#39;s
common for multiple devices to share the same home
IP address via &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1095000219&quot;&gt;NAT&lt;/a&gt;, in which case sites may or may not be able to distinguish
multiple devices behind the NAT. Specifically, if the devices
are of different types, then the site probably can, but
if you have two identical iPhones, they might not be able to.&lt;/p&gt;
&lt;p&gt;The situation with mobile devices is generally a bit better
because, well, they move around. The way that Internet
routing works is that the address helps determine
where to send the packets, so if you move around physically—e.g.,
to really different cell towers—your address should
change too in order to allow the data to be delivered
correctly.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Though of course, if you are using a mobile
device from your home WiFi, that address is likely
to be fairly stable.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;preventing-ip-based-tracking&quot;&gt;Preventing IP-Based Tracking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#preventing-ip-based-tracking&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Addressing IP-based tracking requires routing your traffic
through some service that will conceal your IP address. At
present, there are three main alternatives:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Virtual_private_network&amp;amp;oldid=1089497218&quot;&gt;VPN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apple&#39;s &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;iCloud Private Relay&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://torproject.org/&quot;&gt;Tor&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Technically these are all somewhat different but at a high level
they all work by hiding your IP address behind that of the
service, so that the site can&#39;t track you over long periods of
time (depending on how often your IP address changes). Because
your traffic is encrypted to the proxy, these mechanisms
also provide some privacy against network attackers, though
that protection is somewhat limited. For instance an attacker
who controls the network on both sides of the proxy might be
able to link up your traffic on either side via timing
and packet sizes.&lt;/p&gt;
&lt;p&gt;From the perspective of Web tracking, these systems are all
mostly equally good. The main difference between the designs
comes down to how worried you are about other kinds of
tracking. For instance, in a typical VPN design, you connect
to the VPN service and it forwards your packets to the
server. This means that the VPN sees both your address—and
presumably has your account information anyway—and
the site you are going to, so it is able to track you
even if the site doesn&#39;t; you&#39;re just trusting them not
to.&lt;/p&gt;
&lt;p&gt;iCloud Private Relay addresses this by having two proxies
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/private-relay-two-hop.png&quot; alt=&quot;Private Relay Architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://www.apple.com/privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF&quot;&gt;Apple white paper&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;Those proxies are operated by different providers and so neither has
both your identity and the site you are going to and would therefore
have to collude in order to learn your browsing history.  You could
potentially have accomplished&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
the same thing by getting two VPN accounts
with different providers, but that&#39;s not the usual configuration and
would require you to do the work yourself. With Private Relay you just
engage with Apple and they take care of the arrangements with the
providers (using some somewhat fancy crypto to authorize you to the
provider without revealing your identity). Tor takes this one step
further by having three hops chosen out of a set of community operated
servers. In both cases, the idea is that your behavior is private as
long as one of the servers is honest (or hasn&#39;t been subverted).&lt;/p&gt;
&lt;p&gt;The basic problem with all of these designs is that they
require some server (or servers) which relay the traffic
and someone has to pay for those servers and their associated
bandwidth. iCloud Private Relay and most VPNs are not free,
so the user is the one who pays. Tor is different: instead
of having a single provider such as Apple or your VPN
provider, Tor servers are operated by the Tor community on a volunteer
basis and are free to users (this is one of the reasons
why Tor performance is generally not great).&lt;/p&gt;
&lt;h3 id=&quot;preventing-fingerprinting&quot;&gt;Preventing Fingerprinting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#preventing-fingerprinting&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said above, a browser&#39;s fingerprint depends on a combination
of the client software and the hardware it&#39;s running on: if you
run the same browser on the same hardware, you&#39;ll have a fairly
stable fingerprinting result. If you run a different browser
on the same hardware, you&#39;ll have a somewhat different fingerprinting
result. This means that if you use one browser for your usual
browsing and another browser on the same machine for your &amp;quot;embarrassing&amp;quot;
browsing, then each set of activity will have a consistent fingerprint
and may be somewhat linkable. It may also be possible to partially link up the
two sets of activity based on the fingerprint: I would not generally
assume that if you use (say) Chrome for your regular browsing
and Firefox for your private browsing, you are entirely safe from
fingerprinting. It&#39;s probably worse if you use the same browser
engine&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
type (e.g., Chrome and Edge&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;)
or the same browser but in regular vs.
private browsing mode, in part because they will expose the same
hardware affordances and so have similar fingerprints in that respect.&lt;/p&gt;
&lt;p&gt;A number of browsers have explicit anti-fingerprinting mechanisms
with varying degrees of effectiveness. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Blocking connections to origins which perform fingerprinting (&lt;a href=&quot;https://blog.mozilla.org/security/2020/01/07/firefox-72-fingerprinting/&quot;&gt;Firefox&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Adding noise to API return values to make fingerprinting harder (&lt;a href=&quot;https://brave.com/privacy-updates/4-fingerprinting-defenses-2.0/&quot;&gt;Brave&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Removing APIs which can be used for fingerprinting and trying to make other APIs return consistent results across devices (&lt;a href=&quot;https://blog.torproject.org/browser-fingerprinting-introduction-and-challenges-ahead/&quot;&gt;TorBrowser&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chrome has also proposed something called the &lt;a href=&quot;https://github.com/mikewest/privacy-budget&quot;&gt;Privacy Budget&lt;/a&gt;
in which sites would be allowed to access some data, with access
then being throttled once they had obtained a certain amount
(see &lt;a href=&quot;https://mozilla.github.io/ppa-docs/privacy-budget.pdf&quot;&gt;here&lt;/a&gt; for
our analysis of this proposal). I don&#39;t believe it&#39;s been
implemented.&lt;/p&gt;
&lt;p&gt;This is an area of research that I&#39;m not super familiar with, but
my sense is that it&#39;s not really that clear how much information
can be obtained from fingerprinting. There have been a number
of papers on this topic but they generally fall into two
categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Specific new fingerprinting techniques&lt;/li&gt;
&lt;li&gt;Attempts to measure the amount of information
available via fingerprinting.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Estimates of the total amount of fingerprinting surface vary a fair
bit but generally hover around 18-20 bits of information.
Naively, this would be enough to reduce the size of the crowd
you are hiding in by a factor of roughly 260,000 to a million, which is obviously
bad, but not enough to identify you specifically in many cases.
This is kind of misleading because some people&#39;s configurations
are more unusual than others. For instance work by
&lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3178876.3186097&quot;&gt;Gómez-Boix, Laperdrix, and Baudry&lt;/a&gt;
found that out of a data set of around 2 million users 29% of mobile users are unique, whereas 56% of personal
computers are.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
On the other hand, if you have a very popular device that
is configured in a common way—e.g., an out of the box
iPhone—then this might leak a lot less than 18-20 bits.
I&#39;m not aware of much academic research on this question
or on the effectiveness of anti-fingerprinting mechanisms
(please let me know if you have any!). Presumably it&#39;s better
than nothing, but I don&#39;t know by how much.&lt;/p&gt;
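&lt;p&gt;To make the 18-20 bit estimate above concrete, here is a small sketch of the naive arithmetic. The function name and the population of 2 billion browsers are illustrative assumptions, not figures from the research, and the calculation assumes every fingerprint is equally likely, which (as noted above) is not true in practice.&lt;/p&gt;

```python
def anonymity_set_size(population, bits):
    """Expected crowd size if all 2**bits fingerprints were equally likely."""
    return population / (2 ** bits)

# Hypothetical population of 2 billion browsers (an assumption for illustration).
population = 2_000_000_000
for bits in (18, 20):
    factor = 2 ** bits
    crowd = anonymity_set_size(population, bits)
    print(f"{bits} bits: factor {factor:,}, about {crowd:,.0f} users share your fingerprint")
```

&lt;p&gt;Under this uniform assumption a few thousand users would still share each fingerprint; the real distribution is skewed, so unusual configurations end up in much smaller crowds.&lt;/p&gt;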
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/private-browsing/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The bottom line here is that there are a lot of tracking mechanisms
on the Web, and I&#39;ve just covered the main ones. It&#39;s possible to do quite a bit to mitigate
tracking, but the more you do, the bigger impact it has on your browsing
experience, both in terms of functionality and performance.
Everyone has to sort of choose their own level of comfort
here, but if you don&#39;t at least do something to protect yourself
from IP-based tracking, then the level of privacy is going
to be limited, especially for a single site.
Finally, if you want to actually browse privately,
then you actually have to be anonymous, which means not
logging into stuff, not buying things, etc. You can still
watch cat videos, though.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is the best link I could find, but a better one would be appreciated. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You might say that people shouldn&#39;t architect their
systems that way, but this kind of thing happens
and if the browser breaks them, then the browser
gets blamed. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Don&#39;t even get me started on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mobile_IP&amp;amp;oldid=1086531780&quot;&gt;mobile IP&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I say &amp;quot;potentially&amp;quot; because those two providers might have
their equipment in the same data center or cloud provider,
in which case you need to worry about that provider. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For those who don&#39;t know, a lot of browsers are built on the
&lt;a href=&quot;https://www.chromium.org/Home/&quot;&gt;Chromium&lt;/a&gt; open source code
base that Chrome is based on, which means that they are internally very similar. In addition, every browser on iOS is based on the same engine
because Apple forbids other engines on iOS. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Brave is a potential exception
here because of their anti-fingerprinting features. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is also a bit misleading because in a larger
data set, these might not be unique. &lt;a href=&quot;https://educatedguesswork.org/posts/private-browsing/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part VI: Browser Architecture</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-browser-architecture/"/>
		<updated>2022-06-27T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-browser-architecture/</id>
		<content type="html">&lt;p&gt;This is part VI of my series on the Web security model (parts
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising&quot;&gt;outtake&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors&quot;&gt;IV&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels&quot;&gt;V&lt;/a&gt;).
I&#39;d been planning to talk about microarchitectural
attacks next, but it&#39;s pretty hard to understand without
some background on overall browser architecture, so I&#39;ll be covering
that first.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-operating-system-processes&quot;&gt;Background: Operating System Processes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#background%3A-operating-system-processes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We actually have to start even earlier, with the structure of programs
in a computer. In early computers, you would just have one program
running at a time and that program had sole control of the processor.&lt;/p&gt;
&lt;p&gt;Modern computers can of course run multiple programs at once, but
they do that by having them share the processor. The operating
system is responsible for managing this. Each program runs in
what&#39;s called a &lt;em&gt;process&lt;/em&gt;. The operating system lets a
process run for a little while (what&#39;s called a &lt;em&gt;time slice&lt;/em&gt;), then stops it
and hands control to the next process, which gets to run
for its own time slice before control is handed to the next process, etc.
This is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Computer_multitasking&amp;amp;oldid=1088203348&quot;&gt;multitasking&lt;/a&gt;
and allows multiple programs to share the
same computer.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
In modern computers, time slices are very short and the processor switches
between programs very quickly so it gives the illusion that everything
is running in parallel.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;In a modern OS, programs don&#39;t need to do anything special to make this happen; they
just act as if they have full control of the processor and the
operating system takes care of switching between them.  In particular,
each process has its own view of the computer&#39;s memory and so process
A can&#39;t just address process B&#39;s memory, either by accident or
intentionally. This isn&#39;t to say that they
can&#39;t interact at all, but the operating system is responsible for
mediating that interaction, allowing some things and forbidding
others.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible for a single program to run multiple processes.
One reason to do this is to let two operations run in parallel.
Consider a networking process like a Web server. The basic
code for this might look
something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;loop {
   request = read_request();
   response = create_response(request);
   write_response(response);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So what happens if a Web server wants to serve two clients at
once? This is fine if the requests come in quickly, but
what happens if the request from client A trickles in over
a few seconds and then client B sends its request?
The server can&#39;t process it until it has finished handling
client A. If instead the server runs in two processes, however, then
process 1 can handle client A and process 2 is available
to handle client B when its request comes in. The operating
system takes care of making sure that each process gets
time to run, so this works fine without any extra effort
by the server, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/multiplexing-server.png&quot; alt=&quot;Multiplexing in a server&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can also get multitasking inside a single
process, using a mechanism called &lt;em&gt;threads&lt;/em&gt;. Threads
inside a process get scheduled independently, so that
you can write the same kind of linear code as above
and have it run in parallel, but they
aren&#39;t isolated like processes are. This means that,
for instance, thread 1 can accidentally corrupt thread
2&#39;s memory, or, if thread 1 crashes it can crash
the whole program. On the other hand, switching between
threads tends to be cheaper than switching between
processes, so each mechanism has its place. Finally,
a process with multiple threads tends to consume less
memory than the same number of processes because the
threads can share a lot of runtime state.&lt;/p&gt;
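&lt;p&gt;Here is a minimal sketch of the two-client scenario using threads. The &lt;code&gt;handle_client&lt;/code&gt; function and the delays are made up for illustration, with a sleep standing in for a request that trickles in slowly. Note that both threads write to the same list, which is exactly the kind of shared memory that processes would not have.&lt;/p&gt;

```python
import threading
import time

completed = []  # shared by all threads in this process

def handle_client(name, request_delay):
    # Stand-in for read_request(): pretend the request trickles in slowly.
    time.sleep(request_delay)
    response = "hello " + name
    # Threads share memory, so both can append to the same list.
    completed.append((name, response))

# Client A is slow, client B is fast; each gets its own thread.
threads = [
    threading.Thread(target=handle_client, args=("A", 0.3)),
    threading.Thread(target=handle_client, args=("B", 0.0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(completed)
```

&lt;p&gt;Client B gets its response first even though client A connected first, because the scheduler runs B&#39;s thread while A&#39;s is still waiting.&lt;/p&gt;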
&lt;h2 id=&quot;single-process-browsers&quot;&gt;Single-Process Browsers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#single-process-browsers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Originally, browsers just had everything in a single process.
This included not only the user interface and networking
code but also all the code that rendered the Web page and
the JavaScript that ran in the page. Moreover, they
often ran almost everything in a single &lt;em&gt;thread&lt;/em&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
with the program being responsible for multiplexing
keyboard input, network activity, etc. (see the side
bar for more on this).
Because each thread can only do one thing at once, this
tended to produce a lot of situations where the browser
would become temporarily unresponsive (the technical term here
is &lt;em&gt;jank&lt;/em&gt;) because it was doing something else rather
than responding to the user or playing your video, so
gradually more and more of the browser migrated into
other threads in order to reduce the impact on the user
experience.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;event-based-programming&quot;&gt;Event-Based Programming &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#event-based-programming&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If you don&#39;t have threads, it&#39;s still possible to multiplex
between different tasks. The basic technique is what&#39;s called
an &lt;em&gt;event loop&lt;/em&gt;. The basic idea behind an event-loop is that
you have a piece of code that allows you to register &lt;em&gt;event handlers&lt;/em&gt;
for when certain things happen (e.g., a packet comes in or
someone types a key). An event handler is just a function that
runs when that event happens.&lt;/p&gt;
&lt;p&gt;So, for instance, you might have something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function onKeyPressed() {
   ...
}

function onMouseMovement() {
   ...
}

register(KEY_PRESSED, onKeyPressed);
register(MOUSE_MOVED, onMouseMovement);

run_event_loop();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;run_event_loop()&lt;/code&gt; function just runs forever, waiting
for something interesting to happen (where &amp;quot;interesting&amp;quot;
is defined as &amp;quot;some event that has a handler registered&amp;quot;),
and when it does, it runs the associated handler function. When
the handler function completes, the event loop resumes
waiting until something else happens.&lt;/p&gt;
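&lt;p&gt;For the curious, a toy version of the machinery behind &lt;code&gt;register()&lt;/code&gt; and &lt;code&gt;run_event_loop()&lt;/code&gt; might look roughly like the following. This is a sketch under simplifying assumptions: events come from an in-memory queue filled by a hypothetical &lt;code&gt;post()&lt;/code&gt; helper rather than from the operating system, and the loop stops when the queue is empty instead of running forever.&lt;/p&gt;

```python
from collections import deque

handlers = {}      # event name -> handler function
pending = deque()  # events waiting to be processed
log = []

def register(event, handler):
    handlers[event] = handler

def post(event, payload):
    pending.append((event, payload))

def run_event_loop():
    # A real loop would block waiting for input; this one stops when the queue is empty.
    while pending:
        event, payload = pending.popleft()
        handler = handlers.get(event)
        if handler is not None:
            handler(payload)  # nothing else runs until this returns

def on_key_pressed(key):
    log.append("key " + key)

def on_mouse_moved(position):
    log.append("mouse " + str(position))

register("KEY_PRESSED", on_key_pressed)
register("MOUSE_MOVED", on_mouse_moved)

post("KEY_PRESSED", "a")
post("MOUSE_MOVED", (10, 20))
post("WINDOW_RESIZED", None)  # no handler registered, so it is ignored

run_event_loop()
print(log)
```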
&lt;p&gt;This works fine and is still common—for instance,
the popular &lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node.js&lt;/a&gt;
JavaScript runtime works this way—but it&#39;s a lot of
work to program in. First, because nothing happens while
the event handler is running, you constantly have to worry about whether you accidentally
are taking up too much time with some operation. For
instance, if someone presses a key and then clicks a button
and your key press handler takes 500ms, then the button click
doesn&#39;t get processed for 500ms, which is obviously very
unpleasant for users.&lt;/p&gt;
&lt;p&gt;This means that you have to break up anything long-running into
multiple pieces, but every time you switch from one logical operation to another, you have to
arrange to save your state so it&#39;s there when you come back
to it, which is annoying. By contrast, if you are writing
multi-process or multi-threaded code, then the scheduler takes
care of pausing one logical operation and letting another run,
so you don&#39;t need to worry about saving your state and coming
back to it. In fact, it&#39;s so annoying to program this way that
some event-driven systems (in particular JavaScript in both
Web browsers and Node.js) have developed mechanisms like
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function&quot;&gt;async/await&lt;/a&gt;
that let the programmer write code that appears to be linear but is
secretly event-driven.&lt;/p&gt;
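&lt;p&gt;Python has the same facility, so here is a minimal sketch of what that looks like using its &lt;code&gt;asyncio&lt;/code&gt; library. The &lt;code&gt;handle&lt;/code&gt; function, the names, and the delays are invented for illustration. Each handler reads as straight-line code, but every &lt;code&gt;await&lt;/code&gt; is a point where the event loop can switch to something else.&lt;/p&gt;

```python
import asyncio

async def handle(name, delay, log):
    log.append(name + " start")
    # Looks like a blocking call, but control returns to the event loop here.
    await asyncio.sleep(delay)
    log.append(name + " done")

async def main():
    log = []
    # Run both handlers on one thread; the event loop interleaves them.
    await asyncio.gather(handle("A", 0.05, log), handle("B", 0.01, log))
    return log

log = asyncio.run(main())
print(log)
```

&lt;p&gt;Handler B finishes while handler A is still waiting, even though there is only one thread and neither function contains any explicit state-saving code.&lt;/p&gt;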
&lt;/div&gt;
&lt;p&gt;As an example, until 2016 Firefox had an architecture with a single process
containing a number of threads for tasks that could be run
asynchronously like networking and media.  For instance, the user
interface runs on one thread, but what happens if the user asks to do
something that takes a long time, like load a Web page? The way this
happens is that the UI thread &lt;em&gt;dispatches&lt;/em&gt; a request to a different
thread which is responsible for networking. The networking thread can
then connect to the Web site and download the content in the
background. This allows the UI to continue to be responsive to the
user while the Web page downloads.&lt;/p&gt;
&lt;p&gt;This architecture is straightforward and has a number of advantages.
In particular, it is easier to share state between the different
threads. For example, consider the case I just gave above in which
the UI thread needs to send a request to the network thread.
It would assemble a request structure and pass it to the network
thread, which could look something like this (this is not
real Firefox code):&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string url&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;string referer&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; NetworkRequest&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;// &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;NetworkRequest &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;msg &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NetworkRequest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;msg&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;method &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; HTTP_GET&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;msg&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;url &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;https://example.com/&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;msg&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;referer &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; std&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;https://referer.example/&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;networkThread&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Dispatch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;msg&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When this code calls &lt;code&gt;networkThread-&amp;gt;Dispatch()&lt;/code&gt; it passes a pointer
to (i.e., the memory address of) the &lt;code&gt;NetworkRequest&lt;/code&gt; object to the
networking thread, which then can access the contents of that object.
In C++, the &lt;code&gt;NetworkRequest&lt;/code&gt; object does not consist of a contiguous
block of memory. Instead, the &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;referer&lt;/code&gt; members are likely
to be separate blocks of memory, with the &lt;code&gt;NetworkRequest&lt;/code&gt; object just
holding pointers to those objects.  This all works because threads
share memory, which means that a memory address that is valid on the
main thread is also valid on the networking thread.  Therefore, you
can just pass a pointer to the structure itself and everything works
fine.&lt;/p&gt;
&lt;p&gt;By contrast, if there were a separate networking process, then this
wouldn&#39;t work because the pointer to the structure wouldn&#39;t point to a
valid memory region in the networking process. Instead you have to
&lt;em&gt;serialize&lt;/em&gt; the structure by turning it into a single message, e.g.,
by concatenating the method, the URL, and the referer. You then send
that message to the network process which &lt;em&gt;deserializes&lt;/em&gt; it back into
its original components. Any responses from that process would have to
come back the same way.&lt;/p&gt;
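&lt;p&gt;As a rough illustration of what serializing and deserializing involves, here is a sketch that turns a request into a single byte string and back. JSON is used purely as a convenient stand-in for a wire format, and the &lt;code&gt;serialize&lt;/code&gt; and &lt;code&gt;deserialize&lt;/code&gt; helpers are hypothetical; real browsers use their own IPC message formats.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NetworkRequest:
    method: str
    url: str
    referer: str

def serialize(request):
    # Flatten the structure into one self-contained byte string.
    return json.dumps(asdict(request)).encode("utf-8")

def deserialize(message):
    # Rebuild an equivalent object in the receiving process.
    return NetworkRequest(**json.loads(message.decode("utf-8")))

request = NetworkRequest("GET", "https://example.com/", "https://referer.example/")
wire_bytes = serialize(request)      # this is what would travel over the IPC channel
received = deserialize(wire_bytes)   # a copy, not the same object
print(received == request, received is request)
```

&lt;p&gt;The receiver ends up with an equal but distinct object: nothing in the message depends on memory addresses in the sender, which is what makes it safe to hand to another process.&lt;/p&gt;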
&lt;p&gt;Memory sharing is a huge advantage when you have a single-threaded program
that you want to make multithreaded, because it makes it
comparatively easy to move an operation to another thread. I say
&amp;quot;comparatively&amp;quot; because it&#39;s still not easy. If you have multiple
threads trying to touch the same data at the same time you can
get &lt;a href=&quot;https://web.archive.org/web/20130210020743/https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong&quot;&gt;corruption and other horrible problems&lt;/a&gt;,
so you have to go to a lot of work&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
to make sure that doesn&#39;t
happen.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; This kind of problem, called a &lt;em&gt;data race&lt;/em&gt;,
can be incredibly hard to debug, especially as it often
won&#39;t happen in your tests but only in some scenario
where things are operating in a way you didn&#39;t expect;
but even uncommon things happen a lot when you have
a piece of software used by millions of people.&lt;/p&gt;
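&lt;p&gt;The most common form of that &amp;quot;lot of work&amp;quot; is a lock around every access to the shared data. The following is a minimal sketch with a made-up shared counter: four threads increment it, and the lock ensures that only one of them performs the read-modify-write at a time.&lt;/p&gt;

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

&lt;p&gt;The catch is that the lock only helps if every piece of code touching the counter remembers to take it; forgetting once is enough to reintroduce the race.&lt;/p&gt;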
&lt;p&gt;With processes, by contrast, you mostly get this kind
of protection for free, because memory isn&#39;t usually shared,
but you have to pay the cost upfront of restructuring
the code so it doesn&#39;t depend on shared memory.
This tends to make threads look more attractive than they
actually would be if you counted the total cost including
diagnosing issues once the software is deployed. In
any case, it&#39;s quite common to see big programs with
a lot of threads.&lt;/p&gt;
&lt;h3 id=&quot;stability-and-security-issues&quot;&gt;Stability and Security Issues &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#stability-and-security-issues&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because all the threads in the same process share the same
execution environment, defects that occur in one thread
have a tendency to impact the whole program. For example,
consider what happens if part of your program tries to access
an invalid region of memory. On UNIX systems this generally
results in what&#39;s called a &lt;em&gt;segmentation fault&lt;/em&gt;, which causes
the process to terminate. If your entire program is in
a single process, then the user just sees your entire program
crash. Web browsers are very complicated systems that therefore
have a lot of bugs, and it used to be very common for people
to just have the whole browser crash.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Another example is that it&#39;s possible for one Web site
to &lt;em&gt;starve&lt;/em&gt; another Web site. Because the JavaScript engine
runs on a single thread, if site A writes some JavaScript
that runs for a long time, then site B&#39;s JavaScript doesn&#39;t
get to run. On Firefox, this issue was even worse because
the browser UI also ran on the same thread, so it was
possible for a Web site to prevent the browser UI from
working well. Firefox had some code to detect these cases
and alert the user, but it could still cause
detectable UI jank.&lt;/p&gt;
&lt;p&gt;A single process can also lead to security issues: if an attacker manages to
&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/&quot;&gt;compromise the code&lt;/a&gt; running in part of the
program, then they can use it to access any memory in the process.
For instance, in a Web browser they might steal your cookies
and use them to impersonate you to Web sites.
In a Web server, they might
steal the cryptographic keys that authenticate the server
and use that to impersonate the server to other clients.
In addition, because any code they manage to execute has the
privileges of the whole program, they can do anything the program
can do, such as read or write files on your disk, access your
camera or microphone, etc.&lt;/p&gt;
&lt;h2 id=&quot;process-separation&quot;&gt;Process Separation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#process-separation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As discussed in a &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety&quot;&gt;previous post&lt;/a&gt;, there
is a standard approach to dealing with this issue:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Take the most dangerous/vulnerable code and run it in its own
process (process separation).&lt;/li&gt;
&lt;li&gt;Lock down that process so that it has the minimum privileges
needed to do its job (sandboxing). The details of this vary
from operating system to operating system but the general
idea is that a process can give up its privileges to do
things like access the filesystem or the network.&lt;/li&gt;
&lt;li&gt;If the process needs extra privileges, have it talk to another
process which has more privileges but is (theoretically)
less vulnerable.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This strategy was introduced in
&lt;a href=&quot;http://www.peter.honeyman.org/u/provos/papers/privsep.pdf&quot;&gt;SSHD&lt;/a&gt; and
then &lt;a href=&quot;https://seclab.stanford.edu/websec/chromium/chromium-security-architecture.pdf&quot;&gt;first shipped in a mainstream browser by Chrome/Chromium&lt;/a&gt;.
The way that Chromium originally worked was that the HTML/JS renderer
ran in a sandbox, but the UI and the network access ran in the
&amp;quot;parent&amp;quot; process (what Chromium called the &amp;quot;browser kernel&amp;quot;).
The following figure from Barth et al.&#39;s original paper on Chromium
shows how this works:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/chromium-architecture.png&quot; alt=&quot;Chromium architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this figure &amp;quot;IPC&amp;quot; refers to &amp;quot;interprocess communication&amp;quot;
which just means a bidirectional channel that the two processes
can use to talk to each other. As noted above, that requires
serializing the messages for transmission over the wire and
decoding them on receipt.&lt;/p&gt;
&lt;p&gt;As you would expect, this architecture has a number of stability and
security advantages.&lt;/p&gt;
&lt;h3 id=&quot;stability&quot;&gt;Stability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#stability&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;On the stability side, if the renderer process
crashes, the parent process can detect this and restart it. This isn&#39;t
an entirely glitch-free experience because the site the user is viewing
still crashes, but because Chrome can run multiple processes, it doesn&#39;t necessarily impact every
browser tab. Similarly, because each tab is running in its own
process, if tab A has some kind of long-running script it
doesn&#39;t necessarily impact tab B, and won&#39;t impact the main browser
UI.&lt;/p&gt;
&lt;h3 id=&quot;security&quot;&gt;Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#security&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because the renderer is sandboxed, compromise
of the renderer is less serious. For instance, the renderer
would not be able to read files off the filesystem directly
but would have to ask the parent to do it. Of course, if
the renderer can ask the parent to read &lt;em&gt;any&lt;/em&gt; file, then this
isn&#39;t much of an improvement, so instead the renderer asks
the parent to bring up a file picker dialog and then only the
selected file will be accessible. This is a specific case
of a general pattern, which is that the parent only partly
trusts the renderer and has to perform access control
checks when the renderer asks for something.&lt;/p&gt;
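&lt;p&gt;To make this pattern concrete, here is a minimal Python sketch of a parent that brokers file reads for a sandboxed renderer. Everything in it is hypothetical (the &lt;code&gt;ParentProcess&lt;/code&gt; class, its method names, and the in-memory dictionary standing in for the filesystem); it is not how Chromium implements this, just an illustration of checking each request against what the user actually picked.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch: a parent process brokering file access for
# sandboxed renderers. None of these names come from a real browser.

class AccessDenied(Exception):
    pass


class ParentProcess:
    def __init__(self, files):
        # files: dict of path to contents, standing in for the real filesystem
        self.files = files
        # renderer id mapped to the set of paths the user picked for it
        self.grants = {}

    def user_picked_file(self, renderer_id, path):
        # Called after the user selects a file in the picker dialog,
        # which the parent (not the renderer) displays.
        self.grants.setdefault(renderer_id, set()).add(path)

    def read_file(self, renderer_id, path):
        # The renderer is only partly trusted, so check every request.
        if path not in self.grants.get(renderer_id, set()):
            raise AccessDenied(path)
        return self.files[path]


parent = ParentProcess({'/home/u/cat.png': b'PNG', '/home/u/.ssh/id_rsa': b'KEY'})
parent.user_picked_file(1, '/home/u/cat.png')
picked = parent.read_file(1, '/home/u/cat.png')

# A compromised renderer asking for an unpicked file is refused.
try:
    parent.read_file(1, '/home/u/.ssh/id_rsa')
    blocked = False
except AccessDenied:
    blocked = True&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point of the design is that the grant is created by a user action in the parent, so a compromised renderer has no way to add entries to it.&lt;/p&gt;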
&lt;p&gt;In order to gain full control of the computer, an attacker
who compromises the renderer must first escape the sandbox.
This tends to happen in one of two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The attacker uses a vulnerability in the operating system
to elevate its privileges beyond those it is supposed
to have.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The attacker uses a vulnerability in the parent process
to subvert that process or to cause it to do something it
shouldn&#39;t.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Sandbox escapes do happen with some regularity but you&#39;ve
now raised the bar on the attacker by requiring them to
have two vulnerabilities rather than one.&lt;/p&gt;
&lt;p&gt;Of course this does not provide perfect security. First, much of the
browser runs outside the sandbox, so compromise of these portions
can lead directly to compromise of your machine. A good example
of this is networking code, which is exposed directly to the
attacker and is easy to get wrong.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Second, sites are not protected from each other because the
same process may serve multiple sites, either consecutively—for
instance if the user navigates between sites—or simultaneously—for
instance, if the browser uses the same process for multiple tabs
or because a site loads a resource from another site.
If a site is able to successfully attack the renderer,
it can then access state associated with another site, including
cookie state and the like. Thus, the browser protects the
user&#39;s computer, but not any Web-associated data. As more and
more of the work people do moved to the Web, this became a more serious
problem: an attacker who can&#39;t take over your computer but
can read all your banking data and your mail is still
a serious threat.&lt;/p&gt;
&lt;h2 id=&quot;site-isolation&quot;&gt;Site Isolation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#site-isolation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The natural way to address the problem of sites attacking each other
via browser vulnerabilities is to isolate each
site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
in its own process.
This is called &lt;em&gt;site isolation&lt;/em&gt;, and unfortunately it
turns out to be &lt;em&gt;a lot&lt;/em&gt; harder than it sounds, for a number
of reasons.&lt;/p&gt;
&lt;p&gt;First, there are a number of Web APIs that allow for &lt;em&gt;synchronous&lt;/em&gt;
access between windows or IFRAMEs. For instance, if site A does
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Window/open&quot;&gt;&lt;code&gt;window.open()&lt;/code&gt;&lt;/a&gt;
then it gets a handle it can use to access the new window, for
instance to navigate it to a different site or—if it&#39;s the same
site—to access its data. Similarly, the opened window gets a
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Window/opener&quot;&gt;&lt;code&gt;window.opener&lt;/code&gt;&lt;/a&gt;
property that it can use to access the window that opened it. The
APIs that use these values are expected to behave synchronously,
so for instance, if you want to look at some property of
&lt;code&gt;window.opener&lt;/code&gt; this has to happen immediately.
If each site is in its own process, then that becomes
tricky because the data lives in a different process, so you have to implement
some way of providing that synchronous access across processes. There are a fair number of similar
scenarios and converting a browser to site isolation requires
finding and fixing each of them.&lt;/p&gt;
&lt;p&gt;Second, unlike the simpler process separation design, you need
to ensure that each process is constrained to only do the things that
are allowed for that site. For instance, the process for site &lt;strong&gt;A&lt;/strong&gt;
cannot access the cookies for site &lt;strong&gt;B&lt;/strong&gt;. This means that every
single request to access data that isn&#39;t local to the process&#39;s memory
not only needs to go through the parent—as in process separation—but
the parent needs to check that the process that is making it is entitled
to do so, first by keeping track of which process goes with which
site and second by doing the right permissions checks. Previously,
these permissions checks could be in the renderer process, which was
a lot easier, especially if, as in Firefox, they had started there
in the first place.&lt;/p&gt;
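&lt;p&gt;Here is a rough Python sketch of what that bookkeeping looks like. The class and method names are invented for illustration and the cookie store is just a dictionary; real browsers do this across many more kinds of state. The idea is only that the parent remembers which site each process is locked to and refuses anything else.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch: the parent tracks which site each renderer
# process belongs to and checks every cookie request against it.

class SiteIsolationParent:
    def __init__(self):
        self.process_site = {}  # process id mapped to the one site it serves
        self.cookies = {}       # site mapped to a dict of cookie name and value

    def launch_renderer(self, pid, site):
        # Lock the process to a single site for its whole lifetime.
        self.process_site[pid] = site

    def set_cookie(self, site, name, value):
        self.cookies.setdefault(site, {})[name] = value

    def get_cookies(self, pid, site):
        # The check lives in the parent, so a compromised renderer
        # cannot simply skip it.
        if self.process_site.get(pid) != site:
            raise PermissionError('process %d may not read cookies for %s' % (pid, site))
        return dict(self.cookies.get(site, {}))


parent = SiteIsolationParent()
parent.launch_renderer(1, 'a.example')
parent.launch_renderer(2, 'b.example')
parent.set_cookie('b.example', 'session', 'secret')

own = parent.get_cookies(2, 'b.example')
try:
    parent.get_cookies(1, 'b.example')
    cross_site_allowed = True
except PermissionError:
    cross_site_allowed = False&lt;/code&gt;&lt;/pre&gt;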
&lt;p&gt;Finally, because having a lot of processes consumes a lot more memory,
a lot of work was required to try to shrink the overall memory
consumption of the system. This also means that it is harder
to deploy full site isolation on mobile devices which tend to
have less memory.&lt;/p&gt;
&lt;p&gt;At present, Chrome—and other Chromium derived browsers such as
Edge and Brave—and Firefox have full site isolation, but to the
best of my knowledge, Safari does not yet have it.&lt;/p&gt;
&lt;h2 id=&quot;inside-baseball%3A-multiprocess-firefox&quot;&gt;Inside Baseball: Multiprocess Firefox &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#inside-baseball%3A-multiprocess-firefox&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unlike Chrome, which was designed from the beginning as a multiprocess
browser, Firefox originally had a more traditional &amp;quot;monolithic&amp;quot; architecture.
This made converting to a multiprocess architecture much more painful
because it meant unwinding all the assumptions about how things would
be mutually accessible. In particular, Firefox had a very extensive
&amp;quot;add-on&amp;quot; ecosystem that let add-ons make all sorts of changes to
how Firefox operated. In many cases, these add-ons depended on having
access to many different parts of the browser and so weren&#39;t easily
compatible with a multi-process system.&lt;/p&gt;
&lt;p&gt;At the same time as Chrome was building a multiprocess architecture,
Mozilla was developing a new programming language, &lt;a href=&quot;https://www.rust-lang.org/&quot;&gt;Rust&lt;/a&gt;,
which was specifically designed for the kind of systems programming
that is required to make a browser engine. Rust had two
key features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Memory safety so that it was much harder to write memory unsafe
code, thus eliminating a &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/&quot;&gt;broad class&lt;/a&gt;
of serious vulnerabilities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Thread safety so that it was much easier to write multithreaded
code without creating data races that lead to vulnerabilities
and unpredictable behavior.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead of converting Firefox to a multiprocess architecture,
Mozilla focused on the idea of rewriting much of the browser
engine in Rust (a project called &lt;a href=&quot;https://servo.org/&quot;&gt;Servo&lt;/a&gt;).
If successful, this would have addressed many of the same issues as
a multiprocess system: you could easily write multithreaded
code and because it was memory safe you wouldn&#39;t need to worry
as much about compromises of one thread leading to compromises of
the process as a whole. If this had worked it would have been
very convenient because it would have allowed for a gradual
transition without breaking add-ons (which was considered
a big deal). It would also have used less memory and quite
likely been faster.&lt;/p&gt;
&lt;p&gt;The Big Rewrite ultimately didn&#39;t work out, for two major reasons. First, it just
wasn&#39;t practical to rewrite enough of the browser in Rust to make a
real difference. Firefox is over &lt;a href=&quot;https://hacks.mozilla.org/2020/04/code-quality-tools-at-mozilla/&quot;&gt;20 million lines of
code&lt;/a&gt;,
a huge fraction of it in C++, reflecting over 20 years of software
engineering by a team of hundreds. Even if writing in Rust
was dramatically faster it would still be very expensive to
replace all that code. Firefox eventually did incorporate
several big chunks of new tech from Servo, such as the
&lt;a href=&quot;https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/&quot;&gt;Stylo&lt;/a&gt;
Style engine and the &lt;a href=&quot;https://hacks.mozilla.org/2017/10/the-whole-web-at-maximum-fps-how-webrender-gets-rid-of-jank/&quot;&gt;WebRender&lt;/a&gt; rendering system, and a lot of new Firefox
code is written in Rust, but it just wasn&#39;t practical to
replace everything.&lt;/p&gt;
&lt;p&gt;The second reason comes down to &lt;strong&gt;JavaScript&lt;/strong&gt;. A &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1FslzTx4b7sKZK4BR-DpO45JZNB1QZF9wuijK3OxBwr0/edit#gid=0&quot;&gt;huge
fraction&lt;/a&gt;
of the memory vulnerabilities in browser engines actually isn&#39;t due to
the memory unsafety of the browser but rather to logic errors in the
JavaScript VM that lead to the code it generates being unsafe.
Writing in Rust doesn&#39;t inherently fix these problems—though
of course a rewrite might lead to simpler or easier to verify
code.&lt;/p&gt;
&lt;p&gt;In any case, Mozilla eventually decided to introduce process
separation, in a project called
&lt;a href=&quot;https://wiki.mozilla.org/Electrolysis&quot;&gt;Electrolysis&lt;/a&gt;. At first
Firefox only had one content process, and even later, after
it added multiple processes, it
was far more conservative
than Chrome about the number of processes that it started,
in an attempt to conserve memory
(see &lt;a href=&quot;https://medium.com/mozilla-tech/the-search-for-the-goldilocks-browser-and-why-firefox-may-be-just-right-for-you-1f520506aa35&quot;&gt;here&lt;/a&gt; for some spin on why having 4 processes was perfect
rather than just easy). And those add-ons? Eventually Firefox
&lt;a href=&quot;https://blog.mozilla.org/addons/2015/08/21/the-future-of-developing-firefox-add-ons/&quot;&gt;deprecated&lt;/a&gt;
them, in favor of &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions&quot;&gt;WebExtensions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In retrospect, the decision to do Electrolysis was fortunate because, as we&#39;ll discuss
next time, multithreaded architectures simply can&#39;t properly
defend against Spectre-type attacks, so Firefox would have
had to move to multiprocess in any case and having already
done Electrolysis at least got it part of the way there.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-microarchitectural-attacks&quot;&gt;Next Up: Microarchitectural Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#next-up%3A-microarchitectural-attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because site isolation was so much work, converting browsers from
process separation to site isolation took a really long time. Chrome was the first
browser to start working on site isolation back in 2015 but they
were still far from finished in 2018 when an entirely new class of attacks
that exploited microarchitectural features of modern processors
was discovered. The only known viable
long-term defense against these attacks is to move to full site isolation,
leading Chrome to increase their level of urgency and Firefox to
launch &lt;a href=&quot;https://wiki.mozilla.org/Project_Fission&quot;&gt;Project Fission&lt;/a&gt;
to add site isolation to Firefox. I&#39;ll be covering these attacks
in the next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically, what I&#39;m describing here is &amp;quot;preemptive multitasking&amp;quot;,
because the operating system switches programs out without
their cooperation. The alternative is &amp;quot;cooperative multitasking&amp;quot;,
in which programs give up control of the processor. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Newer computers also have multiple processors and/or multiple cores
and can really do some stuff simultaneously, but that&#39;s not
that relevant here. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As far as I can tell &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mosaic_(web_browser)&amp;amp;oldid=1093277968&quot;&gt;Mosaic&lt;/a&gt; actually was completely
single-threaded. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Much of the machinery of languages like Rust and Erlang
is designed to make it possible to safely write multithreaded
code without a lot of mental overhead. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The Mozilla San Francisco offices used to have a sign set about
8 feet off the floor that read
&lt;a href=&quot;https://bholley.net/blog/2015/must-be-this-tall-to-write-multi-threaded-code.html&quot;&gt;&amp;quot;Must be this tall to write multithreaded code&amp;quot;&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically it&#39;s possible to recover from memory violations
in the sense that you can just tell the program to ignore
the error and keep executing—the Emacs editor used
to allow this—but once you&#39;ve had some kind of memory
issue like this, your program is in an uncertain state
so all bets are off. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Firefox and Chrome are both moving networking into a separate
process, and I believe Chrome may have recently completed
this on some systems. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Perhaps surprisingly, the unit of isolation is not the &lt;em&gt;origin&lt;/em&gt;
but rather the &lt;em&gt;site&lt;/em&gt;, which is to say the registrable domain,
aka &amp;quot;eTLD+1&amp;quot;. So, for instance, &lt;code&gt;mail.example.com&lt;/code&gt; and &lt;code&gt;web.example.com&lt;/code&gt; are part of the same site.
The reason for this is that sites can set the &lt;code&gt;document.domain&lt;/code&gt;
property to set their domain to the parent domain, e.g.,
from &lt;code&gt;mail.example.com&lt;/code&gt; to &lt;code&gt;example.com&lt;/code&gt;. This puts them
in the same origin. See the Chromium &lt;a href=&quot;https://www.chromium.org/developers/design-documents/site-isolation/#threat-model&quot;&gt;design document&lt;/a&gt; on site isolation for more detail.
 &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-browser-architecture/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>First impressions of Web5</title>
		<link href="https://educatedguesswork.org/posts/web5-first-impressions/"/>
		<updated>2022-06-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web5-first-impressions/</id>
		<content type="html">&lt;style&gt;
.img-wrap {
  display: inline-block;
}
.img-wrap img {
  width: 100%;
}&lt;/style&gt;
&lt;p&gt;Recently Jack Dorsey &lt;a href=&quot;https://twitter.com/jack/status/1535314738078486533&quot;&gt;announced&lt;/a&gt;
a new project called &lt;a href=&quot;https://developer.tbd.website/projects/web5/&quot;&gt;Web5&lt;/a&gt;
which is billed as &amp;quot;an extra decentralized web platform&amp;quot;. I&#39;ve now had time
to take a look at the &lt;a href=&quot;https://developer.tbd.website/docs/Decentralized%20Web%20Platform%20-%20Public.pdf&quot;&gt;pitch deck&lt;/a&gt; and some of the specifications. This post provides some
initial impressions.&lt;/p&gt;
&lt;h2 id=&quot;overall-idea&quot;&gt;Overall Idea &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#overall-idea&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Although Web5 bills itself as for the &amp;quot;decentralized Web&amp;quot;, it seems
to be addressing a somewhat different set of applications than those
I &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization&quot;&gt;explored&lt;/a&gt; previously
(helping to make the case that &amp;quot;decentralized Web&amp;quot; is an unhelpful
term).
In that post, we mostly looked at the problem of how one
could publish Web sites and apps without having to use some
kind of centralized service. Web5, however, seems to be trying
to solve the problem of how to use
various Web services (e.g., Spotify or Twitter) while
still maintaining control of your data. To that end, the site lists two main
use cases:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Control Your Identity&lt;/strong&gt;
Alice holds a digital wallet that securely manages her identity, data, and authorizations for external apps and connections. Alice uses her wallet to sign in to a new decentralized social media app. Because Alice has connected to the app with her decentralized identity, she does not need to create a profile, and all the connections, relationships, and posts she creates through the app are stored with her, in her decentralized web node. Now Alice can switch apps whenever she wants, taking her social persona with her.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Own Your Data&lt;/strong&gt;
Bob is a music lover and hates having his personal data locked to a single vendor. It forces him to regurgitate his playlists and songs over and over again across different music apps. Thankfully there&#39;s a way out of this maze of vendor-locked silos: Bob can keep this data in his decentralized web node. This way Bob is able to grant any music app access to his settings and preferences, enabling him to take his personalized music experience wherever he chooses.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The system defines a number of technical components to address
these use cases.&lt;/p&gt;
&lt;h3 id=&quot;decentralized-web-nodes&quot;&gt;Decentralized Web Nodes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#decentralized-web-nodes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The core idea seems to be that instead of storing your data on the
service, you instead store it in a &lt;a href=&quot;https://developer.tbd.website/projects/dwn-sdk-js/readme/&quot;&gt;Decentralized Web Node (DWN)&lt;/a&gt;, which is a network element that is somehow associated with
you and that you trust with your data. When services want to use
your data—for instance, when Spotify wants to look at your playlist—they
contact your DWN and request it. Because the data is stored
on your DWN, you nominally control it and how it is used. In other
words, this is a &lt;em&gt;federated&lt;/em&gt; system.&lt;/p&gt;
&lt;p&gt;The diagram below shows the main idea:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;center&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Web5-overall.png&quot; alt=&quot;Web5 Overall Architecture&quot; /&gt;&lt;/p&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;p&gt;In a conventional Web application, each site has its own
storage, typically some kind of database (see &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/&quot;&gt;here&lt;/a&gt; for
an overview of this kind of Web app). The site stores all
of your data/state and you don&#39;t have any real access to
it. In Web5, each Web site will instead store its data on your DWN.
This gives you access to and control of the data but also in theory
means that it&#39;s portable and/or shareable. For instance, if
you want to change from using Spotify to using Apple Music,
you just give Apple access to the playlist data on your
DWN—and, I suppose, revoke Spotify&#39;s access. It&#39;s also
intended to allow multiple sites concurrent access to the data.
There certainly are use cases where this would be valuable, for instance, sharing your
travel reservations between Kayak and TripIt.&lt;/p&gt;
&lt;p&gt;Note that this kind of element isn&#39;t a new idea. For instance Tim Berners-Lee&#39;s
&lt;a href=&quot;https://solidproject.org/&quot;&gt;Solid&lt;/a&gt; project has a very similar concept
called &amp;quot;Pods&amp;quot;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Solid is a specification that lets people store their data securely
in decentralized data stores called Pods. Pods are like secure
personal web servers for data. When data is stored in someone&#39;s Pod,
they control which people and applications can access it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Of course the technical details of Web5 and Solid are completely
different (for instance, the APIs are different and Web5 is based
on DIDs whereas Solid uses OIDC for authentication&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;)
but at the big-picture level these ideas seem to be pretty similar.&lt;/p&gt;
&lt;p&gt;More generally, the basic idea of &lt;em&gt;Bring Your Own Storage (BYOS)&lt;/em&gt;
is quite old. Prior to the great Webification of everything—closely
followed by the mobile appification of everything—this is
how applications were generally built: you would have some network
protocol like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Message_Access_Protocol&amp;amp;oldid=1091018645&quot;&gt;IMAP&lt;/a&gt;
(for mail) or &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=CalDAV&amp;amp;oldid=1089512635&quot;&gt;CalDAV&lt;/a&gt; (for calendaring)
that everyone implemented, you would sign up for an account with
a service, and then separately download a client. You could switch
clients at any time because the whole system was interoperable.&lt;/p&gt;
&lt;p&gt;One thing that the Web5 documentation is pretty vague on is where the
DWNs come from. What I mean here is not the code (they have
some open-source implementation you can download) but
the server.
It&#39;s important to recognize that this system depends on trusting the
DWN. Although there is some cryptography, the primary security and
privacy protections are provided by the DWN doing access control
and so this isn&#39;t something you can just run on some totally decentralized
system.
I think it&#39;s a safe assumption that most people aren&#39;t going
to run their own physical DWN server—the inconvenience of that sort
of thing is what kicked off our current round of centralization—so
we need some other alternative. I guess the idea is that there will
be some DWN service that you can subscribe to like you do with Dropbox or
gSuite, but it would be nice if the plan here were clearer. There&#39;s also
some stuff in the spec about how DWNs should be based on IPFS, but I
don&#39;t really understand that at all. As far as I can tell, how the
DWN stores data should be largely invisible.&lt;/p&gt;
&lt;h3 id=&quot;data-model&quot;&gt;Data Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#data-model&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A DWN mostly presents a fairly generic data storage interface,
with two main concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://identity.foundation/decentralized-web-node/spec/#collections&quot;&gt;&lt;strong&gt;Collections&lt;/strong&gt;&lt;/a&gt; of objects attached to a given JSON &amp;quot;schema&amp;quot;
(i.e., a definition of the elements that need to appear
in a JSON object, such as a playlist).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://identity.foundation/decentralized-web-node/spec/#threads&quot;&gt;&lt;strong&gt;Threads&lt;/strong&gt;&lt;/a&gt;
of messages attached to each other. It&#39;s not entirely clear
to me how these are supposed to work, but the idea seems to be
to provide a generalized peer-to-peer messaging facility
(the slide deck says &amp;quot;send and receive messages over
a DID-encrypted universal network&amp;quot;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&#39;s also the concept of &lt;a href=&quot;https://identity.foundation/decentralized-web-node/spec/#permissions&quot;&gt;Permissions&lt;/a&gt;:
an entity can request access to a given set of objects (such as a collection)
and the owner of the DWN can grant and revoke access.&lt;/p&gt;
&lt;p&gt;I don&#39;t want to spend too much time on the details here
other than to say that this whole part of the system seems
fairly thin and would probably benefit from engaging more
with prior work.
For example, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=WebDAV&amp;amp;oldid=1087784528&quot;&gt;WebDAV&lt;/a&gt;
provides a fairly sophisticated data management and access
control model that is quite a bit more advanced than that
presented here, including hierarchical collections,
locking, metadata, and access control lists. This isn&#39;t to
single out WebDAV as ideal but merely to observe that
there&#39;s a lot of prior art in terms of what kind of capabilities
distributed data stores need and my sense is that what&#39;s
presented here is largely insufficient. As a specific example,
real data stores need some way to deal with conflict resolution
and concurrent editing, especially if you have multiple uncoordinated applications writing to the same data, and
&lt;a href=&quot;https://identity.foundation/decentralized-web-node/spec/#last-write-wins&quot;&gt;Last-Write Wins&lt;/a&gt;,
which is the only specified mechanism, is really not enough.&lt;/p&gt;
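&lt;p&gt;To see why, consider the following small Python sketch. It does not model the actual DWN message format; the timestamps, app names, and the &lt;code&gt;last_write_wins&lt;/code&gt; helper are all made up for illustration. Two apps read the same playlist, each adds a song, and each writes back a full copy.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch of last-write-wins on a single playlist record.

def last_write_wins(versions):
    # versions: list of (timestamp, writer, value) tuples.
    # Keep only the one with the newest timestamp; discard the rest.
    return max(versions, key=lambda v: v[0])


base = ['song-a', 'song-b']

# Two uncoordinated apps each start from the same base and write
# back a complete copy with their own addition.
from_app_one = (100, 'app-one', base + ['song-c'])
from_app_two = (101, 'app-two', base + ['song-d'])

winner = last_write_wins([from_app_one, from_app_two])

# Anything app-one added that is not in the winning copy is silently gone.
lost = [s for s in from_app_one[2] if s not in winner[2]]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;lost&lt;/code&gt; ends up containing the song that the first app added: neither app did anything wrong, but one update disappears. Avoiding that takes some kind of merge or conflict detection, which is what more mature data stores provide.&lt;/p&gt;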
&lt;p&gt;Similarly, the whole threads concept seems pretty underspecified. If
the idea is to provide some kind of generic secure messaging structure,
there&#39;s a lot more to do here than just encrypt to people&#39;s DIDs—which
I &lt;em&gt;think&lt;/em&gt; is how it is supposed to work. Modern secure messaging
systems like &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-mls-protocol-14.html&quot;&gt;IETF Messaging Layer Security (MLS)&lt;/a&gt;
incorporate a whole bunch of security and interoperability features (e.g.,
&lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-mls-protocol-14.html#name-ratchet-tree-concepts&quot;&gt;ratcheting&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;My point isn&#39;t that these are fatal flaws—all of these are
details which could in principle be fixed—but rather that building a system
like this correctly is very complicated and that there&#39;s a big difference
between what we&#39;ve seen so far and a real system. Moreover, the fact that this
initial specification is so incomplete should not inspire confidence that
it can be turned into something as generic as it seems to aspire to be.&lt;/p&gt;
&lt;h2 id=&quot;distributed-web-apps-(dwas)&quot;&gt;Distributed Web Apps (DWAs) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#distributed-web-apps-(dwas)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The other big idea here is that apps will be written as what the document
calls &lt;em&gt;Distributed Web Apps (DWAs)&lt;/em&gt;. This part is pretty handwavy, but the
basic idea seems to be that they are an extension of what&#39;s called a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps&quot;&gt;Progressive Web App (PWA)&lt;/a&gt;. PWAs are a sort of confusing
topic, but at a high level, a PWA is a Web app that has been designed to
act more like a native app. This means things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An icon on the home screen&lt;/li&gt;
&lt;li&gt;Working offline&lt;/li&gt;
&lt;li&gt;Storing data on the client (this is required to work offline)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While PWAs run in the user&#39;s browser, they still ultimately depend on
the main Web site for their data and potentially for some of their
logic. It seems that a DWA will instead directly access the DWN
to get the user&#39;s data, but under the authority of the site.
So, for instance, if you granted &lt;code&gt;example.com&lt;/code&gt; access to your
music playlists, it could either contact the DWN directly or empower
the DWA to do it directly from your browser. The technical details
here are a bit fuzzy, but this also seems pretty clearly doable via
some combination of tokens, delegation, etc.
so I don&#39;t think we should worry too much about that.&lt;/p&gt;
&lt;p&gt;DWAs seem like kind of a separable idea from DWNs. Looking at PWAs,
we see that some sites build native apps and not PWAs, some build
both, and some build neither (my impression is that it&#39;s quite uncommon
to build just a PWA); it&#39;s really a design choice by the site.
Similarly, if you managed to make the shift from site-based storage
to DWNs, I would expect sites to do some combination of native apps,
DWAs, and regular Web sites based on what worked best for them
(there&#39;s no reason why DWNs can&#39;t be used with native apps,
even though that&#39;s not how it&#39;s presented). I don&#39;t think DWAs make or break
the vision of Web5.&lt;/p&gt;
&lt;h2 id=&quot;dids&quot;&gt;DIDs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#dids&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, I should mention that all the identities in Web5 are phrased
as &lt;a href=&quot;https://www.w3.org/TR/did-core/&quot;&gt;Decentralized Identifiers (DIDs)&lt;/a&gt;
(see &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#background%3A-did&quot;&gt;here&lt;/a&gt; for some
background on DIDs). At some level, this is just a detail: you
need some way to talk about principals, there are a lot of potential
options here, and DIDs are entirely generic.&lt;/p&gt;
&lt;p&gt;In order to participate in Web5, the DID document has to contain a
&lt;code&gt;DecentralizedWebNode&lt;/code&gt; service endpoint that contains one or
more HTTPS URLs, like so:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:example:123&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;service&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;#dwn&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;DecentralizedWebNode&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;serviceEndpoint&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;nodes&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;https://dwn.example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://example.org/dwn&quot;&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;[Source: DWN specification]&lt;/p&gt;
&lt;p&gt;Note that because the security of this system depends on the security
of the DWNs, and the DWNs are accessed over HTTPS,
this means that the security of this system depends on
the DNS. This means that the security value you are getting out of
generic DIDs is somewhat limited. The cost of supporting generic
DIDs is the interoperability risk of having a DID method that
isn&#39;t supported by one of the services you want to use. As
a practical matter, if Web5 takes off, I&#39;d expect those services
to mostly converge on a small number of methods.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;how-to-present-new-technical-proposals&quot;&gt;How to present new technical proposals &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#how-to-present-new-technical-proposals&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As an aside, the way Web5 is presented requires a fairly large amount
of filling in the blanks. Basically we have a Web site, a slide deck
with an overview of the system as a whole, and then some detailed
protocol specifications and code on GitHub. This is all fine,
I guess, but what&#39;s really needed is a document describing the
system architecture, how the technical components fit in, and how
it meets the use cases.
Over the years I have reviewed a lot of early-stage specifications
and the details of those specifications rarely matter, as they
usually get extensively revised during development and standardization.
What&#39;s necessary at this stage is to give readers enough of an
understanding of your overall vision that they can see how it&#39;s
going to work, figure out if it&#39;s worthwhile,
know how you&#39;ve solved the hard problems, and
know what problems remain to be solved. Too much detail actually
gets in the way of that, and a slide deck like this is way too high
level. What&#39;s required is a document describing the system architecture.
My take on how to write these is found in
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc4101.html&quot;&gt;RFC 4101&lt;/a&gt;,
but there are obviously lots of ways to do that. But a slide deck isn&#39;t it.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;building-a-full-system&quot;&gt;Building a Full System &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#building-a-full-system&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I said a number of times above that this is pretty thin on details.
That&#39;s not uncommon with early-stage proposals, but can make it
very hard to assess the viability of the ideas because you don&#39;t
know what&#39;s hiding behind the vagueness. Things
can be vague for at least three major reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s obvious how to fill them in but someone needs to do so.
For instance, you are pushing around JSON and so you&#39;ll need
some formal definition of the contents. Nobody thinks that&#39;s
impractical, but it&#39;s just work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are a number of viable ways to do something and it&#39;s
a lot of engineering to work it out,
often because there are conflicting requirements which
have to be balanced, so you&#39;ve put it off.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You actually don&#39;t know how to do it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Reason (1) isn&#39;t a problem at this stage, though it will
eventually be one if you actually want people to build interoperable
systems. Reason (2) is generally a sign that it&#39;s going to
take quite some time to get to production. Reason (3) potentially
represents an existential threat to the project, especially
if you actually have to solve the problem in order for
it to succeed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
It can often be hard to distinguish cases (2) and (3), and
it&#39;s also very often the case that people think they have
case (2)—or even case (1)—but they actually
have case (3).&lt;/p&gt;
&lt;p&gt;It&#39;s clear that this document has a bunch of case (1), which, as I said, I&#39;m not
too worried about. More worrisome, however, is that it has
a lot of (2) and some stuff that&#39;s either actually in category
(3) or at least requires so much work that it&#39;s practically in
(3), even though we sort of could figure out how to do it.&lt;/p&gt;
&lt;h3 id=&quot;interoperability&quot;&gt;Interoperability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#interoperability&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;My first concern here is interoperability. One of the primary
use cases seems to be that two similar sites will share the same
data on your DWN. The slide deck gives two examples: (1) two music
services sharing your music playlist and (2) sharing your travel
reservations between sites. In order for this to work properly,
the sites that are sharing the same data need to agree on the data
format and semantics.&lt;/p&gt;
&lt;h4 id=&quot;data-model-2&quot;&gt;Data Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#data-model-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the examples provided in the slide
deck, the data format is identified by a link to a JSON schema
on &lt;a href=&quot;https://schema.org/&quot;&gt;schema.org&lt;/a&gt;, which is a registry
of schemas (definitions of data structures).
For instance, in the music playlist example, playlists would
be rendered as &lt;a href=&quot;https://schema.org/MusicPlaylist&quot;&gt;MusicPlaylist&lt;/a&gt;.
Here&#39;s a slightly trimmed version of the example from
&lt;code&gt;schema.org&lt;/code&gt; (I also fixed their misspelling of &amp;quot;Lynyrd Skynyrd&amp;quot;).&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;@context&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://schema.org&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;@type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MusicPlaylist&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Classic Rock Playlist&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;numTracks&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;track&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;@type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MusicRecording&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token 
property&quot;&gt;&quot;byArtist&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Lynyrd Skynyrd&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;duration&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;PT4M45S&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;inAlbum&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Second Helping&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Sweet Home Alabama&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;url&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;sweet-home-alabama&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;@type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MusicRecording&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token 
property&quot;&gt;&quot;byArtist&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Bob Seger&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;duration&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;PT3M12S&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;inAlbum&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Stranger In Town&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Old Time Rock and Roll&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token property&quot;&gt;&quot;url&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;old-time-rock-and-roll&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is actually not what I expected to see, because the
definition of the &lt;code&gt;byArtist&lt;/code&gt; in &lt;code&gt;track&lt;/code&gt; is actually of
type &lt;a href=&quot;https://schema.org/Person&quot;&gt;&lt;code&gt;Person&lt;/code&gt;&lt;/a&gt;, but &amp;quot;Lynyrd Skynyrd&amp;quot; is
clearly a text field. This appears to be a known problem
in &lt;code&gt;schema.org&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We expect &lt;a href=&quot;http://schema.org/&quot;&gt;schema.org&lt;/a&gt; properties to be used with new types, both from
&lt;a href=&quot;http://schema.org/&quot;&gt;schema.org&lt;/a&gt; and from external extensions. We also expect that often,
where we expect a property value of type Person, Place, Organization
or some other subClassOf Thing, we will get a text string, even if
our schemas don&#39;t formally document that expectation. In the spirit
of &amp;quot;some data is better than none&amp;quot;, search engines will often accept
this markup and do the best we can. Similarly, some types such as
Role and URL can be used with all properties, and we encourage this
kind of experimentation amongst data consumers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This sort of makes sense in a system which seems to be mostly
devoted to publishing metadata that can be consumed if
available and ignored if not, but it&#39;s not sufficient for bidirectional
interoperability.
Obviously, if Spotify expects to use personal names and TIDAL
expects to use &lt;code&gt;Person&lt;/code&gt; we&#39;re going to have problems. It
gets worse, though. There are at least three separate ways to
render the artist who performed &amp;quot;Old Time Rock and Roll&amp;quot;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bob Seger&lt;/li&gt;
&lt;li&gt;Seger, Bob&lt;/li&gt;
&lt;li&gt;Bob Seger &amp;amp; The Silver Bullet Band (this is what Amazon Music
uses, incidentally).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You could also have &amp;quot;and&amp;quot; instead of &amp;quot;&amp;amp;&amp;quot; in both the name of the
band and the name of the song. This isn&#39;t a problem with
playlists produced and consumed by the same entity because
they can be consistent about their choices—or more
likely have the identifiers refer to actual assets
(e.g., Spotify has resource identifiers that look
like this &lt;code&gt;6rqhFgbbKwnb9MLmUQDhG6&lt;/code&gt;) and just
have human-readable metadata—but it&#39;s critical for
interoperability, where mismatches will result in mysterious
failures.&lt;/p&gt;
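&lt;p&gt;As a rough sketch of how this goes wrong, here are two renderings of the
same track that both plausibly fit &lt;code&gt;MusicRecording&lt;/code&gt;: one with a bare
string for &lt;code&gt;byArtist&lt;/code&gt; and one with a structured &lt;code&gt;MusicGroup&lt;/code&gt; value.
The values are made up for illustration and are not taken from any real service;
a consumer that only expects one of these forms will have trouble matching the other.&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;[&lt;br /&gt;  {&lt;br /&gt;    &quot;@type&quot;: &quot;MusicRecording&quot;,&lt;br /&gt;    &quot;name&quot;: &quot;Old Time Rock and Roll&quot;,&lt;br /&gt;    &quot;byArtist&quot;: &quot;Bob Seger&quot;&lt;br /&gt;  },&lt;br /&gt;  {&lt;br /&gt;    &quot;@type&quot;: &quot;MusicRecording&quot;,&lt;br /&gt;    &quot;name&quot;: &quot;Old Time Rock &amp;amp; Roll&quot;,&lt;br /&gt;    &quot;byArtist&quot;: {&lt;br /&gt;      &quot;@type&quot;: &quot;MusicGroup&quot;,&lt;br /&gt;      &quot;name&quot;: &quot;Bob Seger &amp;amp; The Silver Bullet Band&quot;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;]&lt;/code&gt;&lt;/pre&gt;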
&lt;p&gt;The situation with &lt;code&gt;Reservation&lt;/code&gt; is equally bad. To take
one example, it contains &lt;code&gt;departureAirport&lt;/code&gt; (nested
under &lt;code&gt;reservationFor&lt;/code&gt;), which is of type &lt;a href=&quot;https://schema.org/Airport&quot;&gt;&lt;code&gt;Airport&lt;/code&gt;&lt;/a&gt;.
Airports can be listed either by IATA code or ICAO code,
so what happens if site A uses the IATA code (&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=YYZ_(song)&amp;amp;oldid=1088997097&quot;&gt;YYZ&lt;/a&gt;) and the other site uses the ICAO code
(CYYZ)? I guess you need to be prepared to accept both. At a higher
level, how do you link up multiple reservations attached to the same
trip? The schema doesn&#39;t tell you, so you have to invent something
(use &lt;a href=&quot;https://schema.org/Trip&quot;&gt;Trip&lt;/a&gt;? Create an identifier that
you attach to each reservation?)
and you can expect different providers to invent different things.
Similarly, if Expedia and United create separate trips, how do you
join them?&lt;/p&gt;
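&lt;p&gt;To illustrate the airport case, here is a minimal sketch of two reservations
for the same departure airport. The &lt;code&gt;Airport&lt;/code&gt; type on &lt;code&gt;schema.org&lt;/code&gt;
has both &lt;code&gt;iataCode&lt;/code&gt; and &lt;code&gt;icaoCode&lt;/code&gt; properties; the flight number
and the overall shape here are assumptions made up for this example, not output
from any real site.&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;[&lt;br /&gt;  {&lt;br /&gt;    &quot;@type&quot;: &quot;FlightReservation&quot;,&lt;br /&gt;    &quot;reservationFor&quot;: {&lt;br /&gt;      &quot;@type&quot;: &quot;Flight&quot;,&lt;br /&gt;      &quot;flightNumber&quot;: &quot;123&quot;,&lt;br /&gt;      &quot;departureAirport&quot;: { &quot;@type&quot;: &quot;Airport&quot;, &quot;iataCode&quot;: &quot;YYZ&quot; }&lt;br /&gt;    }&lt;br /&gt;  },&lt;br /&gt;  {&lt;br /&gt;    &quot;@type&quot;: &quot;FlightReservation&quot;,&lt;br /&gt;    &quot;reservationFor&quot;: {&lt;br /&gt;      &quot;@type&quot;: &quot;Flight&quot;,&lt;br /&gt;      &quot;flightNumber&quot;: &quot;123&quot;,&lt;br /&gt;      &quot;departureAirport&quot;: { &quot;@type&quot;: &quot;Airport&quot;, &quot;icaoCode&quot;: &quot;CYYZ&quot; }&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both validate against the same schema, but nothing in the schema says they
describe the same flight.&lt;/p&gt;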
&lt;p&gt;The point isn&#39;t that this kind of schema is bad but that it&#39;s
insufficient in that it mostly defines syntax and not semantics
and there are many structures that are compatible with these
schemas (to some extent deliberately, because it allows for flexibility!).
If you want to have interoperability, you need to rigorously
define the semantics of everything.
As a good example of how this plays out in practice, look at the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4791&quot;&gt;CalDAV&lt;/a&gt;
specification, which contains 99 pages of specification about
how precisely calendaring systems should interoperate,
all assuming that you already have a WebDAV-based data store.
This is the kind of thing you need to do if you actually
want multiple sites to interoperate with the same data
values, and you&#39;ll need to do it one at a time for each
application, not just point at &lt;a href=&quot;https://schema.org/&quot;&gt;schema.org&lt;/a&gt;
and hope. It&#39;s not impossible, it&#39;s just a lot of work, and
it has to be done for every single application domain
where you want to interoperate.&lt;/p&gt;
&lt;p&gt;It&#39;s worth noting that these are actually the easy cases because
they mostly involve multiple sites computing on your data. The
problem of how to have a consistent data model for something
complicated like Twitter or Facebook where people&#39;s viewing
experience is assembled out of other people&#39;s data and you want
to have a consistent experience when viewing a mixture of content
sourced by services A, B, and C—even when you are on
service D—is likely to be a lot harder.&lt;/p&gt;
&lt;h3 id=&quot;application-architecture&quot;&gt;Application Architecture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#application-architecture&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Consider the case of photo sharing, which seems like an
obvious example of owning your own data. So you have all your photos
on your DWN and now you want to give &lt;a href=&quot;https://www.flickr.com/&quot;&gt;Flickr&lt;/a&gt;
access to them so that you can share them with other people. What now?&lt;/p&gt;
&lt;p&gt;The first question we have to answer is where the data will be
served from when people go to look at your albums. One answer
is that it&#39;s served off of your DWN, but this actually puts
enormously high requirements on the DWN in that it has to be
able to serve very high volumes of traffic. Serving that amount
of traffic is one reason you use a photo sharing site like
Flickr in the first place, so that&#39;s no good. This means
that the data has to be served off of Flickr, not your node,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
but how does that work?&lt;/p&gt;
&lt;p&gt;The obvious thing for Flickr to do is to just suck all the
data off of your DWN and replicate it locally. So, instead
of having the architecture I showed above, we actually have
something more like the diagram below:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;center&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/Web5-app.png&quot; alt=&quot;A more realistic Web5 architecture&quot; /&gt;&lt;/p&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;p&gt;In this case, Flickr has a copy of your data which is what it
uses to serve to other people, and then—at least in
theory—it periodically syncs your data with the DWN.
This sync has to be bidirectional, so that Flickr can
discover when new pictures have been created, and in practice,
it will actually need some way to be notified when that has
happened. This probably means some kind of publish/subscribe framework
for these notifications. Again, not impossible, but it needs to
be specified.&lt;/p&gt;
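&lt;p&gt;Purely as a hypothetical sketch of what would need specifying, a notification
from the DWN to a subscribed site might carry something like the message below.
Every field name here is invented for illustration and none of it comes from the
DWN specification.&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;{&lt;br /&gt;  &quot;event&quot;: &quot;record-created&quot;,&lt;br /&gt;  &quot;owner&quot;: &quot;did:example:123&quot;,&lt;br /&gt;  &quot;schema&quot;: &quot;https://schema.org/Photograph&quot;,&lt;br /&gt;  &quot;recordId&quot;: &quot;hypothetical-record-id&quot;&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even for something this small, the sites and the DWN would have to agree on
the event types, how records are identified, and how subscriptions are
authorized.&lt;/p&gt;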
&lt;p&gt;Note that even in cases where the site doesn&#39;t need to serve high volumes
of traffic, it&#39;s extremely convenient to have a site-local copy
of the data. For instance, it lets you run algorithms (face
recognition, machine learning, etc.) over
the data quickly without having to constantly retrieve it
from the DWN.&lt;/p&gt;
&lt;p&gt;Another advantage of having a local copy is that it
allows you to make changes that happen
immediately without being dependent on the DWN for performance
(remember that users will blame the site when it&#39;s slow, not the
DWN). But then you have to worry about what happens when the
user makes a big pile of changes on one site that conflict
with changes on some other site and those changes have to somehow
be resolved; each site will have to implement all of this logic.
The situation is somewhat better if you just write everything
right to the DWN but you still have to deal with conflict
resolution for any change that&#39;s not instantaneous.&lt;/p&gt;
&lt;p&gt;This is of course a problem for any system that has multiple
readers and writers, and while we do see systems that have
shared data that multiple clients can concurrently write to
(e.g., the &lt;a href=&quot;https://developers.strava.com/&quot;&gt;Strava API&lt;/a&gt;),
application authors have to take real care not to step on each
other. One common pattern you see in practice is for site
A to import data from site B but not to write it back and
just to keep any changes locally. For obvious reasons,
this is a lot easier, especially if you already have to keep
a local copy anyway for other reasons.&lt;/p&gt;
&lt;h3 id=&quot;access-control&quot;&gt;Access Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#access-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Next, we need to ask how access control will work. As noted
above, the DWN is responsible for denying or granting access to your
data, but the unit of access control is the &lt;em&gt;site&lt;/em&gt;, not the
user. Consider the case of the photo site from the previous section:
once you have shared your photos with the site it is free to show them
to anyone it wants without any involvement from your DWN. Of course,
the site will likely have its own access control settings, but
you&#39;re trusting the site to enforce those, not the DWN.&lt;/p&gt;
&lt;p&gt;Moreover, those access control settings have to be stored
somewhere. If it&#39;s in the site&#39;s database, then you&#39;ve just
lost control of some of your data; if it&#39;s in the DWN then
we have to specify how access control is stored, which is likely
to be very complicated given that each site has its own
access control model (share with friends, share with specific people, etc.).
Of course, the site could just store some site-specific
blob on the DWN, but that&#39;s hardly better than storing it
locally.&lt;/p&gt;
&lt;p&gt;It&#39;s possible to imagine having the DWN make every access
control decision somehow, either in an &amp;quot;advisory&amp;quot; capacity
by serving as an oracle for the site, or in a &amp;quot;mandatory&amp;quot;
capacity by requiring cryptographic controls for every
action. For instance, every photo could be encrypted
and if the site asks to share a photo
with DID &lt;strong&gt;XYZ&lt;/strong&gt;, you (or the DWN) then share the
encryption key with that DID. People have tried to build
this kind of system (e.g., &lt;a href=&quot;https://tahoe-lafs.org/trac/tahoe-lafs&quot;&gt;Tahoe-LAFS&lt;/a&gt;),
but the results are technically complex and likely not
easy to map onto everyone&#39;s existing access control
systems. To take just one problem: how do you map
the existing identifier space of Flickr (or Twitter) to
DIDs?&lt;/p&gt;
&lt;p&gt;This is just a specific instance of a general situation, which
is that even if the &lt;em&gt;data&lt;/em&gt; is stored on a device you
control, the behavior of an application is dictated by
the application logic, which is largely out of your control.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I certainly understand the motivation for this work. Having all
of your data locked up in various silos sucks—don&#39;t even get me started on &lt;a href=&quot;https://educatedguesswork.org/posts/streaming-apps/&quot;&gt;streaming apps&lt;/a&gt;—and it would be great
to have interoperability. With that said, I don&#39;t think this
is a very promising technical direction.
Long experience with
standardizing protocols for applications as diverse as
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Imap&amp;amp;oldid=844491887&quot;&gt;e-mail&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=CalDAV&amp;amp;oldid=1089512635&quot;&gt;calendaring&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Lightweight_Directory_Access_Protocol&amp;amp;oldid=1090782223&quot;&gt;directories&lt;/a&gt;,
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Session_Initiation_Protocol&amp;amp;oldid=1086645805&quot;&gt;telephony&lt;/a&gt; teaches us that if you want to have interoperability
you need to produce detailed specifications that encode the semantics
of the application domain, and that this, not the mechanics of data
storage and retrieval, is the hard part.
The Web5 specifications—at least at present—almost exclusively focus on
those generic mechanics, leaving the real problems unsolved.&lt;/p&gt;
&lt;p&gt;In my opinion a better way to attack this problem would be to attempt
to solve some specific set of application domains (start with Twitter-like
microblogging, perhaps?)
and see if you can build a protocol or protocol suite that would enable
interoperability there. This would also require getting actual buy-in
from the various sites that you expect to be consumers of this protocol,
which seems like it will be very challenging under the best of circumstances.
Once you&#39;ve done a few application domains, you can
try to figure out what the common ideas are and perhaps try to build
them into some generic infrastructure that makes future protocols easier.
This is obviously a lot more effort, but I think it&#39;s far more likely
to succeed than trying to build a generic system and hoping
people will somehow make it work.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though Solid apparently has a DID method as well. &lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Sometimes you may not know how to build some
feature but at the end of the day you could ship
without it. This is what happened with
&lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-tls-esni-14.html&quot;&gt;TLS Encrypted Client Hello&lt;/a&gt;,
but then we actually figured out how to do it later. &lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/&quot;&gt;here&lt;/a&gt;
for some problems with more decentralized options. &lt;a href=&quot;https://educatedguesswork.org/posts/web5-first-impressions/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>On Blockchains/Ledgers and Identity Systems</title>
		<link href="https://educatedguesswork.org/posts/blockchain-identity/"/>
		<updated>2022-06-06T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/blockchain-identity/</id>
		<content type="html">&lt;style&gt;
.img-wrap {
  display: inline-block;
}
.img-wrap img {
  width: 40%;
}&lt;/style&gt;
&lt;p&gt;OK, so I managed to get through my
&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity&quot;&gt;post&lt;/a&gt; on identity while only using the
word &amp;quot;blockchain&amp;quot; twice. However, the story of self-sovereign
identity/decentralized identity is inextricably intertwined with
blockchains: much of the interest in decentralized identity comes out
of the blockchain/Web3 quarter and a very large fraction of the
&lt;a href=&quot;https://www.w3.org/TR/did-spec-registries/&quot;&gt;proposals&lt;/a&gt; in this space involve
blockchain in one way or another. This post tries to explain
the role of the ledger in these systems, which, as we&#39;ll
see, is surprisingly limited.&lt;/p&gt;
&lt;h2 id=&quot;background%3A-did&quot;&gt;Background: DID &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#background%3A-did&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The main specification in this space
is something called (unsurprisingly)
&lt;a href=&quot;https://www.w3.org/TR/did-core/&quot;&gt;Decentralized Identifiers (DID)&lt;/a&gt;.
You don&#39;t actually need to know about DIDs to talk about decentralized
identity, but most of the mechanisms are now defined in terms
of DID, so it&#39;s most convenient to use DID terminology.
The DID specification doesn&#39;t actually define a single type of identifier but rather a
generic framework for identifiers. A DID is a kind of URI that
has the scheme &lt;code&gt;did:&lt;/code&gt;, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.w3.org/TR/did-core/diagrams/parts-of-a-did.svg&quot; alt=&quot;DID URI&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: DID Core specification]&lt;/p&gt;
&lt;p&gt;Each DID has a &lt;em&gt;method&lt;/em&gt; and then a &lt;em&gt;method-specific identifier&lt;/em&gt;.
The method describes how you use the method-specific identifier
to look up what&#39;s called a &lt;em&gt;DID document&lt;/em&gt; which contains the
actual identity information you are interested in in a &lt;a href=&quot;https://json-ld.org/&quot;&gt;JSON-LD&lt;/a&gt;
structure, for instance:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;@context&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string&quot;&gt;&quot;https://www.w3.org/ns/did/v1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string&quot;&gt;&quot;https://w3id.org/security/suites/ed25519-2020/v1&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:example:123456789abcdefghi&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;authentication&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:example:123456789abcdefghi#keys-1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Ed25519VerificationKey2020&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;controller&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:example:123456789abcdefghi&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;publicKeyMultibase&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;zH3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;[Source: DID Core specification]&lt;/p&gt;
&lt;p&gt;What&#39;s sort of unusual about DID is that the specification defines the
format of the DID document but not the methods. So, for instance,
in the example above, if you know the method &lt;code&gt;example&lt;/code&gt; then
you can obtain (technical term: &lt;em&gt;resolve&lt;/em&gt;) the DID document,
but if you don&#39;t know that method, then you can&#39;t do anything with
the DID.&lt;/p&gt;
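&lt;p&gt;As a rough illustration of that split, here is a minimal Python sketch
of a client that pulls the method out of a DID and dispatches on it.
The names &lt;code&gt;parse_did&lt;/code&gt;, &lt;code&gt;resolve&lt;/code&gt;, and &lt;code&gt;RESOLVERS&lt;/code&gt; are invented
for this example and the resolver for &lt;code&gt;example&lt;/code&gt; is a placeholder;
none of this comes from the DID specification or any real library.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def parse_did(did):
    """Split a DID into (method, method-specific identifier)."""
    scheme, method, method_specific_id = did.split(":", 2)
    if scheme != "did" or not method or not method_specific_id:
        raise ValueError("not a DID: " + did)
    return method, method_specific_id


# Hypothetical resolver table: one entry per method this client understands.
RESOLVERS = {
    "example": lambda msid: {"id": "did:example:" + msid},
}


def resolve(did):
    """Return the DID document, or fail if the method is unknown to us."""
    method, msid = parse_did(did)
    if method not in RESOLVERS:
        raise LookupError("unknown DID method: " + method)
    return RESOLVERS[method](msid)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parsing always succeeds for a well-formed DID, but resolution only works
for the methods that happen to be in the table.&lt;/p&gt;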
&lt;h3 id=&quot;did%3Akey&quot;&gt;&lt;code&gt;did:key&lt;/code&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#did%3Akey&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Pretty much the simplest kind of identifier here is just a bare public
key, which is approximately what &lt;a href=&quot;https://w3c-ccg.github.io/did-method-key/&quot;&gt;&lt;code&gt;did:key&lt;/code&gt;&lt;/a&gt;
provides. &lt;code&gt;did:key&lt;/code&gt; DIDs look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;did:key:z6LSeu9HkTHSfLLeUs2nnzUSNedgDUevfNQgQjQC23ZCit6F
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;keys-vs.-hashes&quot;&gt;Keys vs. Hashes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#keys-vs.-hashes&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Note that for authentication purposes, it&#39;s not necessary
to have the key; you could just carry a digest of the key
and then have the signer supply the key along with their
signature. However, DIDs also allow you to carry keys for
encryption, where this trick doesn&#39;t work. In any case,
for modern elliptic curve algorithms, the public key
is essentially the same size as the hash would be, so
the digest saves little, though for post-quantum
algorithms the keys &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/&quot;&gt;may be bigger&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A &lt;code&gt;did:key&lt;/code&gt; identifier is basically just a type specifier that indicates
what algorithm the key is associated with (&lt;code&gt;z6LS&lt;/code&gt; means X25519) followed by
the public key. Because the concept of DIDs is that you &lt;em&gt;resolve&lt;/em&gt; the
DID into a DID document, there is a (somewhat hazily defined)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
way to &lt;em&gt;expand&lt;/em&gt; the key into a DID document. However, for our purposes we can
just think of this as a public key.&lt;/p&gt;
&lt;p&gt;As described &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#key-recovery&quot;&gt;previously&lt;/a&gt;,
this kind of explicit key system isn&#39;t very flexible. Because it binds
your identity to your public key, it doesn&#39;t easily allow you to
(for instance) update your keys.&lt;/p&gt;
&lt;h3 id=&quot;did%3Aweb&quot;&gt;&lt;code&gt;did:web&lt;/code&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#did%3Aweb&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At the other end of the spectrum we have &lt;a href=&quot;https://w3c-ccg.github.io/did-method-web/&quot;&gt;&lt;code&gt;did:web&lt;/code&gt;&lt;/a&gt;
in which the DID is effectively a URI that points to the DID document,
as in:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;did:web:example.com:user:alice&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For some reason the slashes in the path are converted to colons, so this
refers to:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://example.com/user/alice/did.json&lt;/code&gt;&lt;/p&gt;
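&lt;p&gt;The mapping is mechanical enough to sketch in a few lines of Python.
This is my reading of the &lt;code&gt;did:web&lt;/code&gt; rules rather than a reference
implementation, and the function name &lt;code&gt;did_web_to_url&lt;/code&gt; is made up
for the example. It assumes that an identifier with no path resolves under
&lt;code&gt;/.well-known/&lt;/code&gt; and that a port is carried percent-encoded in the host part.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from urllib.parse import unquote


def did_web_to_url(did):
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier: " + did)
    # Colons separate the host from the path segments; a port in the host
    # is percent-encoded (as %3A) so it is not confused with a separator.
    host, *path = [unquote(part) for part in did[len(prefix):].split(":")]
    if not path:
        # With no path, the document is assumed to live under /.well-known/.
        return "https://" + host + "/.well-known/did.json"
    return "https://" + host + "/" + "/".join(path) + "/did.json"


url = did_web_to_url("did:web:example.com:user:alice")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the identifier above, &lt;code&gt;url&lt;/code&gt; comes out as
&lt;code&gt;https://example.com/user/alice/did.json&lt;/code&gt;.&lt;/p&gt;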
&lt;p&gt;In order to resolve the DID, the RP connects to the server and retrieves the
indicated document. This document can of course be arbitrarily rich
and contain not only keying material but also other assertions about
the user, including third party assertions such as &amp;quot;the State of California
asserts that this user&#39;s personal name is Alan Smithee.&amp;quot; This is all
left kind of vague in the specs, but I think the idea is that if you
were to obtain such an assertion, you would add it to your DID document
and upload it to the server. Incidentally, this has horrifying privacy
properties, but maybe you could find some way to encrypt them or
have some other kind of access control.&lt;/p&gt;
&lt;p&gt;It&#39;s important to realize at this point that the
server is actually doing two things: &lt;em&gt;authenticating&lt;/em&gt; the identity
document and &lt;em&gt;publishing&lt;/em&gt; it. That&#39;s sort of natural in the Web
context, but there&#39;s nothing inherent about it, and it&#39;s quite
possible to have systems in which authentication and publication
are totally separate; you just need some way to distribute the
data. Because Web servers exist to serve data, it feels natural
to combine them into one service, but, as we&#39;ll see below,
it&#39;s not as good an idea when the identity is rooted in
a different system.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;did:web&lt;/code&gt; and similar Web-based methods have effectively the opposite properties from
&lt;code&gt;did:key&lt;/code&gt; and other key-based identifiers. If you want to replace your key all you
need to do is update the DID document. Regrettably, the did:web specification
&lt;a href=&quot;https://example.com/did-method-web/#update&quot;&gt;punts this issue&lt;/a&gt;
but presumably one could invent something, whether it&#39;s a Web page
that one could update manually, a Web API, or a standardized protocol
like &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc4918&quot;&gt;WebDAV&lt;/a&gt;. On the other
hand, like an e-mail address, the identifier is completely controlled by the operator of
the Web site it&#39;s being served off of. For instance, if you have the
identifier &lt;code&gt;did:web:example.com:users:fuzzy-dunlop&lt;/code&gt;, and the operator
of &lt;code&gt;example.com&lt;/code&gt; decides to change your public key, they can just do
so, whenever they want.&lt;/p&gt;
&lt;p&gt;This isn&#39;t really an issue for identifiers that are supposed to represent
the domain operator, as in a &lt;a href=&quot;http://localhost:8080/posts/vaccine-passport-nz/&quot;&gt;vaccine passport&lt;/a&gt;
system, but if you are a user of a system operated by someone else
it means you don&#39;t control your own identity.
Of course, in principle you could register your own domain and host
your identity there, but few people do; as a practical matter if
&lt;code&gt;did:web&lt;/code&gt; were ever to be popular with ordinary users we&#39;d expect
most of the identifiers to be &lt;code&gt;did:web:gmail.com:&amp;lt;username&amp;gt;&lt;/code&gt; or the like,
with Google (or Yahoo or whoever) controlling the identities for
those users.&lt;/p&gt;
&lt;p&gt;Even for users who do register their own domain, at the end of the day
&lt;code&gt;did:web&lt;/code&gt; identities are just bootstrapping off the existing
DNS namespace and the WebPKI, which asserts identities within it.
Because the namespace is hierarchical, this means that your
identity can be taken away by someone who can control the relevant part
of the DNS, for instance if a government
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#government-takeover&quot;&gt;seizes your domain&lt;/a&gt;.
If what you
want is &lt;a href=&quot;http://localhost:8080/posts/understanding-identity/#other-cryptographic-identity-systems&quot;&gt;&amp;quot;self-sovereign
identities&amp;quot;&lt;/a&gt;
in which &amp;quot;A person’s digital existence is now independent of any
organization: no-one can take their identity away.&amp;quot;  then this isn&#39;t
it.&lt;/p&gt;
&lt;h4 id=&quot;online-vs.-offline-authentication&quot;&gt;Online vs. Offline Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#online-vs.-offline-authentication&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s worth noting that even if we ignore these inherent structural
issues in a system like this, it&#39;s somewhat odd to have an
identity assertion require access to an &lt;em&gt;online&lt;/em&gt; resource,
in this case the Web server. By contrast, WebPKI certificates
can be verified &lt;em&gt;offline&lt;/em&gt;, which means you don&#39;t need to
contact the certificate authority in order to validate them.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
The reason for this is that the certificates are digitally
signed by the CA and that signature can be verified by anyone.
Operationally, this means that if you send someone a message
signed with the key corresponding to your DID, that person
can&#39;t verify it without contacting the Web server; if that
server—or the relying party—is offline, the relying party
will have to wait.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;did:web&lt;/code&gt; assertions aren&#39;t just online: because of the specific way
in which TLS works, they are also &lt;em&gt;deniable&lt;/em&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
What I mean by that is that if I connect to a Web server—any
Web server—and retrieve some data, I have no way of proving
the contents of that data (depending on the
protocol details I may be able to prove
that I made the connection). The reason for this is that
the data is protected with a symmetric key which is jointly
known both to me and the server and so either side can produce
the same protocol messages (technical term: protocol &lt;em&gt;trace&lt;/em&gt;).&lt;/p&gt;
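&lt;p&gt;A toy Python example may make the symmetry clearer. It uses HMAC as a
stand-in for the record protection in TLS (real TLS uses AEAD ciphers and
derives its keys from the handshake), and the key, records, and function
names are all invented for illustration.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import hmac

# Stand-in for the symmetric key that both ends of the connection share.
shared_key = b"key known to both client and server"
record = b'{"id": "did:web:example.com:user:alice"}'


def protect(key, data):
    """Compute the integrity tag either endpoint would attach to data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verifies(key, data, tag):
    return hmac.compare_digest(protect(key, data), tag)


# What the server actually sent.
server_tag = protect(shared_key, record)

# The client holds the same key, so it can compute an equally valid tag
# for a record the server never sent.
forged = b'{"id": "did:web:example.com:user:mallory"}'
client_tag = protect(shared_key, forged)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both &lt;code&gt;(record, server_tag)&lt;/code&gt; and &lt;code&gt;(forged, client_tag)&lt;/code&gt; verify under
the shared key, so a third party shown either pair can&#39;t tell which side produced it.&lt;/p&gt;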
&lt;p&gt;There&#39;s no way to prove that the Web server made a specific
identity assertion, so, for instance, if you send me a document signed
with a &lt;code&gt;did:web&lt;/code&gt; identity and then change your key pair, you
can just deny that the key that signed the document was yours
and I can&#39;t prove otherwise. This is an attractive property
in some situations, but limits the use of this kind of identity.&lt;/p&gt;
&lt;p&gt;A related property is that it makes it harder to
detect malfeasance by the Web server. Suppose that the Web
server occasionally lies about the public key of a given
identity (e.g., so that the attacker can impersonate Alice).
How would you detect this? In a certificate system like the
WebPKI you can use a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1076165555&quot;&gt;transparency log&lt;/a&gt;
which—at least in theory—allows you to detect
this, but the problem is much harder when the data is retrieved
from the Web. In principle you could have transparency
just for the keys, but without a way to prove that the server
actually sent you a given key, anyone can frame the server
for sending a bogus key, which means that malfeasance
is deniable. The transparency log is still of some value,
but it needs to be checked in real time and it&#39;s still
unclear what you do if a key isn&#39;t in the log.&lt;/p&gt;
&lt;p&gt;Even if we discount malice by the server operator, we also have
to deal with server compromise: if the Web server is compromised
the attacker can serve any DID responses it wants. By contrast,
if the DIDs were signed then the signature key could be kept
offline and not be subject to online attack. For obvious reasons
the Web server&#39;s authentication key cannot be kept offline,
as it must be used with more or less every transaction.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;ledger-based-did-methods&quot;&gt;Ledger-Based DID Methods &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#ledger-based-did-methods&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With the above as background, it&#39;s useful to look at the
systems we now see being proposed, and see what&#39;s going on.
As I mentioned above, the DID specification is really a framework
for &lt;a href=&quot;https://www.w3.org/TR/did-spec-registries/&quot;&gt;different methods&lt;/a&gt;, each of which has their own way of
resolving the DID document from the identifier. Quite a few
of these are tied to some blockchain or another and while
the details differ, the general concepts seem to be fairly
similar. The following description is sort of a mashup
of &lt;a href=&quot;https://w3c-ccg.github.io/did-method-v1/&quot;&gt;&lt;code&gt;did:v1&lt;/code&gt;&lt;/a&gt; and
&lt;a href=&quot;https://hyperledger.github.io/indy-did-method/&quot;&gt;&lt;code&gt;did:indy&lt;/code&gt;&lt;/a&gt; that
hopefully captures the general flavor.&lt;/p&gt;
&lt;p&gt;The main idea is that the ledger, however it&#39;s implemented, provides
some set of functions, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creating an identity document&lt;/li&gt;
&lt;li&gt;Updating a given identity document&lt;/li&gt;
&lt;li&gt;Reading an identity document&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Effectively, what this does is take &lt;code&gt;did:web&lt;/code&gt;, cross out &lt;code&gt;web&lt;/code&gt;, and
write &lt;code&gt;&amp;lt;insert-ledger-here&amp;gt;&lt;/code&gt; in its place; it&#39;s kind of a rough fit.&lt;/p&gt;
&lt;p&gt;As with &lt;code&gt;did:key&lt;/code&gt;, each identity document is associated with a given
cryptographic key and so the identifier is derived from the key,
for instance by hashing it. The ledger is supposed to enforce this
requirement.&lt;/p&gt;
&lt;p&gt;Once a document has been created, it is possible to update it using
the update function. Updates are authorized by the current key,
which, again, is enforced by the ledger. Importantly, you can change
the current key but this doesn&#39;t change the identifier, so
the public key can become totally decoupled from the
identifier, with no remaining relationship between the two.&lt;/p&gt;
&lt;p&gt;You can resolve an identifier by doing a read operation, which
returns the identity document.&lt;/p&gt;
&lt;p&gt;In any system like this, it&#39;s important to understand what the
ledger is doing for us, because there is a tendency to think
of ledgers/blockchains as magic. At a high level, then, the
ledger is providing three services:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Storing the user&#39;s identity document(s)&lt;/li&gt;
&lt;li&gt;Authenticating the user&#39;s identity document(s) to the RP&lt;/li&gt;
&lt;li&gt;Providing a consensus timeline for changes to the identity
document(s)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Arguably, the first of these is largely unnecessary, the second is bad,
but the third is essential.&lt;/p&gt;
&lt;p&gt;Let&#39;s start with storing the user&#39;s identity document(s).
In the majority of authentication contexts, whether online
authentication like login or messaging applications, the
entity being authenticated is sending some set of data
to the RP and that data is then signed with the appropriate
key. The straightforward thing to do, then, is to provide
the identity document(s) at the same time; this is how
both channel security systems like TLS and messaging systems
like OpenPGP or S/MIME work. Aside from being self-contained,
this also has privacy advantages because it doesn&#39;t require
the RP to query some service for the authenticating identity,
which would leak who was talking to whom.&lt;/p&gt;
&lt;p&gt;There are, of course, some applications in which you want
to send an asynchronous encrypted message to someone you haven&#39;t
talked to before, in which case it&#39;s useful to have some way
to look up their key. However, those systems typically
already have some kind of key lookup service that&#39;s a lot
more efficient than a blockchain, and there&#39;s no good reason
to have that data on a permanent public ledger; instead
you&#39;d just publish the relevant encryption keys on the
key lookup system and sign them with the authentication
key, in which case you can bundle the identity documents
along with the signed object. Even if there wasn&#39;t an existing
key lookup service, it would be better to store this data in
some kind of high performance non-ledger system
like &lt;a href=&quot;https://ipfs.io/&quot;&gt;IPFS&lt;/a&gt;, because you don&#39;t need the
ledger to attest to it (recall that the data is self-validating).
Moreover, this has better privacy properties than the ledger,
which is inherently public.&lt;/p&gt;
&lt;p&gt;To understand why I say that authenticating the identity document
in the ledger is bad, you have to think about the problem of updating your
keys. This is special because unlike other kinds of metadata
you can&#39;t just sign it yourself.&lt;/p&gt;
&lt;h2 id=&quot;how-to-update-your-keys&quot;&gt;How to update your keys &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#how-to-update-your-keys&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you never allow anyone to update their keys, then life is
very simple because the key is self-authenticating and you
can sign any updates to the identity documents with that key.
However, there are good reasons to want to update keys. For instance,
you might have started with a 2048-bit RSA key &lt;em&gt;A&lt;/em&gt; and now want to move to a
256-bit Elliptic Curve key &lt;em&gt;B&lt;/em&gt;. The secure way to update a key in
a system like this is to have the
old key sign the new one. This creates a chain of identities,
like so:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;A → B&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When you want to authenticate with identity &lt;em&gt;A&lt;/em&gt; you then present
something like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The original identity which points to key &lt;em&gt;A&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;The signature using &lt;em&gt;A&lt;/em&gt; over key &lt;em&gt;B&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The semantics of this is that the RP accepts key &lt;em&gt;B&lt;/em&gt; as representing
identity &lt;em&gt;A&lt;/em&gt; even though it&#39;s a totally different key. You then
use &lt;em&gt;B&lt;/em&gt; for whatever you would use &lt;em&gt;A&lt;/em&gt; for, for instance to
sign a document or authenticate your login.&lt;/p&gt;
&lt;p&gt;This can obviously be extended to have &lt;em&gt;B&lt;/em&gt; sign &lt;em&gt;C&lt;/em&gt;, in which case
you have:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;A → B → C&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When the RP receives something like this, it needs to verify
the chain of assertions going back to &lt;em&gt;A&lt;/em&gt;.  Note that the original key need not be online: you just
use it to make the delegation to the next key and then you
may never need to use it again—indeed in a true
replacement you may want to destroy &lt;em&gt;A&lt;/em&gt; so it can&#39;t be
stolen. I say &lt;em&gt;may&lt;/em&gt; because you might
have some kind of limited delegation in which you aren&#39;t
actually replacing &lt;em&gt;A&lt;/em&gt; with &lt;em&gt;B&lt;/em&gt; but instead authorizing
&lt;em&gt;B&lt;/em&gt; to be used in some contexts, or for a limited time, as in TLS
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-tls-subcerts-14&quot;&gt;delegated credentials&lt;/a&gt;.
For this reason&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
you probably won&#39;t be signing the bare key but some data structure
(e.g., a JSON document) that describes the semantics of the delegation.
This is effectively the same structure as a WebPKI certificate
chain, except for a single identity.&lt;/p&gt;
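&lt;p&gt;Here is a minimal Python sketch of what verifying such a chain might look like.
To stay within the standard library it uses Lamport one-time signatures
built from SHA-256 as a stand-in for a real algorithm like RSA or ECDSA, and for
brevity it signs the bare next key rather than a structured delegation.
All of the function names are invented for this example.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import secrets


def H(data):
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = b"".join(H(a) + H(b) for a, b in private)
    return private, public


def message_bits(message):
    return [int(bit) for bit in format(int.from_bytes(H(message), "big"), "0256b")]


def sign(private, message):
    # Reveal one secret per bit of the message hash.
    return [pair[bit] for pair, bit in zip(private, message_bits(message))]


def verify(public, message, signature):
    if len(signature) != 256:
        return False
    for i, (bit, revealed) in enumerate(zip(message_bits(message), signature)):
        offset = 64 * i + 32 * bit
        if H(revealed) != public[offset:offset + 32]:
            return False
    return True


def verify_chain(root_public, links):
    """Follow the delegations from the root key; each link is (new_public, signature)."""
    current = root_public
    for new_public, signature in links:
        if not verify(current, new_public, signature):
            raise ValueError("broken delegation chain")
        current = new_public
    return current


a_private, a_public = keygen()
b_private, b_public = keygen()
c_private, c_public = keygen()
links = [(b_public, sign(a_private, b_public)), (c_public, sign(b_private, c_public))]
current_public = verify_chain(a_public, links)  # ends at the public key for C
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The RP only needs the original public key and the links; none of the private keys
have to be online at verification time.&lt;/p&gt;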
&lt;h3 id=&quot;compromise-of-the-original-key&quot;&gt;Compromise of the Original Key &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#compromise-of-the-original-key&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This all seems fine, but what happens if key &lt;em&gt;A&lt;/em&gt; is compromised
(I&#39;ll get to the compromise of key &lt;em&gt;B&lt;/em&gt; in a moment)? The attacker
can then mint a new key &lt;em&gt;X&lt;/em&gt; which he knows and sign it with key &lt;em&gt;A&lt;/em&gt;,
thus creating:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;A → X&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is just as good an assertion as the actual one over &lt;em&gt;B&lt;/em&gt;,
so the attacker has just taken over your identity. Of course,
this doesn&#39;t invalidate your delegation to &lt;em&gt;B&lt;/em&gt;; it&#39;s just
that you and the attacker now jointly control identity &lt;em&gt;A&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;One way of analyzing this situation is that the source of
the problem is that you didn&#39;t actually &lt;em&gt;replace&lt;/em&gt; &lt;em&gt;A&lt;/em&gt; with
&lt;em&gt;B&lt;/em&gt;, because &lt;em&gt;A&lt;/em&gt; is still valid. By this way of thinking,
what we need to do is memorialize that transaction, which
is where ledgers/blockchains come in.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The idea is that when you
delegate to another key, you record the transaction on
the ledger, which thus provides a (partial) ordering of
operations.
I&#39;ll leave the details of how a blockchain-type
distributed ledger works for another day, but briefly a ledger
is a cryptographic data structure that&#39;s constructed in such a way
that everyone agrees on what events occurred and the
order in which they happened.
So, in this case, we would have a situation
like this:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;center&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/did-replacement.png&quot; alt=&quot;Ordering of operations for DID replacement&quot; /&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;p&gt;The RP would then consult the ledger—either directly
or perhaps would be provided with the relevant portions
as part of the authentication transaction—and
verify that each delegation that it was following was
the first one chronologically. Note that the key must
specify which ledger will be used to prevent confusion
about which timeline is authoritative.
Because the delegation
&lt;em&gt;A → B&lt;/em&gt; happens first, it is the
valid one and &lt;em&gt;A → X&lt;/em&gt; would be rejected.
Note that unlike many blockchain applications, you
actually need to verify that there are no &lt;em&gt;future&lt;/em&gt;
blocks that contain new delegations; otherwise you
might miss that a key had been deprecated.&lt;/p&gt;
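&lt;p&gt;The rule the RP applies can be sketched in Python as follows. The ledger is
modeled as a plain ordered list, keys are short strings, and signature checking is
left out entirely, so this shows only the ordering logic; the name
&lt;code&gt;current_key&lt;/code&gt; and the data layout are assumptions made for the example.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def current_key(original_key, ledger):
    """Walk the ledger in order and return the key now bound to the identity.

    ledger is a list of (signing_key, new_key) delegations, oldest first.
    Signature checking is omitted: assume each entry was signed by signing_key.
    """
    current = original_key
    for signing_key, new_key in ledger:
        # Only a delegation made by the currently valid key counts, so the
        # first delegation from a key wins and later ones are ignored.
        if signing_key == current:
            current = new_key
    return current


# A delegates to B, an attacker later tries A to X, then B delegates to C.
ledger = [("A", "B"), ("A", "X"), ("B", "C")]
latest = current_key("A", ledger)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;latest&lt;/code&gt; is &lt;code&gt;C&lt;/code&gt;: the late delegation to &lt;code&gt;X&lt;/code&gt; is skipped, and
because the loop runs to the end of the ledger it also picks up the later move from
&lt;code&gt;B&lt;/code&gt; to &lt;code&gt;C&lt;/code&gt;.&lt;/p&gt;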
&lt;p&gt;It&#39;s important that the actual delegation signatures
be recorded on the ledger and then checked by the
RP (this seems to be a point of &lt;a href=&quot;https://github.com/hyperledger/indy-did-method/issues/23&quot;&gt;some confusion&lt;/a&gt;
in existing designs). You do want the ledger to check
the signatures on the delegation to avoid spam,
but the critical service the ledger is providing is temporal ordering.
It&#39;s not possible for even a compromised ledger to make a fake delegation
unless the currently valid key has been compromised.
Of course, if RPs don&#39;t check the delegation signatures
then they&#39;re just trusting the ledger to behave correctly.
More on this shortly.&lt;/p&gt;
&lt;h3 id=&quot;compromise-of-current-keys&quot;&gt;Compromise of Current Keys &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#compromise-of-current-keys&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;However, if the &lt;em&gt;currently valid&lt;/em&gt; key is compromised,
the attacker can redelegate the key to themselves and there&#39;s
nothing you can do in this system, because that delegation will
be chronologically first and so your redelegation
will be perceived as an attack. Some systems, such as
&lt;a href=&quot;https://github.com/decentralized-identity/keri/blob/master/kids/kid0003.md&quot;&gt;KERI&lt;/a&gt;,
attempt to address this by pre-committing to key &lt;em&gt;K_{i+1}&lt;/em&gt;
when delegating to key &lt;em&gt;K_i&lt;/em&gt;. In the example above, when &lt;em&gt;A&lt;/em&gt;
was first registered it would come with a commitment to
&lt;em&gt;B&lt;/em&gt; in the form of a hash. Similarly, when &lt;em&gt;B&lt;/em&gt; delegates
to &lt;em&gt;C&lt;/em&gt;, it publishes a hash of &lt;em&gt;D&lt;/em&gt;, as below:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(A, H(B)) → (B, H(C)) → (C, H(D))&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This provides protection against cryptographic attack under
the assumption that the hash is irreversible: the attacker
can break the current key but can&#39;t attack the next one
because it only has the hash. However, it does not provide
security against compromise of whatever device holds
the next key, so it&#39;s only a partial solution, especially
if users—as many users will—store all of their
keys in one place.&lt;/p&gt;
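&lt;p&gt;The pre-commitment check itself is simple. The following Python sketch
shows only the hash-chaining idea, with byte strings standing in for keys; it is
not KERI&#39;s actual event format, and the function names are invented for
illustration.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib


def commit(public_key):
    """Commitment to a key that has not been revealed yet."""
    return hashlib.sha256(public_key).hexdigest()


def verify_rotations(records):
    """records is a list of (public_key, commitment_to_next_key), oldest first."""
    for (_, promised), (next_key, _) in zip(records, records[1:]):
        # Each new key must match the hash published one step earlier.
        if commit(next_key) != promised:
            return False
    return True


A, B, C, D = b"key-A", b"key-B", b"key-C", b"key-D"
chain = [(A, commit(B)), (B, commit(C)), (C, commit(D))]

# An attacker who has broken B but never saw C cannot rotate to a key of
# their own choosing, because it would not match the published hash.
hijacked = [(A, commit(B)), (B, commit(C)), (b"key-X", commit(b"key-Y"))]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these values &lt;code&gt;verify_rotations(chain)&lt;/code&gt; succeeds and
&lt;code&gt;verify_rotations(hijacked)&lt;/code&gt; fails.&lt;/p&gt;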
&lt;p&gt;Another approach is to have some sort of recovery key
which can be used to override other transactions. Presumably
that key is then kept in some super secure location.
This key can then be used to recover your identity if the currently
valid key is compromised. Note that this is semantically
the same as a partial delegation to the new key which
can then be revoked by the original key.&lt;/p&gt;
&lt;h2 id=&quot;signature-chain-verification&quot;&gt;Signature Chain Verification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#signature-chain-verification&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We now have enough background to understand why I said above that
we don&#39;t want the ledger to authenticate the identity documents
to the RP: the validity of those documents is being defined by
there being an unbroken chain of signatures from the original
key, which is itself cryptographically bound to the identifier.
If the RP doesn&#39;t check those signatures, then it&#39;s relying on correct
behavior by the ledger, and you have no way of knowing if the
nodes that added the latest entries actually went to the
trouble of checking the signatures.&lt;/p&gt;
&lt;p&gt;Worse yet, if it doesn&#39;t validate the correctness of the ledger (e.g.,
by authenticating the current state from multiple nodes), then it&#39;s
just trusting whatever ledger node it queried. This is even worse than
the situation with &lt;code&gt;did:web&lt;/code&gt; because at least with &lt;code&gt;did:web&lt;/code&gt; the
server you are querying is nominally responsible for the identity.
With a blockchain-based ledger you&#39;re just asking some random node
you&#39;ve never heard of.&lt;/p&gt;
&lt;p&gt;If the RP is going to validate the signature chain
anyway, then there&#39;s no &lt;em&gt;security&lt;/em&gt; reason for the ledger to
do so, though there may be a performance reason. It&#39;s probably useful if the ledger does some basic
checking—especially before creating documents—in
order to prevent DoS attacks on the ledger, but there may be other
potential mechanisms for doing that, such as charging for
ledger updates (this is what Bitcoin does).&lt;/p&gt;
&lt;h2 id=&quot;temporal-ordering&quot;&gt;Temporal Ordering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#temporal-ordering&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What we do need the ledger to do, however, is guarantee the temporal
ordering of events, because that&#39;s what prevents redelegation by
the attacker in case of key compromise. Ideally, the RP
would check the ledger for every transaction associated with a given
identity and verify that each delegation was correctly constructed
based on the chronology. However, as a practical matter this requires
having access to a very large portion of the transactions on the
ledger (naively, all of them!) which may run to tens of millions,
so this presents a scaling problem.&lt;/p&gt;
&lt;p&gt;In practice, it&#39;s common for clients to just trust that the ledger
enforced consistency for a given transaction. For payment applications
this means that the ledger accepted the payment transaction. In this
case, it would presumably mean that the ledger had checked the
chain of apparent delegations and gave you the latest valid
document. In this case, it &lt;em&gt;is&lt;/em&gt; necessary that the ledger verify
the signature chain because otherwise an attacker could inject
a bogus delegation, which is then sent to the client. As long as the client checks the signature
itself, this won&#39;t cause the client to get the wrong key, but
it will cause it to be unable to get the right key because it
will get the bogus delegation and then reject it. Effectively,
this is a DoS attack on the valid user.&lt;/p&gt;
&lt;p&gt;Even so, it&#39;s probably better for the ledger node you are
communicating with directly to do the checks on &lt;em&gt;read&lt;/em&gt; rather
than having the checks be done on &lt;em&gt;write&lt;/em&gt;. The reason for
this is extensibility: if ledger nodes check the signature
chain on write/update, then you can&#39;t roll out a new
signature algorithm until you are guaranteed that every
ledger node that is checking accepts it, which precludes
incremental deployment. By contrast, if checks are done on
read, then full clients which have the whole ledger will
be fine as long as they have the new algorithm, and even
&amp;quot;light&amp;quot; nodes which don&#39;t have the whole ledger will
be OK if they pick a ledger node which supports the new
algorithm.&lt;/p&gt;
&lt;p&gt;As described above, as long as the client verifies the signature
chain, if the ledger cheats, then it can cause you to accept the wrong
version of history but it can&#39;t cause you to accept the wrong key
unless the keys are compromised.&lt;/p&gt;
&lt;h2 id=&quot;lost-keys&quot;&gt;Lost Keys &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#lost-keys&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, none of this addresses the case where the user &lt;em&gt;loses&lt;/em&gt;
their keying material. The conventional response to this
problem in the decentralized identity world is
that users should make arrangements in advance, for instance
by keeping their recovery key in a really safe place or maybe
by sharing their recovery key with their friends via something like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Shamir%27s_Secret_Sharing&amp;amp;oldid=1090718625&quot;&gt;Shamir Secret Sharing&lt;/a&gt;.
However, as I mentioned &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity&quot;&gt;previously&lt;/a&gt;,
we know that in practice many users do not do a good job of managing
their keys, even when a lot is at stake (e.g., millions
of dollars in Bitcoin), so while surely some users will
in fact follow this kind of practice, many will
just store all their keys in one place and may lose them.&lt;/p&gt;
&lt;p&gt;As with &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/&quot;&gt;blockchain-based name systems&lt;/a&gt;, if you want to have a system which lets you recover
your identity even if you&#39;ve lost all your keying material—for
instance you dropped your phone in the toilet and you don&#39;t
have a backup—you need some mechanism for recovery
that ultimately depends on human discretion not technology.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;To go back to the question I asked at the beginning, what is the ledger
doing here?&lt;/p&gt;
&lt;p&gt;The primary value proposition of these designs is, as
in the passage I quoted above, that you&#39;re not dependent on others for
your identity:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This is called “self-sovereign” identity because each person is now
in control of their own identity—they are their own sovereign
nation. People can control their own information and
relationships. A person’s digital existence is now independent of
any organization: no-one can take their identity away.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The technical feature that provides this property is not the ledger.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
Rather, it&#39;s that your identity is bound to—indeed, defined by—a cryptographic
key pair. Similarly, if there are assertions bound to the key,
as people seem to expect, then what makes that work is that
those assertions include signatures over your identity (in this
case the key). None of this requires any kind of ledger; you
could just do it with &lt;code&gt;did:key&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The main necessary function of a ledger in this kind of system is that it allows
you to &lt;em&gt;verifiably&lt;/em&gt; transfer control of an identity from one key to
another in a way that is secure even if the initial key is later
compromised. In practice, the ledger also seems to be used
as a publication mechanism for identity information, but that&#39;s actually something
that is better done by other mechanisms to the extent to which
it&#39;s necessary at all. Publishing data in ledgers is super-expensive
and so should be a last resort, not a first one.&lt;/p&gt;
&lt;p&gt;Unfortunately, the ledger only provides a partial solution to recovering from key compromise
and loss: if you lose all of your keys and/or the attacker gains
control of them, then this is still unrecoverable without
some mechanism external to the system that allows you to
assign a new key to a given identity without any signature
chain from the original key, which, of course, violates the
value proposition stated above.&lt;/p&gt;
&lt;p&gt;But once you have such a mechanism, then why not just use it all the
time? What I mean here is to assign people human-readable identifiers
(e.g., e-mail address or phone number) rather than random high-entropy
ones and then have a mechanism to bind those identifiers to keys, a la
the WebPKI or DNSSEC. If someone wants to change keys, you issue a new
credential and invalidate the old one. This lets you avoid the
bad ergonomics of key-based identities and the scaling (and privacy)
issues of the ledger. My point here is not that
whoever is empowered to issue those credentials isn&#39;t a weak point in
the system; of course it is. But it&#39;s also a necessary one unless
you&#39;re willing to accept having people be occasionally—or maybe
not so occasionally—locked out of the system
entirely.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
By which I mean that there is an example that presumably you&#39;re
supposed to imitate, but no actual specification for how to do
it as far as I can tell. A number of the DID specs are like this. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know about revocation and OCSP, but I think this
story mostly holds up in the face of OCSP stapling,
CRLite, and CRLSets &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The classic term here is &amp;quot;non-repudiation&amp;quot; but that comes
with a lot of philosophical baggage. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ignore TLS session resumption for now. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And for others, such as cross-protocol attacks. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I owe this
observation to Manu Sporny. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, there are systems where the ledger is used to establish
the original identity in a FCFS fashion, like the various
proposed DNS replacements, but that&#39;s not what I&#39;m talking
about here. &lt;a href=&quot;https://educatedguesswork.org/posts/blockchain-identity/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/div&gt;</content>
	</entry>
	
	<entry>
		<title>Understanding Online Identity</title>
		<link href="https://educatedguesswork.org/posts/understanding-identity/"/>
		<updated>2022-06-02T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/understanding-identity/</id>
		<content type="html">&lt;p&gt;You often hear a lot about &amp;quot;identity&amp;quot; on the Internet, but in my
experience, the situation tends to be pretty muddled. This post
is my attempt to try to unpack a number of different concepts
surrounding identity as well as some of the relevant technologies.&lt;/p&gt;
&lt;p&gt;The most basic function that people think of when they think
of &lt;em&gt;identity&lt;/em&gt; is what might more properly be called &lt;em&gt;authentication&lt;/em&gt;,
which is to say proving that you are who you say you are.
In typical applications, this means proving that
you own/are associated with a specific &lt;em&gt;identifier&lt;/em&gt;, whether it
is an account name (e.g., &lt;code&gt;ekr&lt;/code&gt; on Github), an
e-mail address (e.g., &lt;code&gt;ekr@rtfm.com&lt;/code&gt;), or a personal
name (&amp;quot;Eric Rescorla&amp;quot;).&lt;/p&gt;
&lt;p&gt;This kind of identifier mapping is good enough for a wide
variety of applications, but in a number of cases people also
want to be able to prove other facts about themselves, such
as that they are over 21, have a license to drive, or have a given
address.&lt;/p&gt;
&lt;p&gt;As an example of these concepts, consider a driver&#39;s license:&lt;/p&gt;
&lt;img src=&quot;https://www.dmvcalifornia.us/wp-content/uploads/2017/09/dl3.jpg&quot; width=&quot;400&quot; /&gt;
&lt;p&gt;This driver&#39;s license contains two identifiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The driver&#39;s name: &amp;quot;Alexander J. Sample&amp;quot;&lt;/li&gt;
&lt;li&gt;The driver&#39;s license number: I1234562&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Authentication of the license holder is performed by matching the
biometrics on the license (mostly the picture, but also the various
listed characteristics such as sex, hair color, etc.) to the person in
front of you.&lt;/p&gt;
&lt;p&gt;The license also carries a number of other attributes that might
be interesting, such as the date of birth, whether you&#39;re
an organ donor, what driver&#39;s license class you hold, etc.
The way that this all fits together is that you
show the driver&#39;s license to the TSA agents, the cop who pulled you over,
or your bartender. They compare the biometrics to your appearance
and assuming they match, they know—or at least have reason
to believe—that the identifier and the attributes apply to
you.&lt;/p&gt;
&lt;h2 id=&quot;identity-on-the-internet&quot;&gt;Identity on the Internet &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#identity-on-the-internet&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The situation on the Internet is somewhat different: most sites
don&#39;t really need your legal name and biometric authentication
mechanisms don&#39;t translate well into mechanical verification
systems. Instead, most services use a different metaphor: the
&lt;em&gt;account&lt;/em&gt;.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;driver&#39;s-licenses-on-the-internet&quot;&gt;Driver&#39;s Licenses on the Internet &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#driver&#39;s-licenses-on-the-internet&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s actually worth a moment to think about why your driver&#39;s
license isn&#39;t a useful form of identity on the Internet.
The problem isn&#39;t that the information on the license isn&#39;t
relevant, but rather that there&#39;s no really good way to use
it for authentication: pretty much all of the information on the license
is visible to anyone who has seen your license, so
knowing it doesn&#39;t prove that you are the license holder.
In most contexts, there&#39;s no good way to check
the biometrics (it&#39;s not like you had to do a video call to
make a GMail account, though some systems do actually require this).
Finally, although licenses do have anti-forgery
mechanisms, they&#39;re mostly tied to the physical plastic and
so don&#39;t really work in online contexts. This all adds up to
it not being a very useful form of online authentication.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;accounts&quot;&gt;Accounts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#accounts&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The basic idea behind an account is fairly simple. For each service
you interact with, you have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An identifier (i.e., an account ID).&lt;/li&gt;
&lt;li&gt;Some authentication mechanism. Historically, this is a password
(see my &lt;a href=&quot;https://educatedguesswork.org/tags/passwords/&quot;&gt;series on passwords&lt;/a&gt;
for more on the deficiencies of passwords).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When you first interact with a given service, you &lt;em&gt;register&lt;/em&gt;, creating
an account. The service then assigns you an identifier (sometimes
you are allowed to choose one, unless it&#39;s already in use, etc.)
and collects your authentication information (password) and creates
the account. From then on, you can &lt;em&gt;log in&lt;/em&gt; to the account using
your authenticator.&lt;/p&gt;
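&lt;p&gt;The register/log in cycle can be sketched in a few lines of Python. This is a minimal sketch, not how any particular service does it: the in-memory &lt;code&gt;accounts&lt;/code&gt; table, the function names, and the choice of PBKDF2 with these parameters are assumptions made for illustration.&lt;/p&gt;

```python
import hashlib
import hmac
import os

accounts = {}  # account ID -> (salt, password hash); stand-in for a database


def _hash(password: str, salt: bytes) -> bytes:
    # Store a slow, salted hash rather than the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(account_id: str, password: str) -> bool:
    """Create the account unless the identifier is already in use."""
    if account_id in accounts:
        return False
    salt = os.urandom(16)
    accounts[account_id] = (salt, _hash(password, salt))
    return True


def log_in(account_id: str, password: str) -> bool:
    """Check the presented authenticator against the stored one."""
    if account_id not in accounts:
        return False
    salt, stored = accounts[account_id]
    return hmac.compare_digest(stored, _hash(password, salt))
```

&lt;p&gt;Note that nothing here ties the account to a person: the service only learns that whoever logs in knows the password that was chosen at registration.&lt;/p&gt;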
&lt;h4 id=&quot;example%3A-gmail&quot;&gt;Example: Gmail &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#example%3A-gmail&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For example, suppose you want to use Gmail. You go to
the site and pick a username and a password, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/gmail-acct-creation.png&quot; alt=&quot;Gmail account creation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The username becomes your email address (with
&lt;code&gt;@gmail.com&lt;/code&gt; appended to the end) and your password
becomes the authenticator.&lt;/p&gt;
&lt;p&gt;But what about those other fields you enter, like your
name? Even though you&#39;re providing your name to Google and it
gets attached to your identity in some sense (e.g., it&#39;s
in the &lt;code&gt;From&lt;/code&gt; line of your email), Google isn&#39;t
actually doing anything to verify that it&#39;s yours; if you
want to call yourself &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Alan_Smithee&amp;amp;oldid=1085655934&quot;&gt;Alan Smithee&lt;/a&gt;
or &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Herc&amp;amp;oldid=1066587073&quot;&gt;Fuzzy Dunlop&lt;/a&gt;, that&#39;s
your choice and Google will happily attach it to your
account.&lt;/p&gt;
&lt;p&gt;By contrast, Google is &lt;em&gt;authoritative&lt;/em&gt; for your
email address, so they know that&#39;s right: if they say it&#39;s &lt;code&gt;postmaster@gmail.com&lt;/code&gt; then
it is. If Google wants to take away your address and give
it to someone else, then they can just do so.&lt;/p&gt;
&lt;h4 id=&quot;example%3A-amazon&quot;&gt;Example: Amazon &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#example%3A-amazon&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As another example, consider Amazon. You go to their site
and click the right buttons and get the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/amazon-acct-creation.png&quot; alt=&quot;Amazon account creation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Superficially this is just like the Gmail account creation
dialog, with your email address acting as your account
identifier, but there&#39;s actually one very important difference:
Amazon doesn&#39;t just let you pick an account name; they
ask you to provide a preexisting identifier in the form
of either an email address or a mobile number, which they
then use as your account identifier (i.e., username).&lt;/p&gt;
&lt;p&gt;Amazon doesn&#39;t just trust that you have the e-mail address you
claim to have: they check it as part of the account creation
process. Moreover, in an important sense the email address
is used as an authenticator because if you lose your password,
Amazon can reset your account with your email address.
That&#39;s not something that works with Gmail (if you
lose your password you can&#39;t read your mail!), which is why
they encourage you to set a recovery account with a separate
address.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-cookie&quot;&gt;The Cookie &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#the-cookie&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Of course, on the Web once you&#39;ve logged in with whatever
mechanism (passwords, SMS, etc.) you need to authenticate
subsequent requests. This is done with a &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#shopping-carts&quot;&gt;cookie&lt;/a&gt;.
Cookies can be incredibly long-lived, so in some sense
the cookie is the authenticator.&lt;/p&gt;
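&lt;p&gt;A minimal sketch of that idea in Python, assuming a hypothetical in-memory &lt;code&gt;sessions&lt;/code&gt; table and made-up function names: after login the server hands out a random value, and possession of that value is what authenticates later requests until it expires.&lt;/p&gt;

```python
import secrets
import time

sessions = {}  # cookie value -> (account ID, expiry as seconds since the epoch)


def issue_cookie(account_id: str, lifetime_seconds: float) -> str:
    """Called once the user has logged in by whatever mechanism."""
    token = secrets.token_urlsafe(32)  # unguessable random value
    sessions[token] = (account_id, time.time() + lifetime_seconds)
    return token


def account_for(cookie: str):
    """Return the account ID for a valid cookie, or None."""
    record = sessions.get(cookie)
    if record is None:
        return None
    account_id, expires_at = record
    if time.time() >= expires_at:
        del sessions[cookie]  # expired: the user has to log in again
        return None
    return account_id
```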
&lt;/div&gt;
&lt;p&gt;What I&#39;m getting at here is that Amazon is bootstrapping
their identities off of another identity system, in this case
either the email address or the &lt;em&gt;public switched telephone network (PSTN)&lt;/em&gt;.
They rely on those systems to maintain people&#39;s identities,
assure they are unique, and ultimately for authentication.
A system like this really has two kinds of authenticators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The password&lt;/li&gt;
&lt;li&gt;The ability to receive a message at the indicated address.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s not uncommon to see systems where you have to demonstrate
both of these in order to log in; this is one form of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Multi-factor_authentication&amp;amp;oldid=1088021370&quot;&gt;multi-factor authentication (MFA)&lt;/a&gt;. I&#39;ve also seen systems which don&#39;t have passwords at all
and just require you to demonstrate the ability to receive at
a given address.&lt;/p&gt;
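&lt;p&gt;The second authenticator, the ability to receive a message at an address, is typically checked with a one-time code. The following Python sketch assumes a hypothetical &lt;code&gt;deliver&lt;/code&gt; callback standing in for the e-mail or SMS system, and the six-digit code format is likewise just an illustrative choice.&lt;/p&gt;

```python
import hmac
import secrets

pending = {}  # address -> code we sent and are waiting to see again


def send_code(address: str, deliver) -> None:
    """Generate a one-time code and hand it to the delivery system."""
    code = f"{secrets.randbelow(10**6):06d}"
    pending[address] = code
    deliver(address, code)


def confirm(address: str, code: str) -> bool:
    """True only if `code` matches the one sent to `address`."""
    # pop() makes the code single-use: any attempt, right or wrong, consumes it.
    expected = pending.pop(address, None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), code.encode())


# Usage: a dict plays the role of the recipient's inbox.
inbox = {}
send_code("ekr@example.com", lambda addr, code: inbox.update({addr: code}))
print(confirm("ekr@example.com", inbox["ekr@example.com"]))  # prints True
```

&lt;p&gt;A system that requires both this check and a password is doing MFA; a system that requires only this check has effectively delegated authentication to the e-mail or phone provider.&lt;/p&gt;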
&lt;h3 id=&quot;federated-authentication&quot;&gt;Federated Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#federated-authentication&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the example above, the Amazon account is bootstrapped
off of your email or phone number, but once that&#39;s happened,
you authenticate to Amazon directly using your password.
In other words, Amazon has outsourced your identity to
the e-mail/phone system but still controls authentication
for itself. It&#39;s possible to go further, however, and outsource
authentication as well. Consider, for example, the account
creation interface for the popular sports social network
Strava:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/strava-acct-creation.png&quot; alt=&quot;Strava Account Creation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The &amp;quot;Use my email&amp;quot; option is basically the same as with
Amazon, where they use your e-mail address as your identifier
but thereafter use a password, but &amp;quot;Sign up with Google&amp;quot; (or
Facebook or Apple) is different. In this case, you &lt;em&gt;authenticate&lt;/em&gt; with Google
(or Facebook or Apple) as well. The way this works is that if
you already have an account with one of these big services
they can act as an &lt;em&gt;identity provider (IdP)&lt;/em&gt; which authenticates
you to third parties. The technical details are fairly complicated
(see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=OAuth&amp;amp;oldid=1088647506&quot;&gt;OAuth&lt;/a&gt;
and/or &lt;a href=&quot;https://openid.net/foundation/&quot;&gt;OpenID&lt;/a&gt; &lt;em&gt;[Edited to add OpenID 2022-06-02]&lt;/em&gt;)
but at a high level, what happens is that the service either
(1) exposes an API called by the third party site
(the technical term here is &lt;em&gt;relying party (RP)&lt;/em&gt;)
or (2) provides the client with a token &lt;em&gt;[Edited to add tokens -- 2022-06-03]&lt;/em&gt; which it gives to
the RP. In either case, this allows the third party site to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Verify that the browser contacting it is associated with
a particular account on the IdP&lt;/li&gt;
&lt;li&gt;Learn some details about that account.&lt;/li&gt;
&lt;/ol&gt;
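&lt;p&gt;To make the token variant concrete, here is a heavily simplified Python sketch. It is not OAuth or OpenID: it assumes the IdP and the RP share a secret key and uses an HMAC where real deployments generally use public-key signatures. The function names are hypothetical, and the claim names (&lt;code&gt;sub&lt;/code&gt;, &lt;code&gt;aud&lt;/code&gt;, &lt;code&gt;exp&lt;/code&gt;) are borrowed from JWT purely for familiarity.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(shared_key: bytes, account: str, rp: str, lifetime: int = 300) -> str:
    """IdP side: assert that the browser belongs to `account`, for `rp` only."""
    claims = {"sub": account, "aud": rp, "exp": int(time.time()) + lifetime}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    tag = hmac.new(shared_key, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + tag


def verify_token(shared_key: bytes, token: str, rp: str):
    """RP side: return the claims if the token is authentic, else None."""
    body, _, tag = token.partition(".")
    expected = hmac.new(shared_key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return None  # not produced by the IdP, or modified in transit
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["aud"] != rp or time.time() >= claims["exp"]:
        return None  # issued for a different site, or expired
    return claims


key = b"key shared by IdP and RP (assumption)"
token = issue_token(key, "ekr@example.com", "rp.example")
print(verify_token(key, token, "rp.example")["sub"])  # prints ekr@example.com
```

&lt;p&gt;The audience check matters: without it, a token the user gave to one RP could be replayed by that RP to log in as the user somewhere else.&lt;/p&gt;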
&lt;p&gt;When you first register with the RP, it will typically bounce you
to the IdP so you can approve information sharing with the RP,
and from then on the IdP can share that information with the RP without
asking for your consent again. For instance, here&#39;s what Google shares with Strava:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/google-strava-sharing.png&quot; alt=&quot;What Google shares with Strava&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This mechanism, generally referred to as &amp;quot;federated authentication&amp;quot;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
has a number of important advantages
from the perspective of the RP. First, it avoids needing to
create your own credential management system: you don&#39;t need
to check password quality, store passwords (and worry about
the password hashes &lt;a href=&quot;https://haveibeenpwned.com/&quot;&gt;leaking&lt;/a&gt;),
or deal with users losing their passwords and needing to reset
them (this is surprisingly common!). In addition, it streamlines
the user account creation process, by eliminating the need to
create a password—or often an account name—as
well as the need to process the email verification from the RP, which
can be a place that user account creation can stall,
causing you to lose potential users.&lt;/p&gt;
&lt;p&gt;Finally,
the IdP may also offer APIs that give the RP additional
capabilities, such as learning more information about the
user&#39;s account (for instance, your name and your social
contacts) or even to interact with the IdP on the
user&#39;s behalf. For instance, it&#39;s common for developer services
sites like &lt;a href=&quot;https://circleci.com/&quot;&gt;CircleCI&lt;/a&gt; to use GitHub
authentication and then ask for fairly broad permissions such as
to read from and write to your git repositories. This allows
them to integrate tightly with your developer experience, but
of course without having your password.&lt;/p&gt;
&lt;p&gt;As with a direct 1:1 authentication system like a password, sites
will generally persist the user&#39;s information in a cookie. However,
if the user clears their history, moves to a new computer, or
the cookie just expires, instead of asking for the user&#39;s
password, the site will re-validate the user with the
IdP.&lt;/p&gt;
&lt;h3 id=&quot;enterprise-single-sign-on-(sso)&quot;&gt;Enterprise Single Sign-On (SSO) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#enterprise-single-sign-on-(sso)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The previous examples were largely for end-users, but suppose that
you operate a company and want to outsource employee services such as payroll or
expenses. These services are now frequently packaged as what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Software_as_a_service&amp;amp;oldid=1089522911&quot;&gt;Software
as a Service
(SaaS)&lt;/a&gt;
which is a fancy name for &amp;quot;we have a Web site that your employees
use&amp;quot;.&lt;/p&gt;
&lt;p&gt;Obviously, your users need to authenticate to these SaaS services,
and in principle you could have them create an account on each of
these services, have the service check their e-mail addresses, and
move forward. However, this has a number of obvious drawbacks,
including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Increased friction for each user, especially if you have
a lot of these services, which is not at all uncommon.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lack of unified access control policies. For instance, if
you want to require 2FA, you can&#39;t enforce this centrally
and instead have to reach out to every SaaS provider
you use.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lack of control. For instance, if a user quits,
how do you notify each SaaS provider to terminate their
account?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These drawbacks can be addressed by using essentially the same technologies
as described in the previous section. In this case, the &lt;em&gt;company&lt;/em&gt;
(or more likely some third party like &lt;a href=&quot;https://auth0.com/&quot;&gt;Auth0&lt;/a&gt; or
&lt;a href=&quot;https://okta.com/&quot;&gt;Okta&lt;/a&gt; &lt;em&gt;[Edited to add Okta -- 2022-06-03]&lt;/em&gt;)
acts as the IdP, with each of the SaaS providers acting as the RP.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
When an employee wants to use one of your SaaS providers (e.g., to do
their expenses), they first authenticate to your IdP and then
use the IdP to authenticate to the provider. The IdP login can be
long-lived, allowing the user to authenticate to multiple RPs
without logging in repeatedly (hence the &amp;quot;single sign-on&amp;quot; name).
This kind of system also allows
the company to track logins, manage access, and disable/suspend
accounts.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;real-world-identities&quot;&gt;Real-World Identities &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#real-world-identities&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You may have noticed that none of
the above does much about your real world identity. As a general
matter, sites just take your assertions about your identity at face
value, allowing you to use whatever name you want, as well as to claim
to be any age you want etc. Some social networks try to require you to
use your &amp;quot;real name&amp;quot; (see, for instance, Facebook&#39;s &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Facebook_real-name_policy_controversy&amp;amp;oldid=1090669160&quot;&gt;real name
policy&lt;/a&gt;),
but not too much hangs on this and they generally don&#39;t try super hard
unless you claim to be someone famous or your name looks fake
(though, as the link above indicates, &amp;quot;looks fake&amp;quot; is a subjective
standard and lots of people have names that someone—or
some algorithm—at Facebook might think were fake.)&lt;/p&gt;
&lt;p&gt;In some cases, sites will make an attempt to actually verify your name,
but the mechanisms are often kind of weak. For instance, in order to
get a Twitter &amp;quot;blue Verified badge&amp;quot; you can
&lt;a href=&quot;https://help.twitter.com/en/managing-your-account/about-twitter-verified-accounts&quot;&gt;send Twitter a photo of your driver&#39;s license&lt;/a&gt;.
This isn&#39;t nothing, but it&#39;s also not at all difficult to photoshop
yourself a fake driver&#39;s license, given that it doesn&#39;t have to pass
much scrutiny and the anti-counterfeiting mechanisms such as holograms
and the like don&#39;t work through the Internet.&lt;/p&gt;
&lt;p&gt;There are a few situations in which a service will attempt to create
a stronger binding between your legal identity and your account,
typically where money is involved. For instance, you might need
to provide your social security number, account number,
mother&#39;s maiden name, your ATM PIN, or demonstrate that you know the amounts of
some recent transactions. Often, these mechanisms work by leveraging
some preexisting relationship (account) you have with the service and then
linking your online account to that preexisting account, so it&#39;s
not like they are trying to authenticate someone they have never
heard of.&lt;/p&gt;
&lt;h2 id=&quot;what&#39;s-wrong-with-this-picture%3F&quot;&gt;What&#39;s wrong with this picture? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#what&#39;s-wrong-with-this-picture%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, the ergonomics of having to make an account on every
new system are fairly bad: it requires the user to have a large number
of passwords, which means more opportunities to use a bad password or to
lose your password and have to recover. There are some opportunities
for improvement around the margin (e.g.,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=WebAuthn&amp;amp;oldid=1078276432&quot;&gt;WebAuthn&lt;/a&gt;
instead of passwords for authentication, better form fill-in so users
don&#39;t have to type their name over and over, etc.), but at the end of
the day, there&#39;s only so much you can do.&lt;/p&gt;
&lt;p&gt;On the other hand, the existing federated authentication mechanisms
have a number of pretty serious drawbacks.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;centralized-control&quot;&gt;Centralized Control &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#centralized-control&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first big problem with the existing federated identity systems is
that they inherently tie you to a small number of centralized identity
systems. First, for RP &lt;strong&gt;A&lt;/strong&gt; to accept an identity from IdP &lt;strong&gt;B&lt;/strong&gt;, &lt;strong&gt;A&lt;/strong&gt;
needs to actually make some kind of arrangement with &lt;strong&gt;B&lt;/strong&gt;. This is
typically pretty lightweight, but probably involves establishing some
kind of pairwise API key. Second, because &lt;strong&gt;A&lt;/strong&gt; has no way of knowing
which IdPs a user has accounts with, it has to offer the user a
separate button for each one, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://penguindreams.org/images/multi-login.png&quot; alt=&quot;NASCAR Problem&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;fixing-the-nascar-problem&quot;&gt;Fixing the NASCAR Problem &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fixing-the-nascar-problem&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The reason that the NASCAR problem is hard to fix is that these
federated identity systems use existing Web technologies
and there&#39;s no way with those technologies to know which
IdPs the client has an account with, so it just has to show
all the logos. If there were such a way then we would have
a privacy problem, because then you could use the set of
IdPs the client had an account with to track them, or, worse
yet, use the same mechanism to encode the user&#39;s identity
by creating a pattern of account/no-account states with various
sites you controlled.&lt;/p&gt;
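&lt;p&gt;To see why the last attack works, consider this Python sketch. It assumes a hypothetical capability that does not exist on the Web today, namely that a site could ask which IdPs the client has an account with; the site names and function names are made up. With &lt;em&gt;n&lt;/em&gt; cooperating sites, the pattern of account/no-account states is an &lt;em&gt;n&lt;/em&gt;-bit number.&lt;/p&gt;

```python
def sites_to_register(user_id: int, sites: list[str]) -> set[str]:
    """Tracker side: create an account on site i exactly when bit i of the ID is 1."""
    return {site for i, site in enumerate(sites) if (user_id >> i) % 2 == 1}


def recover_user_id(has_account: set[str], sites: list[str]) -> int:
    """Observer side: rebuild the ID from the set of sites reporting an account."""
    return sum(2**i for i, site in enumerate(sites) if site in has_account)


sites = [f"idp{i}.example" for i in range(8)]  # 8 sites can label 256 users
pattern = sites_to_register(200, sites)
print(recover_user_id(pattern, sites))  # prints 200
```

&lt;p&gt;Thirty-three such sites would be enough to give every person on Earth a distinct pattern.&lt;/p&gt;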
&lt;/div&gt;
&lt;p&gt;This is sometimes called the &lt;a href=&quot;https://indieweb.org/NASCAR_problem&quot;&gt;NASCAR problem&lt;/a&gt;
because it resembles the various advertiser logos you see on NASCAR cars.
This of course contributes to a lousy user experience but also discourages
the site from adding additional IdPs, because each one adds to user confusion.&lt;/p&gt;
&lt;p&gt;When put together, existing federated authentication systems
provide a strong incentive to only accept identities from
the biggest IdPs, which promotes centralization and makes it
hard for new providers to enter the market.&lt;/p&gt;
&lt;h3 id=&quot;privacy&quot;&gt;Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#privacy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In general, the privacy properties of existing federated authentication
systems are quite bad. Every time you log into site &lt;strong&gt;A&lt;/strong&gt; with
IdP &lt;strong&gt;B&lt;/strong&gt;, &lt;strong&gt;B&lt;/strong&gt; learns about it. This allows your IdP to track
you around the Internet whenever you use it to log in. This is made worse by the high
level of centralization in two ways. First, because it is hard
to start a new IdP it is hard for users to find one that has better
privacy, whether in terms of better policies or better technology.
Second, because there are a small number of IdPs, this creates concentration
of this tracking information. In addition, many of the existing
IdPs already do a lot of Web tracking via other mechanisms.&lt;/p&gt;
&lt;p&gt;Another privacy problem is that IdPs typically provide the same identifier
(e.g., your e-mail address) to each RP. Sites can use these identifiers
to track users (see this &lt;a href=&quot;https://freedom-to-tinker.com/2017/09/28/i-never-signed-up-for-this-privacy-implications-of-email-tracking/&quot;&gt;post&lt;/a&gt; by Steve Englehardt on this topic). This is actually technically
soluble by having the IdP give a new identifier to each site,
but this is not general practice, in part because sites &lt;em&gt;want&lt;/em&gt; the
user&#39;s true identifier so that they can contact the user. This problem also exists with conventional
e-mail/password systems but can be addressed with e-mail masking systems
like &lt;a href=&quot;https://relay.firefox.com/&quot;&gt;Firefox Relay&lt;/a&gt;
or Apple&#39;s &lt;a href=&quot;https://developer.apple.com/documentation/sign_in_with_apple/sign_in_with_apple_js/communicating_using_the_private_email_relay_service&quot;&gt;Private Email Relay&lt;/a&gt;.&lt;/p&gt;
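&lt;p&gt;As a minimal sketch of the per-site identifier idea (not a description of how any particular IdP does it), an IdP could derive each identifier by keying a MAC with a secret that only it knows. The names &lt;code&gt;IDP_SECRET&lt;/code&gt; and &lt;code&gt;pairwise_id&lt;/code&gt; are hypothetical and exist only for illustration; OpenID Connect calls this kind of value a &quot;pairwise&quot; subject identifier.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Hypothetical IdP-side secret (an assumption for this sketch); a real IdP
# would generate it randomly and store it securely.
IDP_SECRET = b"example-secret-known-only-to-the-idp"


def pairwise_id(user_id, rp_origin):
    """Return an identifier that is stable for one RP but differs across RPs."""
    # JSON-encode the pair so that (user, RP) boundaries are unambiguous.
    message = json.dumps([user_id, rp_origin]).encode("utf-8")
    return hmac.new(IDP_SECRET, message, hashlib.sha256).hexdigest()


id_at_a = pairwise_id("alice@example.com", "https://a.example")
id_at_b = pairwise_id("alice@example.com", "https://b.example")
```

&lt;p&gt;Each RP sees the same value every time the user logs in, so accounts keep working, but two RPs comparing notes cannot link their identifiers without the IdP&#39;s secret. This does nothing for the site&#39;s desire to contact the user, which is where the e-mail masking services come in.&lt;/p&gt;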
&lt;h2 id=&quot;improving-federated-identity&quot;&gt;Improving Federated Identity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#improving-federated-identity&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There has been a fair amount of work over the years on building
federated identity systems with better properties.&lt;/p&gt;
&lt;h3 id=&quot;end-user-certificates&quot;&gt;End-User Certificates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#end-user-certificates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the early days of the Web—well before things like Google Login existed—a lot of people thought that users would
authenticate with certificates: every user would be issued a
certificate with their identity, much like Web sites have certificates
that attest to theirs. Presumably these certificates would have the
user&#39;s e-mail address and maybe their name.  They would then be able
to use TLS certificate-based client authentication to authenticate to
every server. This has much the same identity properties as federated
identity, but has better privacy properties because the CA doesn&#39;t
need to be involved in the authentication transaction and so doesn&#39;t
learn what sites you are going to.&lt;/p&gt;
&lt;p&gt;Client certificates also potentially have better
centralization properties.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
In particular, client certificates have the potential to fix the
NASCAR problem because the client knows which certificates you
have, so the site doesn&#39;t need to display the logos of every CA
you might have a certificate with.&lt;/p&gt;
&lt;p&gt;Needless to say, this never happened. TLS client authentication is
in use in some settings, typically in enterprises which issue their
own certificates, but it never really became a plausible competitor
to passwords, and then federated authentication came along. There
are quite a number of reasons for the failure of client certificates,
but any list would probably include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The lack of certificate authorities which would issue convenient
free client certificates (this was true for server certificates
too until &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The TLS interaction is pretty bad in a number of ways,
such as playing badly with TLS intermediaries such as CDNs
and, prior to TLS 1.3, leaking the client&#39;s certificate if
you did authentication at the beginning of the connection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A truly hideous UI. I&#39;ve shown the Edge UI below but all
of the browser client auth UIs are pretty bad.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://textplain.files.wordpress.com/2020/05/image-7.png?w=1024&quot; alt=&quot;Client certificate UI&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: Eric Lawrence]&lt;/p&gt;
&lt;p&gt;In addition, because you use the same certificate for every site,
it can be used to track you across sites, which is obviously a
privacy problem, though, as noted above, this is not a property
unique to client certificates.&lt;/p&gt;
&lt;h3 id=&quot;persona-and-fedcm&quot;&gt;Persona and FedCM &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#persona-and-fedcm&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Although client certificates never really took off, they have
a number of good properties and are a natural starting point
for trying to improve the situation.&lt;/p&gt;
&lt;p&gt;Mozilla took a fairly serious run at this some years back
with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Mozilla_Persona&amp;amp;id=1052180321&amp;amp;wpFormIdentifier=&quot;&gt;Persona&lt;/a&gt;.
Effectively Persona
worked by making every site its own certificate authority; they
could then issue certificates to browsers which used them for authentication,
so for instance &lt;code&gt;example.com&lt;/code&gt; could issue certificates for addresses
ending in &lt;code&gt;@example.com&lt;/code&gt;. The browser would then use those
certificates to sign into sites. This was intended to have the
benefits of certificate-based authentication but be easier to
deploy and more compatible with Web technologies.
One very important property was that
because the browser could use
the certificate to authenticate to any site, it didn&#39;t
allow the IdP to track the user.&lt;/p&gt;
&lt;p&gt;The obvious way to implement Persona was with browser support:
when the user creates an account with an IdP, the browser
would keep track of it. When the user wants to log into
a site, it calls a browser API, which causes the browser
to present a list of acceptable IdPs which the user can
choose from, thus avoiding the NASCAR problem and giving
the user more direct control over how their information is
being used. In practice, the initial Persona deployments
depended on a trusted web site to help mediate this
interaction, thus avoiding the need to modify browsers.&lt;/p&gt;
&lt;p&gt;Persona ultimately failed to gain much market traction and
Mozilla stopped working on it, but it inspired other
designs, such as
Chrome&#39;s &lt;a href=&quot;https://fedidcg.github.io/FedCM/&quot;&gt;Federated Credential Management API (FedCM)&lt;/a&gt;.
FedCM is a more modest increment on the current federated
authentication model intended largely to make federated identity
continue to work in environments where third party cookies have been
removed, but also to have some additional privacy benefits.
Unlike Persona, it doesn&#39;t really address centralization,
though it&#39;s possible that it could be extended to do so.&lt;/p&gt;
&lt;p&gt;FedCM is relatively new and so hasn&#39;t seen any real deployment. It&#39;s an open question whether
it will get any deployment or whether any of the big IdPs such as
Google or Facebook will support it (see &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#deployment&quot;&gt;deployment&lt;/a&gt; below).&lt;/p&gt;
&lt;h2 id=&quot;other-cryptographic-identity-systems&quot;&gt;Other Cryptographic Identity Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#other-cryptographic-identity-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Recently there has been increasing interest in the use of cryptographic
identity systems, often called &amp;quot;self-sovereign&amp;quot; or &amp;quot;decentralized&amp;quot; identity. Here&#39;s
how Sovrin &lt;a href=&quot;https://sovrin.org/faq/what-is-self-sovereign-identity/&quot;&gt;describes this&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Everyone (including businesses and IoT) has different relationships
or unique sets of identifying information. This information could be
things like birth date, citizenship, university degrees, or business
licenses. In the physical world, these are represented as cards and
certificates that are held by the identity holder in their wallet or
safe place like a safety deposit box, and are presented when the
person needs to prove their identity or something about their
identity.&lt;/p&gt;
&lt;p&gt;Self-sovereign identity (SSI) brings the same freedoms and personal
autonomy to the internet in a safe and trustworthy system of
identity management. SSI means the individual (or organization)
manages the elements that make up their identity and controls access
to those credentials– digitally. With SSI, the power to control
personal data resides with the individual, and not an administrative
third party granting or tracking access to these credentials.&lt;/p&gt;
&lt;p&gt;The SSI identity system gives you the ability to use your digital
wallet and authenticate your own identity using the credentials you
have been issued. You no longer have to give up control of personal
information to dozens of databases each time you want to access new
goods and services, with the risk of your identity being stolen by
hackers.&lt;/p&gt;
&lt;p&gt;This is called “self-sovereign” identity because each person is now
in control of their own identity—they are their own sovereign
nation. People can control their own information and
relationships. A person’s digital existence is now independent of
any organization: no-one can take their identity away.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Controlling your own identity sounds good, but it&#39;s remarkably difficult to get a clear
picture of precisely what people have in mind here. For example, in an early
&lt;a href=&quot;http://www.lifewithalacrity.com/2016/04/the-path-to-self-soverereign-identity.html&quot;&gt;post&lt;/a&gt;
on the topic, Christopher Allen writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With all that said, what
is self-sovereign identity exactly? The truth is that there’s no
consensus. As much as anything, this article is intended to begin
a dialogue on that topic. However, I wish to offer a starting
position.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Rather than try to offer a definition, the rest of this section instead
focuses on what&#39;s technically possible in this space.&lt;/p&gt;
&lt;p&gt;In general, the starting point for these systems is to root identity in
a cryptographic key. I.e., I create a public/private key pair and my
public key then becomes my identity. This has the convenient property
that it&#39;s &lt;em&gt;self-authenticating&lt;/em&gt;: I don&#39;t need to use a password or any
other authenticator because I can prove my identity just by signing
a challenge with my private key. In principle I could just create
an account by giving you my public key and having that be the account
ID.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
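&lt;p&gt;To make the challenge/response idea concrete, here is a toy sketch. It uses Lamport one-time signatures only because they can be built from a hash function in the Python standard library; this is my choice for illustration, not what any of these systems actually use, and a Lamport key can safely sign only &lt;em&gt;one&lt;/em&gt; message. A real deployment would use an ordinary signature scheme such as Ed25519. All function names here are hypothetical.&lt;/p&gt;

```python
import hashlib
import secrets


def _h(data):
    return hashlib.sha256(data).digest()


def _bits(message):
    # The SHA-256 digest of the message as a 256-character string of "0"/"1".
    return format(int.from_bytes(_h(message), "big"), "0256b")


def generate_keypair():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, message):
    # Reveal one secret from each pair, chosen by the corresponding digest bit.
    return [pair[int(bit)] for pair, bit in zip(private, _bits(message))]


def verify(public, message, signature):
    # Each revealed secret must hash to the public value selected by its bit.
    if len(signature) != 256:
        return False
    return all(
        _h(revealed) == pair[int(bit)]
        for revealed, pair, bit in zip(signature, public, _bits(message))
    )


# Account creation: the user hands the site a public key, which is the account ID.
private_key, account_id = generate_keypair()

# Login: the site sends a fresh random challenge and the user signs it.
challenge = secrets.token_bytes(32)
response = sign(private_key, challenge)
login_ok = verify(account_id, challenge, response)
```

&lt;p&gt;The site stores nothing secret: it only needs the public key and a fresh challenge per login, which is what makes the identity self-authenticating.&lt;/p&gt;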
&lt;h3 id=&quot;attributes&quot;&gt;Attributes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#attributes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unfortunately, as-is this system also has a number of significant drawbacks.
First, as we&#39;ve seen throughout this post, sites don&#39;t want to address
users through opaque identifiers, they want to attach them to some
means of contacting them, like an e-mail address or a phone number.
This is partly because sites want to actually be able to contact their
users and—at least at present—it&#39;s not really practical to
message users via their public key pair and partly because it lets
them deal with exceptional cases like &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#key-recovery&quot;&gt;account recovery&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most of the decentralized identity systems I have seen proposed have
some mechanism to attach more meaningful attributes to a given
identity. The simplest version is effectively a certificate,
i.e., a signed statement that a given public key belongs to
someone with the following properties (e-mail, name, date of birth, etc.).
A number of these systems use fancy cryptography to allow for
selectively disclosing pieces of these attributes (e.g.,
&amp;quot;I am over 21&amp;quot; but not my birthday).
However, it&#39;s a bit unclear who would do this signing; for instance,
who would you trust to attest to my personal name? The government? Which
government? How about my email address?&lt;/p&gt;
&lt;h3 id=&quot;key-recovery&quot;&gt;Key Recovery &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#key-recovery&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The second problem with this kind of system is that if
you lose your private key you lose access to your account—or
more likely, all your accounts.
There are a lot of proposed mechanisms
for addressing private key loss (e.g., secret share your key with 10
of your closest friends) but you can be sure that plenty of people
won&#39;t do them. Long painful experience shows that users lose their
credentials quite frequently, don&#39;t do much to plan ahead for that
event, and any system that doesn&#39;t recover gracefully if the user
drops their phone in the toilet is going to have a lot of dissatisfied
customers.&lt;/p&gt;
&lt;p&gt;Of course, you can always create a new key and then
get the same attributes attached to it—and potentially
detached from the old key. Depending on the precise structure
of the system, this may or may not be technically possible
(for instance, you could have a system where each e-mail
address was registered on the blockchain and nobody could
ever re-register it). However, as we saw with
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/&quot;&gt;blockchain-based DNS systems&lt;/a&gt;,
the problem becomes that the same mechanisms which are
designed to give you complete control of your identities
independent of third parties also make it difficult for those
third parties to help you recover your identity if you lose
your keys. Obviously, this makes a lot more sense for attributes
which aren&#39;t unique, such as your age, but at the end of the
day you&#39;re still at the mercy of the people attesting to your
attributes, and those, not your key, become your true identity.&lt;/p&gt;
&lt;h3 id=&quot;independence&quot;&gt;Independence &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#independence&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At the end of the day, I&#39;m not sure how much these systems really deliver
on the independence value proposition of self-sovereignty that I quoted above. The
problem here is that there are two kinds of identities in play:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A trivial form of identity which is basically &amp;quot;I am the person
with this public key&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A deeper form of identity which ties that key pair to other
attributes which people actually care about, such as your
name or e-mail address.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first type of identity is indeed independent in the sense that
it&#39;s hard to take away from you and you don&#39;t need anyone&#39;s help to
exercise it. The second, however, depends on a whole infrastructure
of third parties who are busily attesting to various properties
that are then somehow attached to your key pair. And for the system
to function properly, you need them to do that attestation not just
once but regularly. This statement may come as a surprise, but
in real identity systems you generally need some way to revoke assertions
when you discover (for instance) that people&#39;s keys have been compromised
or that the assertion was issued incorrectly. You need to be able to
do this without the cooperation of the subject, and so that means
that in practice the attesting entity needs to be involved pretty
regularly and so you&#39;re not really able to exercise those
forms of identity independently from them.&lt;/p&gt;
&lt;p&gt;This is not to say that you can&#39;t use cryptography to build identity
systems that will have better properties than our current third-party
identity systems, especially in the area of privacy and tracking
by the IdPs. However, it seems to me that it&#39;s mostly the decoupling of the identity assertion
from the IdP—as in Persona—that provides that value,
not having them be decentralized or rooted in an identity tied to
a specific cryptographic key.&lt;/p&gt;
&lt;h2 id=&quot;deployment&quot;&gt;Deployment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#deployment&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A major challenge with any new identity system is getting broad-scale
deployment. Specifically, it&#39;s not worth it for RPs to support a
new IdP unless that IdP has a lot of existing users. Conversely,
it&#39;s not worth users creating accounts with an IdP unless a lot
of RPs accept that IdP. This deadlock makes it hard to get going with
something new, and it should come as no surprise that all the major public IdP systems
are associated with services like Google, Facebook, or Twitter which
already have large user bases of people who use the service for some
other reason. This allows them to easily offer a valuable
authentication service and makes it worthwhile for RPs to accept them.
Any new identity system will somehow have to get past this.&lt;/p&gt;
&lt;p&gt;Right now, this dynamic makes it difficult for a new IdP to
enter the market even if its APIs are basically identical
to an existing IdP, both because the existing systems tend
to need prior arrangement and because the NASCAR problem
makes it expensive for RPs to support a new IdP.
However, this need not be the case: it&#39;s possible to design an identity
protocol which works with any IdP without prearrangement—indeed
Persona was such a protocol—but in order for that to get off
the ground you&#39;d still need some large IdP to support it in order
to bootstrap RP support. For obvious reasons, that kind of
interoperability is not really in the interest of existing
IdPs, and most of the proposals I have seen for improving
the situation don&#39;t come from IdPs.&lt;/p&gt;
&lt;p&gt;The same basic situation applies to cryptographic identity
systems. It takes extra work on the part of the RP to support
such a system and that work is hard to justify if there&#39;s no
additional benefit, either in terms of getting a lot of users
that you couldn&#39;t get before, or in terms of some new capability
that you can get for a lot of existing users (like learning
information you couldn&#39;t learn before).&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that this dynamic applies &lt;em&gt;even if the
new systems are better for users&lt;/em&gt;, because the users can only
really choose between the systems supported by the RPs. For
instance, if you as a user use some new identity system &lt;em&gt;X&lt;/em&gt; that has much
better privacy, but the site you want to go to only supports
Google Login, you can either use Google Login or not, but you
can&#39;t force it to use &lt;em&gt;X&lt;/em&gt;. Once an IdP is well established and
widely supported then users choosing it has some impact at
the margin, but it&#39;s hard to make a system take off through
user choice alone.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/understanding-identity/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Identity on the Internet is a difficult problem. Having to
make an individual account for each site is clearly bad.
On the other hand, the combination of a high level of centralization and a low level of privacy
provided by third-party authentication systems is also not great.
However, the
network effect dynamics of identity systems make it very hard
to deploy something new without the cooperation of some system
that has a lot of users, which is to say the services who
are benefiting from the existing system. For that reason,
whenever someone proposes deploying a new
identity system, my first question is &amp;quot;who is going to provide
the identities and how many users do they have already?&amp;quot;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The terminology here is a bit confusing. For instance
some people draw a &lt;a href=&quot;https://sites.psu.edu/ntsh/2010/02/15/delegated-vs-federated-id/&quot;&gt;distinction&lt;/a&gt;
between &amp;quot;delegated&amp;quot; identity systems in which the RP is
outsourcing identity to a given IdP and ones in which the
RP can use any IdP. In practice, it seems to me that most
of the deployed RPs allow a small number of IdPs
but not &lt;em&gt;any&lt;/em&gt; IdP. To some extent there is a policy decision
about which IdPs to support, but as described in this
post, it&#39;s also the case that some technological approaches
are more suited to allowing an arbitrary number of IdPs
than others. My sense is &amp;quot;federated&amp;quot; is the more common
term, so I&#39;m using that here. &lt;em&gt;[2022-06-03]&lt;/em&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the third party case, the third party would somehow hook into
your identity system so it could authenticate users. &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Because of cookies, this doesn&#39;t necessarily happen instantaneously,
but you can configure things so that the RP requires the user
to re-authenticate frequently, thus giving the IdP a chance
to say that the user&#39;s account is suspended. &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m largely excluding
enterprise SSO systems, as they serve a different purpose,
and while in my experience they&#39;re a bit
clunky, it&#39;s more just generic software kludginess than it
is architectural/ecosystem issues. &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Given that roughly half the Web certificates in the world are
issued by Let&#39;s Encrypt, we shouldn&#39;t get too optimistic
about decentralization in the certificate market. &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We actually do see the use of public keys for authentication
in practice, but usually in the form of attaching
a public key to an existing account, rather than using it
as the account identifier. &lt;a href=&quot;https://educatedguesswork.org/posts/understanding-identity/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Notes on Multiple Encryption and Content Filtering</title>
		<link href="https://educatedguesswork.org/posts/multiple_encryption/"/>
		<updated>2022-05-22T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/multiple_encryption/</id>
		<content type="html">&lt;p&gt;As I mentioned in my &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal&quot;&gt;post&lt;/a&gt; on EU&#39;s
proposed CSAM regulation, any content filtering system has
to worry about nonconforming clients which are trying to
evade filtering. One obvious approach is to lie about message contents
or the output of filtering algorithms. Another method of
nonconformance that is often proposed is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Multiple_encryption&amp;amp;oldid=1084918410&quot;&gt;multiple encryption&lt;/a&gt;,
in which you use an ordinary messaging system like WhatsApp or iMessage,
but before you send messages you first encrypt
them yourself, so that even if the main messaging system
were broken, your data would still be secure.&lt;/p&gt;
&lt;h2 id=&quot;why-not-just-use-a-different-system%3F&quot;&gt;Why not just use a different system? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#why-not-just-use-a-different-system%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted in the Wikipedia page I linked to above, one reason
to do multiple encryption is just to provide defense in depth
in case the outer system is broken, but in this case,
we are &lt;em&gt;assuming&lt;/em&gt; that the outer system is broken because it
is subject to some detection/monitoring requirement, so it&#39;s
not adding much security value. It&#39;s not that hard to build
your own messaging system, so why not just use one that
isn&#39;t being monitored, for instance because it&#39;s too small
to be subject to regulations, is located outside of the
relevant jurisdiction, or has just decided not to comply?&lt;/p&gt;
&lt;p&gt;The most obvious reason for using a common system is to
conceal your activities: if most people use a messaging system
that is subject to monitoring and you choose to use one
that is not, that&#39;s a potential signal that you really
want to hide and so are worth investigating in some other
fashion.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This is especially true if you are using a program that is
explicitly associated with an activity that the authorities
want to investigate as with something like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mujahedeen_Secrets&amp;amp;oldid=1072582166&quot;&gt;Mujahedeen Secrets&lt;/a&gt;. Moreover, if you have to run your
own messaging servers, then that&#39;s a point of attack, which
you don&#39;t have if you encrypt messages and just send them
over WhatsApp.&lt;/p&gt;
&lt;h2 id=&quot;detection-and-steganography&quot;&gt;Detection and Steganography &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#detection-and-steganography&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One obvious problem with multiple encryption is that the messaging
system—which, recall, we assume is compromised—can just
change its filtering algorithms to detect your inner encrypted
messages and block or report them. How effective this is depends
on precisely how the monitoring is done. At a high level, there
are two main possibilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Targeted monitoring in which communications are generally not
monitored but the authorities can target specific people or messages
for monitoring. This is sometimes referred to as &amp;quot;exceptional
access&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Continuous monitoring in which much or all of the content is scanned
(this is what the EU regulation seems to contemplate).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In an exceptional access regime, because communications are generally
encrypted and therefore can&#39;t be routinely scanned, your use of multiple
encryption won&#39;t ordinarily be detected. Of course, if you are
one of the people who &lt;em&gt;is&lt;/em&gt; subject to surveillance, then that
will be detected, but then all that is revealed is that you are
using an inner layer of encryption, which may look suspicious,
but then you wouldn&#39;t (at least in theory) be subject to exceptional access unless
you were already suspected. It may even not result in your messages
being blocked because law enforcement and intelligence agencies
often want surveillance to be secret, and blocking your messages
would reveal that they had been decrypting them.&lt;/p&gt;
&lt;p&gt;By contrast, in a continuous monitoring regime, most if not all
messages will be scanned and so just encrypting will be easily detected
and can be blocked. This blocking doesn&#39;t reveal anything useful to the
people using inner encryption because the fact of monitoring isn&#39;t
a secret.&lt;/p&gt;
&lt;p&gt;This doesn&#39;t mean that it&#39;s not possible to multiply encrypt
in these situations, but it does mean that you have to do more
than just encrypt; you need to have the encrypted data look
like ordinary messages. There has been a fair amount of work
on what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Steganography&amp;amp;oldid=1086460245&quot;&gt;steganography&lt;/a&gt;, which involves hiding messages
in other messages. For instance, one might hide the true
message in the first word of each line, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://s.yimg.com/ny/api/res/1.2/4QNYgvOhEEMeo_hLrl.Sww--/YXBwaWQ9aGlnaGxhbmRlcjt3PTcwNTtjZj13ZWJw/https://s.yimg.com/dh/ap/default/140117/rickroll1.jpg&quot; alt=&quot;Rickroll&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: &lt;a href=&quot;https://news.yahoo.com/blogs/sideshow/student-pulls-of-rickroll-prank-in-physics-essay-143253131.html&quot;&gt;Yahoo News&lt;/a&gt;, original by Sairam Gudiseva]&lt;/p&gt;
&lt;p&gt;There are a lot of possible techniques here, such as hiding data
in the low order bits of images or audio files. In general, anywhere
that there is room for variation there is room to conceal data.
The rise of machine learning techniques for generating content
(e.g., &lt;a href=&quot;https://github.com/openai/gpt-3&quot;&gt;GPT-3&lt;/a&gt;) also makes it
easy to generate new plausible content which you can then hide
your message in, as opposed to requiring you to take some existing
content and tweak it (thus making it susceptible to detection based
on comparing it to the original template).&lt;/p&gt;
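&lt;p&gt;As a rough illustration of the low-order-bit technique, the sketch below hides a short message in the least significant bit of each byte of a cover buffer. The buffer is a stand-in for raw pixel or audio samples, and the function names &lt;code&gt;embed&lt;/code&gt; and &lt;code&gt;extract&lt;/code&gt; are hypothetical. It assumes the receiver already knows the message length, and it makes no attempt to resist statistical detection.&lt;/p&gt;

```python
def embed(cover, secret):
    """Hide secret (bytes) in the lowest bit of each byte of cover (bytes)."""
    bits = "".join(format(byte, "08b") for byte in secret)
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover byte, then set it to the message bit.
        stego[i] = stego[i] - (stego[i] % 2) + int(bit)
    return bytes(stego)


def extract(stego, length):
    """Recover length bytes previously hidden by embed()."""
    bits = "".join(str(byte % 2) for byte in stego[: length * 8])
    return bytes(int(bits[i : i + 8], 2) for i in range(0, len(bits), 8))


# Stand-in for image samples; a real cover would come from an actual image.
cover = bytes((i * 37) % 256 for i in range(512))
secret = b"meet at noon"
stego = embed(cover, secret)
recovered = extract(stego, len(secret))
```

&lt;p&gt;Each cover byte changes by at most one, which is why this kind of modification is hard to see. In the multiple encryption setting you would encrypt first and embed the ciphertext, so that the hidden bits look random.&lt;/p&gt;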
&lt;p&gt;Steganography has seen less work than other areas of communications
security, so if this kind of thing sees wide use it will probably
be a bit of an arms race for a while between concealment and
detection, but I would expect concealment to win most of the time,
just because there is already so much natural variation in messages
and so many ways to conceal information. False positives are even
more of a problem here, because—unlike CSAM—it won&#39;t really be possible to manually
determine whether something is steganography or not and so you&#39;re
just left blocking a bunch of users.&lt;/p&gt;
&lt;h2 id=&quot;key-management&quot;&gt;Key Management &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#key-management&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you&#39;re going to encrypt data, you need to have encryption keys
that aren&#39;t known to the attacker, otherwise they will just try to
decrypt everything that goes by with each key and see what works (this is known as &amp;quot;trial
decryption&amp;quot;). Naively, this involves setting up a whole new identity
system, as you&#39;re effectively running your own messaging system on top
of someone else&#39;s (see
&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#key-establishment-and-message-encryption&quot;&gt;here&lt;/a&gt;
for a bit on what this involves) which is really a pain, but actually
I think you could get a lot of value with much less.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;more-on-active-attacks&quot;&gt;More on Active Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#more-on-active-attacks&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Suppose that the multiple encryption system works by embedding DH
keys in the low order bits of specific pixels in each image. When
Alice and Bob first exchange messages, an active attacker could just
stomp them with its own bits, which would result in either (1)
establishing a pair of keys with Alice and Bob or (2) establishing
what is &lt;em&gt;apparently&lt;/em&gt; a pair of shared keys but is actually nothing (we
could in principle have some kind of error check but obviously we
don&#39;t want to do that because it makes inner encryption easy to detect).
They then look for the first message that should be encrypted and
try to decrypt it: if it works, then multiple encryption was
probably in use; if it&#39;s garbage, then probably not.&lt;/p&gt;
&lt;p&gt;But now what happens if there is another kind of multiple encryption
which encodes a different kind of key in the same bits? The service can
only try one of these, and if they get it wrong, then people
can&#39;t establish keys, which they might notice, at which point
word gets out that they are mounting active attacks. Similarly, if
there is any method for double-checking the established keys
(e.g., something like Signal&#39;s &lt;a href=&quot;https://signal.org/blog/safety-number-updates/&quot;&gt;&amp;quot;safety numbers&amp;quot;&lt;/a&gt;) then this will be quickly detected.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea would be to just do &lt;em&gt;unauthenticated key establishment&lt;/em&gt; over the
existing messaging system. What this means is that you use the
same cryptographic protocols that you would use to set up keys
(e.g., Diffie-Hellman) but you don&#39;t bother to authenticate the other side. This is technically
much easier because you don&#39;t need an identity system at all;
you&#39;re just relying on the identities provided by the existing
messaging system you are running on top of
(another good reason to use an existing messaging system rather than
building your own). One could also imagine something intermediate where
people publish their keys on Facebook or Twitter.&lt;/p&gt;
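&lt;p&gt;To make this concrete, here is a minimal sketch of unauthenticated Diffie-Hellman. The modulus &lt;code&gt;P&lt;/code&gt;, the generator &lt;code&gt;G&lt;/code&gt;, and the helper names are toy assumptions chosen so the example runs with only the standard library; they are far too weak for real use, and a real design would use a vetted group or curve from a proper crypto library.&lt;/p&gt;

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus, far too small for real use
G = 3           # toy generator, chosen only for illustration


def keypair():
    """Return a (private, public) Diffie-Hellman pair in the toy group."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public


def shared_key(private, peer_public):
    """Derive a symmetric key from our private value and the peer's public value."""
    secret = pow(peer_public, private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# Honest run: each side sends only its public value over the messaging system.
alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
alice_key = shared_key(alice_private, bob_public)
bob_key = shared_key(bob_private, alice_public)

# Active attack: the service swaps in its own public value in each direction,
# so it ends up sharing one key with Alice and a different key with Bob.
mallory_private, mallory_public = keypair()
alice_key_attacked = shared_key(alice_private, mallory_public)
mallory_key_with_alice = shared_key(mallory_private, alice_public)
```

&lt;p&gt;Nothing in the exchange tells Alice whose public value she received, which is exactly the gap that authentication would normally close.&lt;/p&gt;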
&lt;p&gt;Of course, unauthenticated encryption leaves you open to &lt;em&gt;active attack&lt;/em&gt; by the messaging
system where it tries to establish its own keys with each side, but
this kind of attack is going to be a lot more work than just passively
monitoring each message, and they&#39;ll have to do it for every potential
kind of inner encryption and for every pair of users. Moreover, this inherently involves damaging
the messages, which is something that is likely to get noticed quite
quickly if anybody bothers to check. So, while you&#39;re potentially
vulnerable to a very dedicated attacker, in practice this would
give you a lot of security.&lt;/p&gt;
&lt;h2 id=&quot;one-versus-two-sided-systems&quot;&gt;One Versus Two-Sided Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#one-versus-two-sided-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One very important limitation of multiple encryption systems like this
is that they only work when both sides participate:
each user needs to install some kind of new software that will
handle the multiple encryption, and if you are just running the
standard software, you&#39;ll either get something that looks like
random junk or like whatever innocuous cover traffic is being used to
hide the encrypted data in, depending on whether steganography
is in use. This means that multiple encryption can be used to evade filtering
in contexts like trading CSAM or buying drugs where (presumably)
both sides have an interest in concealment, but can&#39;t really be used to
evade filtering in cases like solicitation of minors because the minor isn&#39;t
going to have installed the new program (and of course the service
can fairly easily scan for a suggestion that they do so).&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Of course some people just like their privacy,
but the question is whether this is a useful signal
on average. &lt;a href=&quot;https://educatedguesswork.org/posts/multiple_encryption/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>End-to-End Encryption and the EU&#39;s new proposed CSAM Regulation</title>
		<link href="https://educatedguesswork.org/posts/eu-csam-proposal/"/>
		<updated>2022-05-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/eu-csam-proposal/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;Last week the European Commission published a new &lt;a href=&quot;https://www.europeansources.info/record/proposal-for-a-regulation-laying-down-rules-to-prevent-and-combat-child-sexual-abuse/&quot;&gt;&amp;quot;Proposal
for a Regulation laying down rules to prevent and combat child sexual
abuse&amp;quot;&lt;/a&gt;. This
regulation would require Internet communications platforms to take
various actions intended to prevent or at least reduce what it terms
&amp;quot;online sexual abuse&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;proposal-summary&quot;&gt;Proposal Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#proposal-summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://ec.europa.eu/home-affairs/system/files/2022-05/Proposal%20for%20a%20Regulation%20laying%20down%20rules%20to%20prevent%20and%20combat%20child%20sexual%20abuse_en.pdf&quot;&gt;proposed regulation&lt;/a&gt;
runs to 135 pages and is somewhat light on detail, but here&#39;s a brief
summary of the most relevant points (with the disclaimer that I am not
a lawyer).&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Requires all &amp;quot;hosting services and providers of
interpersonal communications services&amp;quot; to perform a risk assessment of
the risk of use of their service (Article 3) for online sexual abuse
and to take &amp;quot;risk mitigation&amp;quot; measures (Article 4), said measures
being required to be &amp;quot;effective in mitigating the identified risk&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Allows the &amp;quot;Coordinating Authority&amp;quot; of a member
state to issue a &amp;quot;detection order&amp;quot; (Article 7) which would require the service
to set in place technical measures that are
&amp;quot;effective in detecting the dissemination of known or new child sexual abuse material or the solicitation of children, as applicable&amp;quot; (Article 10(3)(a)) based on indicators created by a
new EU Centre.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Creates a new EU Centre which will develop technologies for detecting
the above types of content and make them available to providers
as well as generating indicators of contraband content
(Article 44).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Imposes various transparency and takedown requirements on providers, for
instance requiring them to block/takedown specific pieces of content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&#39;s a bit unclear to me what the line is between being required to have
measures that are &amp;quot;effective in mitigating the identified risk&amp;quot; versus
&amp;quot;effective in detecting the dissemination of known or new child sexual
abuse material or the solicitation of children&amp;quot;, but I would expect
that any significant-sized service is likely to be served with a
detection order, given that the standard for issuing the orders, as
set out in Article 7 (4) is that &amp;quot;there is evidence of a significant
risk of the service being used for the purpose of online child sexual
abuse&amp;quot;, which is probably the case for any major service, just
because there is so much traffic; even if detection were perfect—which it isn&#39;t—there would always be new users wanting to exchange
prohibited material. For
that reason, it&#39;s probably most useful to focus on the implications
of the detection order requirement.&lt;/p&gt;
&lt;h2 id=&quot;technologies-for-detecting-online-sexual-abuse&quot;&gt;Technologies for Detecting Online Sexual Abuse &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#technologies-for-detecting-online-sexual-abuse&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This proposal is concerned with three main types of material:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Known &lt;em&gt;child sexual abuse material (CSAM)&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;New CSAM that hasn&#39;t been seen before.&lt;/li&gt;
&lt;li&gt;Solicitation of children.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The standard techniques for detecting known CSAM mostly depend on
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Perceptual_hashing&amp;amp;oldid=1086461935&quot;&gt;perceptual hashing&lt;/a&gt;,
in which we compute a short value that is characteristic of the image (or video).
You start with a database of known CSAM objects and compute their
perceptual hashes. The idea is supposed to be that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If two images look &amp;quot;the same&amp;quot; then they will have the same
hash, even if they are slightly different. For instance,
a color and black-and-white version of the same image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If two images are &amp;quot;different&amp;quot; then they will have different
hashes with very high probability.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that this is different from cryptographic hashing because similar
looking images will have the same hash, whereas with a cryptographic
hash even a single bit difference should produce a new hash.
In order to scan a new piece of content you compute its hash and
then look up the hash in the table of known hashes. If there&#39;s
a match, then the content is potentially CSAM and you take
some action, such as alerting the authorities.
(see &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/&quot;&gt;here&lt;/a&gt;
for some limitations of this kind of system).&lt;/p&gt;
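&lt;p&gt;Here is a minimal sketch of the lookup flow using a very simple perceptual hash (an &amp;quot;average hash&amp;quot;, where each bit records whether a pixel is brighter than the image mean). This is an assumption for illustration and is not how PhotoDNA or NeuralHash work; the tiny 8x8 grayscale grids and the names below are also made up.&lt;/p&gt;

```python
import hashlib


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def crypto_hash(pixels):
    """Ordinary cryptographic hash of the raw pixel values, for contrast."""
    flat = [value for row in pixels for value in row]
    return hashlib.sha256(bytes(flat)).hexdigest()


# Hypothetical 8x8 grayscale images (values 0..255).
gradient = [[16 * (r + c) for c in range(8)] for r in range(8)]
brighter = [[value + 10 for value in row] for row in gradient]  # same picture, slightly brighter
checker = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]

# The database holds hashes of known objects, not the objects themselves.
known_hashes = {average_hash(gradient)}

brighter_flagged = average_hash(brighter) in known_hashes
checker_flagged = average_hash(checker) in known_hashes
crypto_hashes_differ = crypto_hash(gradient) != crypto_hash(brighter)
```

&lt;p&gt;The brightened copy still matches the database entry even though its cryptographic hash is completely different, while the unrelated image does not match.&lt;/p&gt;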
&lt;p&gt;Hashing doesn&#39;t work for unknown images, however, because
you won&#39;t have their hashes, and won&#39;t work for detecting text
messages and the like that are designed to solicit children.
The state of the art for detecting this kind of material is to
train machine learning models (&amp;quot;classifiers&amp;quot;)
that attempt to distinguish innocuous
material from contraband. This kind of technique is already in
wide use for spam filtering, but there are also technologies like
this that attempt to identify &lt;a href=&quot;https://www.thorn.org/blog/how-safers-detection-technology-stops-the-spread-of-csam/&quot;&gt;CSAM&lt;/a&gt;
and &lt;a href=&quot;https://blogs.microsoft.com/on-the-issues/2020/01/09/artemis-online-grooming-detection/&quot;&gt;solicitation&lt;/a&gt;;
as I understand it, these technologies are already in use
in some systems.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;traffic-encryption&quot;&gt;Traffic Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#traffic-encryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Most services do encrypt traffic, but often it&#39;s only in transit
between the client and the server, which doesn&#39;t prevent the
service from doing any analysis on it they want. You&#39;ll also
often hear that services store data encrypted, but that usually
just means it&#39;s encrypted with keys they know. This isn&#39;t
worthless: it might protect you if someone steals one of their hard drives,
and depending on how things are built might make certain forms of
inside attack difficult—for instance if administrators
can&#39;t get the keys—but
doesn&#39;t do anything to get in the way of the service itself
inspecting your data.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&#39;s important to recognize that these technologies require having
access to the &lt;em&gt;content&lt;/em&gt; itself, whether to compute the hash or to run
the classifier. If you have a system where the service sees the data
in plaintext, then this is straightforward, but if the data is
&lt;em&gt;end-to-end&lt;/em&gt; encrypted, meaning that the service doesn&#39;t see it, then
life gets more complicated, by which I mean &amp;quot;there isn&#39;t really
a good solution&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;content-filtering-on-encrypted-data&quot;&gt;Content Filtering on Encrypted Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#content-filtering-on-encrypted-data&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The obvious way to address the problem of content filtering on encrypted
data is just not to encrypt it, but of course this has a very negative
impact on the security of people&#39;s communications
(see my &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/&quot;&gt;previous post&lt;/a&gt; on E2EE and encrypted messaging
for more on this), and so there has been quite a bit of work on
content filtering with encrypted data. The EU proposal relies heavily on an
EU-sponsored Experts Report (see &lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=SWD:2022:209:FIN&amp;amp;from=EN&quot;&gt;Annex 9&lt;/a&gt; of their impact analysis) describing
their analysis of the situation and making some recommendations.
I&#39;ll address this report below, but at a high level, there
are two main approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Filter on the &lt;em&gt;client&lt;/em&gt; and report results back to the server.&lt;/li&gt;
&lt;li&gt;Filter on the &lt;em&gt;server&lt;/em&gt; or some other central point.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, neither of these really works very well, for reasons
I&#39;ll go into below.&lt;/p&gt;
&lt;h3 id=&quot;client-side-filtering&quot;&gt;Client-Side Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#client-side-filtering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Aside from just not encrypting at all, the obvious solution is to have the
client filter the data; after all, it already has the plaintext. However,
there are a number of challenges to making client-side filtering work
in practice.&lt;/p&gt;
&lt;h4 id=&quot;algorithmic-secrecy&quot;&gt;Algorithmic Secrecy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#algorithmic-secrecy&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The first major challenge for client-side filtering is the desire to
keep secret the algorithms used to determine whether to flag a given piece of
content. For instance, many server-side filtering
systems use a perceptual hashing technology called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=PhotoDNA&amp;amp;oldid=1086803811&quot;&gt;PhotoDNA&lt;/a&gt;.
Although the &lt;a href=&quot;https://web.archive.org/web/20130921055218/http://www.microsoft.com/global/en-us/news/publishingimages/ImageGallery/Images/Infographics/PhotoDNA/flowchart_photodna_Web.jpg&quot;&gt;general
structure&lt;/a&gt;
of the algorithm is known, the precise details are secret. In
addition, the hashes themselves are secret.&lt;/p&gt;
&lt;p&gt;As far as I can tell, there are two major reasons for this secrecy.
The first is that it&#39;s intended to deter evasion. If you have the
hash algorithm and the list of hashes, then you can check for
yourself whether a given piece of content is on the list and either
avoid transmitting it or alter the content so that it has
a different hash that&#39;s not on the list. Even if you just know the
hash algorithm and you have a piece of content that might be on
the list, you can easily alter the content so that it has a different
hash, thus reducing the chance of detection. Or, in the case
of a detector for solicitation, the client might warn the user
to cut off the conversation when the classifier score got too
high.&lt;/p&gt;
&lt;p&gt;If the algorithm is secret, it&#39;s harder to know if two slightly
different inputs will have the same hash (recall that the idea of a
perceptual hash is that visually similar inputs produce the same
hash), but if you know the algorithm, it&#39;s trivial.  It&#39;s also
possible to go in the other direction, where you generate a piece of
innocuous content that matches a hash and send it to someone to
&amp;quot;frame&amp;quot; them. This is much easier if you know the hash.&lt;/p&gt;
&lt;p&gt;The second reason is that it might be possible to use the hashes
themselves to &lt;a href=&quot;https://towardsdatascience.com/black-box-attacks-on-perceptual-image-hashes-with-gans-cc1be11f277&quot;&gt;reconstruct&lt;/a&gt;
a low-res version of the original image, which would obviously
be undesirable, as it would mean that distributing the hash
database was kind of like distributing a low-fi version of
the original images with an unusual compression format.&lt;/p&gt;
&lt;p&gt;Apple&#39;s proposed client-side CSAM scanning system (see my writeup &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro&quot;&gt;here&lt;/a&gt;)
partly addresses these issues by using advanced cryptographic techniques
to conceal the hash list from the client. Briefly, the way this
works is that the service provides the client with an encrypted
copy of the hash database. The client computes a &amp;quot;voucher&amp;quot; based
on the content and the hash database, and sends it to the service,
but the service can only decrypt the voucher if the content matched
one of the hashes. This prevents the client from knowing whether
their content matched a hash but actually requires the client
software to know the hash algorithm, which they have to be able to compute locally&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
so it would still be possible for an attacker to
change content so it has a different hash.&lt;/p&gt;
&lt;p&gt;Moreover, Apple&#39;s system only works for &lt;em&gt;known&lt;/em&gt; hashes, and it&#39;s not
known how to extend it to the problem of having a client-side
classifier that is itself secret (unlike NeuralHash). As we&#39;ll
see later in this document, the need to run arbitrary
computation rather than just hash matching makes this whole problem space a lot harder.
It&#39;s maybe possible you could use some kind of encrypted computation solution
in which some server ran a classifier on an encrypted copy of the
content and then told the client whether it was contraband, but then
we&#39;d have the problem that the client could use the server
as an oracle for whether a given piece of content was OK,
which, as noted above, is undesirable.&lt;/p&gt;
&lt;h4 id=&quot;client-nonconformance&quot;&gt;Client Nonconformance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#client-nonconformance&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The other major problem with executing the classifier on the client
is that there&#39;s nothing requiring the client to actually run the
classifier on the true input, or on any input at all. For example,
in the Apple system, the client sends an (image, voucher) pair
up to iCloud but there&#39;s nothing in the system that forces the image to
match the voucher.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Instead, the client can just compute a voucher on an innocuous image
(in the Apple system, it can actually just produce a random
voucher, but one might imagine a different design where that
was not possible) and upload that voucher along with the image.&lt;/p&gt;
&lt;p&gt;The major barrier to this kind of attack is how inconvenient it
is for the user—who recall, is the attacker in this system—to
run a nonconformant client.
Of course, if you&#39;re using an iOS device, then you&#39;re running
Apple&#39;s software, which is designed to behave correctly,
and it&#39;s a pain to replace it with your own
(though &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=IOS_jailbreaking&amp;amp;oldid=1087694184&quot;&gt;nothing like impossible&lt;/a&gt;),
and in any case, this isn&#39;t a generic solution to the problem of
tens to hundreds of apps, including those which run on systems
much less locked down than iOS (including MacOS). This problem
is much worse for &amp;quot;open&amp;quot; systems in which the protocols are
public or in which the clients are open source
because in those systems anyone can build their own client
that interoperates with the system but doesn&#39;t correctly
run the classifier (i.e., it lies!), which makes the system
far less useful. Of course, some people will still use the default
client, but in many of the scenarios of interest, people
&lt;strong&gt;know that they are sending contraband&lt;/strong&gt; and so will be willing
to use custom tools that evade filtering, in which case
almost any system other than having the client send the data
in the clear won&#39;t work.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;server-side-filtering&quot;&gt;Server-Side Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#server-side-filtering&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The other set of designs uses a server for filtering (&amp;quot;don&#39;t encrypt&amp;quot;
is the trivial version of this). Similarly, you could send a copy
of the data (or, in the hash version of the system, a copy of the
hash) to some &amp;quot;trusted&amp;quot; server which does the filtering. The nominal advantage
of such a design is that the service provider (e.g., WhatsApp) can&#39;t
see your data (or the hash) but of course this third party would
and it&#39;s not clear how that&#39;s better, as it comes down to trusting
some server operated by someone you don&#39;t know not to spy on you.&lt;/p&gt;
&lt;p&gt;The EU Experts Report&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
proposes two fancy cryptographic mechanisms for
addressing this problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Having the client upload encrypted hashes and use &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Secure_multi-party_computation&amp;amp;oldid=1079707423&quot;&gt;multiparty computation (MPC)&lt;/a&gt; to determine whether one of the hashes matches.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Homomorphic_encryption&amp;amp;oldid=1085790826&quot;&gt;fully homomorphic encryption (FHE)&lt;/a&gt; to compute the perceptual hash over the content and determine if it matches the hash list.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As far as I can tell, the encrypted hash/MPC design is inferior to Apple&#39;s proposal in that it&#39;s more complicated and still only does hashes.
The EU report frames the FHE system as being about hashes, but if it works at all, I think it&#39;s likely to work with classifiers too, because it involves the server
running an arbitrary computation. With that said, it&#39;s also not clear to me how it&#39;s intended to work. Here&#39;s the diagram from their report:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/eu-fhe.png&quot; alt=&quot;FHE filtering&quot; /&gt;&lt;/p&gt;
&lt;p&gt;FHE is a bit outside my main area of expertise, but I&#39;m having trouble making
sense of this. The point of homomorphic encryption is that you can perform
a computation on encrypted data. In the typical FHE setting, the client encrypts the data and sends
it to the server which operates on the encrypted data and returns the result,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fhe.png&quot; alt=&quot;FHE Example&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;partially-homomorphic-encryption&quot;&gt;Partially Homomorphic Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#partially-homomorphic-encryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s been known for a very long time how to do &lt;em&gt;partially&lt;/em&gt; homomorphic encryption.
As a concrete example, consider the case where you encrypt some data by XORing
it with a key, i.e.,&lt;/p&gt;
&lt;p&gt;$$Ciphertext = Plaintext &#92;oplus Key$$&lt;/p&gt;
&lt;p&gt;With this system, you can have the server compute the XOR of two plaintexts,
$P_1$ and $P_2$.
The client sends:&lt;/p&gt;
&lt;p&gt;$$ (C_1, C_2) = (P_1 &#92;oplus K_1, P_2 &#92;oplus K_2)$$&lt;/p&gt;
&lt;p&gt;The server returns:&lt;/p&gt;
&lt;p&gt;$$ C_1 &#92;oplus C_2 $$&lt;/p&gt;
&lt;p&gt;Which the client XORs with $K_1 &#92;oplus K_2$, i.e.,&lt;/p&gt;
&lt;p&gt;$$P_1 &#92;oplus K_1 &#92;oplus P_2 &#92;oplus K_2 &#92;oplus K_1 &#92;oplus K_2 $$&lt;/p&gt;
&lt;p&gt;When you cancel out the keys ($A &#92;oplus A = 0$) you get:&lt;/p&gt;
&lt;p&gt;$$ P_1 &#92;oplus P_2$$&lt;/p&gt;
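&lt;p&gt;The same exchange can be sketched in a few lines of code. The plaintexts and the helper name &lt;code&gt;xor&lt;/code&gt; are assumptions for illustration; the keys are fresh random bytes the same length as the plaintexts.&lt;/p&gt;

```python
import secrets


def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Client side: two plaintexts and two one-time keys of the same length.
p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"
k1 = secrets.token_bytes(len(p1))
k2 = secrets.token_bytes(len(p2))
c1, c2 = xor(p1, k1), xor(p2, k2)

# Server side: sees only c1 and c2, and XORs them together.
server_result = xor(c1, c2)

# Client side: strips both keys to recover the XOR of the plaintexts.
client_result = xor(server_result, xor(k1, k2))
```

&lt;p&gt;The server computed the XOR of the two plaintexts without ever holding either one or either key.&lt;/p&gt;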
&lt;p&gt;The difference between &lt;em&gt;partially&lt;/em&gt; and &lt;em&gt;fully&lt;/em&gt; homomorphic encryption is that with
a partially homomorphic system you can compute some functions on encrypted data
but not others. With a fully homomorphic system you can compute any function,
whereas this system is homomorphic with respect to XOR but not (say) to multiplication.
The problem of &lt;em&gt;fully&lt;/em&gt; homomorphic encryption had been open for a long time
until Craig Gentry finally showed how to do it in 2009.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The idea here is that the client has some input that it wants some
expensive computation done on. It could just run the computation in
some cloud service like AWS but it doesn&#39;t want the cloud service to
see the data. Instead, it encrypts the data and sends the encrypted
version to the server. The server then performs the computation on the
encrypted data, but without seeing the data (ordinarily this would not
be possible but there is some extremely fancy math involved). The computation
is structured so that the server doesn&#39;t get to see the result but just
an encrypted version of the result, which it sends back to the client.
The client then decrypts the result and learns the answer.&lt;/p&gt;
&lt;p&gt;What makes this use of FHE weird is that the response doesn&#39;t go
back to the client but rather the &lt;em&gt;server&lt;/em&gt; somehow sees an
&lt;em&gt;encrypted&lt;/em&gt; hash that it compares with a list of other &lt;em&gt;encrypted&lt;/em&gt;
hashes, which doesn&#39;t seem to be the customary FHE setting.
It&#39;s possible I&#39;m missing something, but as described, it seems
like this design would allow the server to learn the actual
content, not just whether it matches a given hash. The issue is
that the server determines the algorithm that it runs on the
encrypted data, and so it can design an algorithm that allows it
to extract the data. For instance, suppose you have an algorithm
that looks at a single pixel of an image and emits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The hash of a known piece of CSAM if the pixel is black.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;A random value if the pixel is white.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You then run the algorithm over the image one pixel
at a time and you&#39;ve extracted the content (assuming it&#39;s black and
white).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
You could obviously extend this technique to be more efficient,
or to work on text, etc.&lt;/p&gt;
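&lt;p&gt;Here is a rough simulation of that extraction, under the assumption that the only thing the server learns from each evaluation is whether the output matched its hash list. There is no real homomorphic encryption here: &lt;code&gt;evaluate_and_match&lt;/code&gt; is a hypothetical stand-in for &amp;quot;run a server-chosen program on data the server can&#39;t see and compare the result to the list&amp;quot;, and the other names and values are made up as well.&lt;/p&gt;

```python
import secrets

KNOWN_HASH = "hash-of-known-item"  # hypothetical entry on the server's hash list
HASH_LIST = {KNOWN_HASH}


def make_probe(index):
    """Server-chosen program that leaks one pixel through the match result."""
    def probe(image):
        if image[index] == 0:  # black pixel
            return KNOWN_HASH
        return secrets.token_hex(16)  # white pixel: a value not on the list
    return probe


def evaluate_and_match(hidden_image, program):
    """Stand-in for homomorphic evaluation: the server learns only the match bit."""
    return program(hidden_image) in HASH_LIST


# The user's content, which the server is never supposed to see (0 = black, 1 = white).
hidden_image = [0, 1, 1, 0, 0, 1, 0, 1]

# The server probes one pixel at a time and rebuilds the image from the match bits.
recovered = [
    0 if evaluate_and_match(hidden_image, make_probe(i)) else 1
    for i in range(len(hidden_image))
]
```

&lt;p&gt;After one probe per pixel the server holds a full copy of the image, even though each individual answer was just &amp;quot;match&amp;quot; or &amp;quot;no match&amp;quot;.&lt;/p&gt;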
&lt;p&gt;It&#39;s possible that the design of the system might be
able to &lt;em&gt;somehow&lt;/em&gt; restrict the algorithms that the server
can run—though usually homomorphic encryption does so
at a lower level, like that you can only multiply but not
add—but that restriction would have to be enforced by
having the client encode data in a certain way, such that it
was just partially homomorphic. This seems impractically
inflexible, especially in light of the fact that we don&#39;t just want
the server to compute perceptual hashes but to run generic
classifiers, which tend to be fairly complicated systems,
and that they are supposed to be based on whatever indicators are provided
by the EU Centre.
Restricting the classifier algorithm by controlling the inputs
seems even more problematic
if you want to keep it secret from the client, which, as noted
above, is important for preventing evasion; if the client
wants to evade and knows that only certain classifiers can
be run, it can tune its content to evade those classifiers.&lt;/p&gt;
&lt;p&gt;You could of course build a more traditional FHE-style system
in which the server just told the client whether the content
had been flagged, and count on the client to report the user.
However, with that design, you&#39;re telling the user whether
they have been flagged, which, as above, is undesirable,
and you still have to worry about client
nonconformance (i.e., just ignoring that the user was flagged).
If the response is encrypted, then the server has no way
of knowing that the client is behaving correctly.&lt;/p&gt;
&lt;p&gt;I should also mention at this point that even the piece where
you build the classifier using homomorphic encryption is kind
of a research problem, as stated in the report:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another possible encryption related solution would be to use machine
learning and build classifiers to apply on homomorphically encrypted
data for instant classification. Microsoft has been doing research on
this but the solution is still far from being functional.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The bottom line here is that I don&#39;t think we&#39;re at the point
where fancy crypto is going to help. Even if it&#39;s possible
in principle to build something that allows the server
just to tell if something is contraband without seeing the
content (which is far from clear), it&#39;s not practical to
do so with our current cryptographic tools.&lt;/p&gt;
&lt;h3 id=&quot;trusted-execution-environments&quot;&gt;Trusted Execution Environments &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#trusted-execution-environments&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One approach that has recently become popular for dealing with this kind
of complicated trust problem—especially when it feels too hard for crypto—is to use what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Trusted_execution_environment&amp;amp;oldid=1083726151&quot;&gt;&lt;em&gt;Trusted Execution Environment (TEE)&lt;/em&gt;&lt;/a&gt; or an &amp;quot;enclave&amp;quot;. A TEE
is a processor feature that allows the operator of the processor
to run computations on data without being able to see the data.&lt;/p&gt;
&lt;p&gt;The basic way a TEE works is that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The processor manufacturer installs a signing key when
the processor is manufactured. This key is signed by the
manufacturer&#39;s key.&lt;/li&gt;
&lt;li&gt;The TEE internally generates a secret encryption key
pair.&lt;/li&gt;
&lt;li&gt;The operator installs a program onto the TEE.&lt;/li&gt;
&lt;li&gt;The TEE then signs a statement (using the signing key)
that &lt;em&gt;attests&lt;/em&gt; to the program and to the public half
of the encryption key pair.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The operator can then send this statement to someone else.
That someone else verifies the signature chain and compares the
attested program to its expectations. If both check out, they know
that (1) they are interacting with the TEE rather than with the operator
and (2) precisely what program the operator is running on the TEE.&lt;/p&gt;
&lt;p&gt;It&#39;s easy to see why a TEE is attractive, as in theory it ought to
offer a generic solution to a huge number of privacy and security
problems: there&#39;s no fancy crypto to be concerned with, you just write
your program to do whatever you want and shove it in the TEE. You
do have to be a little (well, more than a little)
careful to write the program on the TEE
so it doesn&#39;t leak information about the data it&#39;s operating on
via side channels and the like (remember what I said about
the difficulty of &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#safe-computation-on-secret-data-is-hard&quot;&gt;safely computing on secret data&lt;/a&gt;),
but one might hope that that&#39;s a problem that could be solved
with the right programming practices and then you just have a magic
box that securely executes any program you want.&lt;/p&gt;
&lt;p&gt;Given such a box, the problem becomes a lot easier. For instance,
the EU report suggests that the client send the encrypted
messages to the TEE &lt;em&gt;along with the encryption keys&lt;/em&gt;; the TEE would then run whatever
filtering algorithms were needed on them and then either forward
the encrypted message (if it was OK) or report
a violation (if it was not). You could also use the TEE to
run filtering on the client because you could run the classifier
secretly in the TEE without disclosing it to the user.
(You won&#39;t be surprised to hear that one of the big uses of
TEEs is for &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Digital_rights_management&amp;amp;oldid=1085790220&quot;&gt;DRM&lt;/a&gt;
for media.) Running a secret classifier is somewhat tricky,
but you might imagine a system in which the classifier was
revealed to some set of experts who would then attest that
it was OK and publish a hash of it that clients could check.&lt;/p&gt;
&lt;p&gt;There&#39;s just one tiny problem: TEEs are a lot less secure than one
would actually like. There is a whole line of papers
attacking the best-known TEE, Intel SGX
(see &lt;a href=&quot;https://arxiv.org/pdf/2006.13598.pdf&quot;&gt;here&lt;/a&gt; for a survey).
Moreover, these attacks are all based on running code on the
processor, which is a fairly weak form of attack. TEEs also
generally don&#39;t provide defenses against physical attacks by
someone who has physical control of the processor, in part because this is hard
to do in a processor-sized package.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
For instance &lt;a href=&quot;https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions-enhanced-data-protection.html&quot;&gt;here&#39;s what Intel says&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Side-channel attacks are based on using information such as power states, emissions and wait times directly from the processor to indirectly infer data use patterns. These attacks are very complex and difficult to execute, potentially requiring breaches of a company’s data center at multiple levels: physical, network and system.&lt;/p&gt;
&lt;p&gt;Hackers typically follow the path of least resistance. Today, that usually means attacking software. While Intel® SGX is not specifically designed to protect against side channel attacks, it provides a form of isolation for code and data that significantly raises the bar for attackers. Intel continues to work diligently with our customers and the research community to identify potential side-channel risks and mitigate them. Despite the existence of side-channel vulnerabilities, Intel® SGX remains a valuable tool because it offers a powerful additional layer of protection.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The problem here is that this is very high value data and so you
have to worry about very motivated attackers. For instance, in
the server-side TEE system described in the EU report, the TEE
would effectively have access to the plaintext of everyone&#39;s messages,
which means that any effective attack on the TEE breaks E2EE and
enables universal surveillance by the server. Given
the history of successful attacks on systems like this, assuming
that it cannot be broken even given the resources of a
state-level adversary who wants to read everyone&#39;s communications seems
unreasonably optimistic.&lt;/p&gt;
&lt;p&gt;Finally, the whole security of a TEE system relies on the processor
manufacturer not cheating, but those processor manufacturers are
big companies, so users also have to worry about the manufacturers
being compelled to assist in surveillance, for instance by signing
a processor key for a processor which didn&#39;t actually provide the
TEE security functions.&lt;/p&gt;
&lt;h2 id=&quot;algorithms-and-systems-design&quot;&gt;Algorithms and Systems Design &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#algorithms-and-systems-design&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even if we ignore the security pieces, this is still a hard problem.
Although automated content scanning is widely employed, these systems
routinely misclassify data, which is why you still get spam messages
in your mailbox even with best-in-class spam filters and why any
big content system has to employ—or more likely subcontract—an
&lt;a href=&quot;https://www.nytimes.com/2021/08/31/technology/facebook-accenture-content-moderation.html&quot;&gt;army of humans&lt;/a&gt;
to manually go through stuff that&#39;s been flagged by their algorithms.
How well these algorithms work seems to vary a fair bit depending
on what they are asked to do, and the EU impact analysis is
fairly light on details:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Thorn’s CSAM Classifier can be set at a 99.9% precision rate. With
that precision rate, 99.9% of the content that the classifier
identifies as CSAM is CSAM, and it identifies 80% of the total CSAM
in the data set. With this precision rate, only .1% of the content
flagged as CSAM will end up being non-CSAM. These metrics are very
likely to improve with increased utilization and feedback.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This 99.9% number is reported as &amp;quot;Data from bench tests&amp;quot;. Thorn
itself reports a 99% number, but doesn&#39;t provide details of how
the tests are conducted.&lt;/p&gt;
&lt;p&gt;By contrast, the problem of classifying &amp;quot;solicitation&amp;quot; seems to be much
harder. The EU references some
&lt;a href=&quot;https://blogs.microsoft.com/on-the-issues/2020/01/09/artemis-online-grooming-detection/&quot;&gt;work&lt;/a&gt;
by Microsoft and says &amp;quot;Microsoft has reported that, in its own
deployment of this tool in its services, its accuracy is 88%.&amp;quot;&lt;/p&gt;
&lt;h3 id=&quot;reporting-test-accuracy&quot;&gt;Reporting Test Accuracy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#reporting-test-accuracy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I just want to take a moment here to complain about the way these
numbers are being reported, which is really confusing. Any given
test has two types of errors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;false positives&lt;/em&gt; in which you report a positive test (in this
case a violation) when there is none.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;false negatives&lt;/em&gt; in which you report a negative test
(in this case no violation) when there is one.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The typical way to report these is just like that. I.e., the false
positive rate is the fraction of positives you would get if you
performed tests on inputs which were truly negative. For example
the &lt;a href=&quot;https://ihealthlabs.com/pages/ihealth-covid-19-antigen-rapid-test-details&quot;&gt;iHealth COVID test&lt;/a&gt;
&amp;quot;correctly identified 94.3% of positive specimens and 98.1% of negative specimens&amp;quot;,
which means that if you are negative, there is a 1.9% chance the test will
report positive.&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that this number is &lt;em&gt;different&lt;/em&gt; from
the fraction of positives which are actually negative, because that
number depends on the population you are testing. For example, if
you went back in time and administered COVID tests to people in 2010,
then &lt;em&gt;every&lt;/em&gt; positive test would be a false positive because nobody
had COVID. The lesson here is that the usefulness of a test depends
on the properties of the population in which it&#39;s being used; even a
very accurate test can have a lot of false positives—to the point where most
of the positives will actually be false positives—if the
number of true positives is very low
(see Schneier on the &lt;a href=&quot;https://www.schneier.com/blog/archives/2006/07/terrorists_data.html&quot;&gt;base rate fallacy&lt;/a&gt;).&lt;/p&gt;
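&lt;p&gt;To make the population dependence concrete, here is a minimal sketch that applies Bayes&#39; rule to the iHealth figures quoted above. The function name and the three prevalence values are my own illustrative assumptions, not data from any study.&lt;/p&gt;

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive results that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity are the iHealth figures quoted above.
SENSITIVITY = 0.943
SPECIFICITY = 0.981

# The prevalence values are illustrative assumptions, not measured data.
for prevalence in (0.10, 0.01, 0.001):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of positives are real")
```

&lt;p&gt;With the same test, roughly two thirds of positives are false once only 1% of the population is truly positive, and the large majority are false at 0.1%.&lt;/p&gt;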
&lt;p&gt;Conversely, it&#39;s not possible to determine the accuracy of a test
by reporting the fraction of errors without knowing the
sample it was tested on. For instance, I could have a CSAM filter
test that just reported &amp;quot;is CSAM&amp;quot; for everything and if I tested
it on only CSAM inputs, it would look to be 100% accurate,
even though it&#39;s obviously useless. So in this case, that 99.9%
number on bench tests is useless without knowing the set of inputs
it was tried on. The 88% number is even worse because &amp;quot;accuracy&amp;quot;
could mean anything, and I wasn&#39;t able to find anywhere where
Microsoft reported their own research.&lt;/p&gt;
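&lt;p&gt;As a sketch of why the test set matters, consider the degenerate filter described above. Everything here (&lt;code&gt;always_flag&lt;/code&gt;, the helper, and both test sets) is hypothetical and only illustrates the arithmetic.&lt;/p&gt;

```python
def always_flag(item):
    """A useless 'classifier' that flags every input as a violation."""
    return True


def flagged_fraction_correct(classifier, labelled_items):
    """Precision: of the items the classifier flags, the fraction that are violations."""
    flagged = [is_violation for item, is_violation in labelled_items if classifier(item)]
    return sum(flagged) / len(flagged)


# Hypothetical test sets: each entry is (item, is_violation).
violations_only = [(f"bad-{i}", True) for i in range(1000)]
realistic_mix = [(f"bad-{i}", True) for i in range(1)] + [
    (f"ok-{i}", False) for i in range(999)
]

print(flagged_fraction_correct(always_flag, violations_only))  # 1.0
print(flagged_fraction_correct(always_flag, realistic_mix))    # 0.001
```

&lt;p&gt;The same filter scores perfectly on one input set and is almost always wrong on the other, which is why a bare precision figure says little without the composition of the test data.&lt;/p&gt;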
&lt;p&gt;Without this kind of information we can&#39;t tell how effective
a system like this will be. Only a tiny fraction of the content
on the Internet is CSAM or solicitation, and so even a
very accurate filter is still going to produce a large number
of false positives. Knowing about how many there will be is
critical to understanding the practical effectiveness of this
kind of system.&lt;/p&gt;
&lt;h3 id=&quot;manual-review&quot;&gt;Manual Review &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#manual-review&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, the possibility of false positives usually means that
you need manual filtering as a backup. In a system without end-to-end
encryption, this is straightforward: you already have the data because
you ran a filter on it, so you just send a copy to whoever is doing
the double checking.&lt;/p&gt;
&lt;p&gt;If the data is end-to-end encrypted, however, the problem becomes much harder,
because—with the exception of the TEE-type systems, which have other
problems—the server doesn&#39;t have the data in the clear, so it needs
to obtain either the encryption keys for the content or the content
itself. The Apple system solves this problem automatically,
but as I mentioned above, it only works for hash matching, not for
general classification algorithms. Of course, if the classifier
shows a positive result, the server can always ask the client to
send a copy of the plaintext, but then this isn&#39;t secret from the
client, and of course a nonconforming client might lie about the
content, so this doesn&#39;t seem like a great solution.&lt;/p&gt;
&lt;h2 id=&quot;policy-implications-for-end-to-end-encryption&quot;&gt;Policy Implications for End-to-End Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#policy-implications-for-end-to-end-encryption&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Both the proposal and the public communications around it have been
fairly vague about the implication for end-to-end encryption, instead
&lt;a href=&quot;https://techcrunch.com/2022/05/11/eu-csam-detection-plan/&quot;&gt;framing&lt;/a&gt;
this as a &amp;quot;technology neutral&amp;quot; set of regulations:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Assuming the Commission proposal gets adopted (and the European
Parliament and Council have to weigh in before that can happen), one
major question for the EU is absolutely what happens if/when services
ordered to carry out detection of CSAM are using end-to-end
encryption — meaning they are not in a position to scan message
content to detect CSAM/potential grooming in progress since they do
not hold keys to decrypt the data.&lt;/p&gt;
&lt;p&gt;Johansson was asked about encryption during today’s presser — and
specifically whether the regulation poses the risk of backdooring
encryption? She sought to close down the concern but the
Commission’s circuitous logic on this topic makes that task perhaps
as difficult as inventing a perfectly effective and privacy safe
CSAM detecting technology.&lt;/p&gt;
&lt;p&gt;“I know there are rumors on my proposal but this is not a proposal
on encryption. This is a proposal on child sexual abuse material,”
she responded. “CSAM is always illegal in the European Union, no
matter the context it is in. [The proposal is] only about detecting
CSAM — it’s not about reading or communication or anything. It’s
just about finding this specific illegal content, report it and to
remove it. And it has to be done with technologies that have been
consulted with data protection authorities. It has to be with the
least privacy intrusive technology.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;However, for the reasons discussed above, designing
a communications system that combines end-to-end encryption
with robust content filtering is basically an open research
question.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
This is not to say that it can never be
solved, but rather that it&#39;s not something we know how to do
today, even at the level of &amp;quot;we have a prototype
that just needs to be tech transferred&amp;quot;. Whatever the intent,
it&#39;s hard to see how a mandate of this form that applies
to all platforms isn&#39;t effectively a prohibition on end-to-end encryption.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Apple didn&#39;t publish NeuralHash but it was
quickly &lt;a href=&quot;https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX&quot;&gt;reverse engineered&lt;/a&gt;
and published and people started demonstrating the kind of attacks I mention above. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that Apple doesn&#39;t presently E2E encrypt data in iCloud,
so presumably they could check that the voucher matches,
but the whole point of this system is to ensure that they
don&#39;t need to scan the image, so we should model the problem
as if the images were encrypted. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance, the use of Tor
Hidden Services to distribute CSAM in the &lt;a href=&quot;https://www.eff.org/pages/playpen-cases-frequently-asked-questions#whathappened&quot;&gt;2014 &amp;quot;Playpen&amp;quot; case&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Oddly, I couldn&#39;t find an author list, so I don&#39;t know which experts. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This requires the server to know such a hash, but these hashes
are fairly widely known, so shouldn&#39;t be an obstacle to a
state-level attacker. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You may recall this technique from its appearance in
my post on &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/&quot;&gt;side channels&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You can purchase &amp;quot;hardware security modules&amp;quot; which aren&#39;t
just part of the processor but rather are a separate computer
in a tamper-resistant casing (the &lt;a href=&quot;https://en.wikipedia.org/wiki/IBM_4758&quot;&gt;IBM 4758&lt;/a&gt;
is an early example). These do better at resisting physical
attack but are a lot less convenient to use, due to limited
processing power and large size. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that the EU report&#39;s recommendations implicitly concede this:
&amp;quot;Immediate: on-device hashing with server side matching (1b). Use
a hashing algorithm other than PhotoDNA to not compromise it. If
partial hashing is confirmed as not reversible, add that for
improved security (1c).&amp;quot; They recommend further research on
the other avenues. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-csam-proposal/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part V: Side Channels</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-side-channels/"/>
		<updated>2022-05-09T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-side-channels/</id>
		<content type="html">&lt;p&gt;This is part V of my series on the Web security model (parts
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising&quot;&gt;outtake&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors&quot;&gt;IV&lt;/a&gt;).
In this post, I cover data leaks via side channels.&lt;/p&gt;
&lt;p&gt;Recall the
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-web-security-guarantee&quot;&gt;discussion&lt;/a&gt;
from part III about the basic guarantee of the Web security model,
which is that it is safe to visit even malicious sites.
As discussed in that post, the browser enforces a set
of rules that are designed to provide that guarantee. It&#39;s of course
possible to have vulnerabilities in the browser which allow
the attacker to bypass those rules; for instance, there might
be a &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/&quot;&gt;memory issue&lt;/a&gt; that allows the attacker
to subvert the browser, at which point it can read the data directly.
However, there is another important class of issue that has long
been a problem in the Web, which is &amp;quot;side channel attacks&amp;quot;.&lt;/p&gt;
&lt;p&gt;Colloquially a &lt;em&gt;side channel&lt;/em&gt; is a mechanism that isn&#39;t part of the specified API
surface but which can be used to leak information.
In a side channel attack, the program can be behaving
correctly but there is some unintended observable behavior that allows
an attacker to learn secret information it should not have.
Historically, side channel attacks in browsers have had two main targets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User browsing history (i.e., what sites the user has visited),
in violation of the browser&#39;s basic privacy guarantees.&lt;/li&gt;
&lt;li&gt;Data from other sites, in violation of the same origin policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we&#39;ll see below, side channel attacks can be very hard to find and eliminate.&lt;/p&gt;
&lt;h2 id=&quot;a-simple-timing-channel&quot;&gt;A Simple Timing Channel &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#a-simple-timing-channel&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The general structure of most side channel attacks is that
there is some secret data that the attacker can&#39;t see directly
but the attacker is able to observe some computation on the secret and use that
information to learn about the secret.
Consider, for example, the following code to check the
correctness of a password.&lt;/p&gt;
&lt;pre class=&quot;language-c&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;bool &lt;span class=&quot;token function&quot;&gt;checkPassword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;userPassword&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;actualPassword&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token class-name&quot;&gt;size_t&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// If the ith character doesn&#39;t match, then&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// return false.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;userPassword&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token 
operator&quot;&gt;!=&lt;/span&gt; actualPassword&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; false&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        &lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// If the ith character is &#39;&#92;0&#39;, then we are at&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// the end of the string and they match, so&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token comment&quot;&gt;// return true.&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;actualPassword&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token char&quot;&gt;&#39;&#92;0&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// userPassword must also be `&#92;0` or we would&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token comment&quot;&gt;// have returned false above.&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; true&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;      &lt;br /&gt;      i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The logic of this code is that it goes through both passwords
one character at a time and if there is a mismatch at any
character, it returns false. It takes advantage of the fact
that C strings don&#39;t have an attached length but instead
use a character with value &lt;code&gt;&#92;0&lt;/code&gt; to indicate the
end of the string.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The explicit
API of this function is that it just tells you whether
a given password is valid or not. If you wanted to use
this API to guess the user&#39;s password, you would in
principle just have to check every password one at
a time, and you don&#39;t learn any information unless you
guess exactly the right password.
If you assume 8 character passwords with only letters and
numbers, then there are 62 possible values for each position
and there are 62&lt;sup&gt;8&lt;/sup&gt; (about 2&lt;sup&gt;48&lt;/sup&gt;) possible passwords.
If you just try them one at a time, you&#39;ll find the right
password about halfway through on average, so that is 2&lt;sup&gt;47&lt;/sup&gt;
attempts, which will take quite some time (though is
also practical on modern computers, which is why people
tell you to use &lt;a href=&quot;https://educatedguesswork.org/posts/passwords1/&quot;&gt;longer passwords&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Unfortunately, this function leaks more information
than just that explicitly provided by the API.
The problem is that this code is not &lt;em&gt;constant time&lt;/em&gt;:
because it checks the characters one at a time, the time
to run the function depends on the number of characters
that match. This means that if an attacker can very precisely
measure the running time of the &lt;code&gt;checkPassword()&lt;/code&gt; function,
they can learn information not provided by the API, namely
the &lt;em&gt;position of the first character which doesn&#39;t match&lt;/em&gt;.&lt;/p&gt;
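&lt;p&gt;For contrast, here is a minimal sketch of a comparison that does not stop at the first mismatch, using Python&#39;s standard &lt;code&gt;hmac.compare_digest&lt;/code&gt;. The wrapper name &lt;code&gt;check_password&lt;/code&gt; is my own, and this is an illustration of the idea rather than a complete password-checking design.&lt;/p&gt;

```python
import hmac


def check_password(user_password: str, actual_password: str) -> bool:
    """Compare two passwords without stopping at the first mismatch."""
    # Encode to bytes so non-ASCII passwords are handled; compare_digest
    # only accepts str arguments when they are pure ASCII.
    return hmac.compare_digest(
        user_password.encode("utf-8"), actual_password.encode("utf-8")
    )


print(check_password("hunter2", "hunter2"))
print(check_password("hunter2", "hunter3"))
```

&lt;p&gt;Note that this only removes the dependence on &lt;em&gt;where&lt;/em&gt; the inputs differ; the Python documentation warns that the running time may still reveal information about the lengths of the inputs.&lt;/p&gt;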
&lt;!-- Statistical removal --&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;lockpicking&quot;&gt;Lockpicking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#lockpicking&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Lockpicking exploits the same basic intuition about combinatorics.
Your typical lock has a set of pins which prevent the lock
barrel from turning. When you insert the key into the
lock, the key pushes the pins up, as shown in the picture below. If the part of the key
aligned with a given pin is the right height, it will push the
pin up the correct amount, so it no longer blocks the lock
barrel. If you get all the pins right, you can turn the
key and the lock opens.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Pin_tumbler_with_key.svg/1200px-Pin_tumbler_with_key.svg.png&quot; alt=&quot;Wikipedia lock picture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Source: Wikipedia]&lt;/p&gt;
&lt;p&gt;This would all be fine if the lock were perfect, because you&#39;d
have to get the key completely right and so
you would have to try every possible
combination in sequence, but in practice there&#39;s always some individual variation:
When you try to turn the lock barrel, one pin will usually
be the one that prevents it from turning (&amp;quot;binding&amp;quot;).
You can exploit this by applying torque to the lock barrel and then
using a tool to push up each pin in sequence. If you push
up the pin that is binding the right amount (so that the
break in the pin aligns with the lock barrel) the lock
barrel will turn slightly until it binds on the next
pin. You can repeat this process until you have all the pins
and the lock opens.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you just think naively about this function, you would
expect its running time to be proportional to the number
of matching characters. For instance, if it takes one
nanosecond to check each character, then if the first
three characters match, the function will take 4ns
(to check the first three and then reject on the fourth).
(Real processors are much more complicated,
as we&#39;ll see later in this post.)
You can use this timing difference to attack passwords very quickly.
The basic idea is that you just generate a random
candidate password and measure the running time of
the function. If, for instance, it is 1ns, then
this tells you that the first character is wrong.&lt;/p&gt;
&lt;p&gt;You can then just iterate through all of the possible values
for character 1 until the function runs in more than
1ns. When that happens, you know you have the first
character right. You then keep the first character
constant and iterate through the second character, and
so on until you have broken the entire password.
On average, you&#39;ll get the right character for each
position about halfway through and so the total
attack time is something like 31 * 8 (~250) attempts,
which is obviously much faster than 2&lt;sup&gt;47&lt;/sup&gt; attempts!&lt;/p&gt;
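&lt;p&gt;As a rough sketch of the attack just described, the following JavaScript simulates it. Everything here is an assumption made for illustration: the 62-character alphabet, the names &lt;code&gt;makeChecker&lt;/code&gt; and &lt;code&gt;recoverPassword&lt;/code&gt;, and the use of a comparison count in place of a real clock. It is not the code of any real password checker.&lt;/p&gt;

```javascript
// Hypothetical alphabet of 62 characters (about 31 guesses per position on average).
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Simulated early-exit comparison. Instead of reading a clock, it reports the
// number of character comparisons performed, standing in for the time in ns.
function makeChecker(secret) {
  let calls = 0;
  function check(guess) {
    calls += 1;
    let steps = 0;
    for (let i = 0; i !== secret.length; i += 1) {
      steps += 1;
      if (guess[i] !== secret[i]) {
        return { ok: false, steps: steps };
      }
    }
    return { ok: guess.length === secret.length, steps: steps };
  }
  return { check: check, callCount: function () { return calls; } };
}

// Recover the secret one position at a time. Assumes the secret has the given
// length and only uses characters from ALPHABET ("?" is used as filler).
function recoverPassword(checker, length) {
  const known = [];
  for (let pos = 0; pos !== length; pos += 1) {
    for (const c of ALPHABET) {
      const guess = known.join("") + c + "?".repeat(length - pos - 1);
      const result = checker.check(guess);
      // A wrong character at `pos` is rejected after exactly pos + 1
      // comparisons; anything else means this position matched.
      if (result.ok || result.steps !== pos + 1) {
        known.push(c);
        break;
      }
    }
  }
  return known.join("");
}
```

&lt;p&gt;For an 8-character password this needs at most 62 * 8 = 496 guesses, and about half that on average, in line with the estimate above.&lt;/p&gt;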
&lt;p&gt;Of course, the signal here is very small: modern processors
are very fast and there are other things happening on the
computer besides just your task, so small timing differences
can be hard to measure. However, there is now a more or less
standard set of techniques for making this kind of attack
work better. First, you can run the measurement a lot of times,
which helps separate out the signal from the noise. Second,
you can find ways to &lt;em&gt;amplify&lt;/em&gt; the signal so that the slower
operation gets a lot slower. We&#39;ll see an example of this
below.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
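&lt;p&gt;To illustrate the first technique, here is a minimal simulation of averaging many noisy measurements. The noise model (up to 50ns of uniform noise on top of a 1ns or 2ns running time), the seeded generator, and all of the names are assumptions chosen for the example, not measurements of a real system.&lt;/p&gt;

```javascript
// Park-Miller pseudo-random generator, seeded so that the run is repeatable.
function makeRandom(seed) {
  let state = seed;
  return function () {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

// Hypothetical measurement: the true running time plus up to 50ns of noise.
function measure(trueNs, random) {
  return trueNs + 50 * random();
}

// Average many measurements of the same operation.
function meanTime(trueNs, samples, random) {
  let total = 0;
  for (let i = 0; i !== samples; i += 1) {
    total += measure(trueNs, random);
  }
  return total / samples;
}

const rng = makeRandom(12345);
const fastMean = meanTime(1, 200000, rng); // rejected on the first character
const slowMean = meanTime(2, 200000, rng); // rejected on the second character
const estimatedGap = slowMean - fastMean;  // close to the true 1ns difference
```

&lt;p&gt;A single measurement is dominated by the noise, but with enough samples the difference between the two averages settles near the true 1ns gap.&lt;/p&gt;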
&lt;h2 id=&quot;cross-site-state&quot;&gt;Cross-Site State &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#cross-site-state&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first class of side channel attacks I want to talk about
takes advantage of browser features which share state across
sites. Consider the situation where site A wants to know whether the user has
visited site B. For obvious reasons, this is sensitive information: we
don&#39;t want arbitrary attackers to be able to see your browsing
history. Thus, the Web platform doesn&#39;t allow sites to
ask directly about browsing history, but
that doesn&#39;t mean they can&#39;t get the answer indirectly.&lt;/p&gt;
&lt;p&gt;The simplest mechanism is via the browser &lt;em&gt;cache&lt;/em&gt;. As I discussed
&lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/&quot;&gt;last week&lt;/a&gt;, performance is
a very high priority for Web sites, and downloading big files from
a remote site takes time and bandwidth. One way to address this problem
is for the browser to cache data from the server. When the browser
first downloads a resource, it stores it locally and can later reuse
the local copy rather than retrieving it from the server again.&lt;/p&gt;
&lt;p&gt;The actual details of HTTP caching are quite complicated because
sometimes the cached value will be usable, but sometimes the server
will change the resource and the client has to re-retrieve it.  Under
some conditions the client has to contact the server and ask if the
resource has changed, e.g., via the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since&quot;&gt;&lt;code&gt;If-Modified-Since&lt;/code&gt;&lt;/a&gt;
header, and in others the server can just say &lt;a href=&quot;https://hacks.mozilla.org/2017/01/using-immutable-caching-to-speed-up-the-web/&quot;&gt;this resource will
never
change&lt;/a&gt;.
The common thread, however, is that getting resources from the local cache
is (&lt;a href=&quot;https://simonhearne.com/2020/network-faster-than-cache/&quot;&gt;hopefully&lt;/a&gt;)
faster than retrieving files from the server. That&#39;s the point of
caching but when combined with the fact that it&#39;s possible to measure
the load time for a cross-site resource, it gives us a timing leak.&lt;/p&gt;
&lt;p&gt;The basic idea here is really simple: suppose that &lt;code&gt;attacker.com&lt;/code&gt;
wants to know if you have gone to &lt;code&gt;example.com&lt;/code&gt;. It adds a large
resource from &lt;code&gt;example.com&lt;/code&gt; to its own site and measures how long
it takes to load. If the load is fast, then it is likely that the data
is in cache, which suggests that the user has been to
&lt;code&gt;example.com&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This attack has &lt;a href=&quot;https://collaborate.princeton.edu/en/publications/timing-attacks-on-web-privacy&quot;&gt;been known at least since a 2000 paper by Felten and Schneider&lt;/a&gt;, and turns out to be part of a giant class of such
issues, with browser state targets including: HTTP connections, DNS caching, TLS session
IDs, HSTS state, etc. The general problem is that any time there
is state that is shared between site A and B, activity on
A can potentially affect behavior on B.
The right
&lt;a href=&quot;https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.215.6662&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;solution&lt;/a&gt;
was published by Jackson, Bortz, Boneh, and Mitchell in 2006:
partition client-side state by the top-level origin as well
as by the origin. For instance, in this case resources loaded
from &lt;code&gt;example.com&lt;/code&gt; would be in a different cache from
those loaded from &lt;code&gt;attacker.com&lt;/code&gt;, which means that when &lt;code&gt;attacker.com&lt;/code&gt;
goes to load the test resource it will have to retrieve it
separately, even if &lt;code&gt;example.com&lt;/code&gt; has already done so.&lt;/p&gt;
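&lt;p&gt;The partitioning idea can be sketched as a toy cache whose key includes the top-level site. The &lt;code&gt;PartitionedCache&lt;/code&gt; class, its methods, and the URLs below are hypothetical and only illustrate the keying scheme; real browser caches are far more involved.&lt;/p&gt;

```javascript
// Toy cache keyed by (top-level site, resource URL) rather than by URL alone.
class PartitionedCache {
  constructor() {
    this.entries = new Map();
  }

  // Build a single Map key from both parts of the partitioned key.
  key(topLevelSite, url) {
    return JSON.stringify([topLevelSite, url]);
  }

  // Returns "hit" or "miss"; a miss stands in for a slow network fetch,
  // after which the resource is stored in that site's partition.
  load(topLevelSite, url) {
    const k = this.key(topLevelSite, url);
    if (this.entries.has(k)) {
      return "hit";
    }
    this.entries.set(k, "body of " + url);
    return "miss";
  }
}

const cache = new PartitionedCache();
// The user visits example.com, which loads its own large resource.
const firstVisit = cache.load("https://example.com", "https://example.com/big.js");
// Later, attacker.com embeds the same resource and times the load.
const probe = cache.load("https://attacker.com", "https://example.com/big.js");
```

&lt;p&gt;Here &lt;code&gt;probe&lt;/code&gt; is a miss even though the user has already been to &lt;code&gt;example.com&lt;/code&gt;, so the load time no longer reveals the earlier visit. The cost is that the resource is fetched and stored once per top-level site.&lt;/p&gt;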
&lt;p&gt;At this point you might ask why this hasn&#39;t been fixed. As far
as I can tell, there are three main reasons: (1) the widespread
use of cookie-based tracking made fixing these slower attacks
less interesting; (2) it&#39;s actually fairly complicated to address
everything, in part because some of the required changes do change
the observable behavior of Web browsers; and (3) there were concerns
about the performance impact of reducing the effectiveness of
caching. However, as Web privacy has become a bigger issue, browsers
have started making a serious effort to address this class of
attack, mostly via &lt;a href=&quot;https://privacycg.github.io/storage-partitioning/&quot;&gt;work&lt;/a&gt;
in the W3C Privacy Community Group. This &lt;a href=&quot;https://docs.google.com/presentation/d/1i7KvTtIS2JhAadQsdWLFpMzNmgXmUbXSfPuO_wYX6d8/edit#slide=id.g1135ef95135_0_110&quot;&gt;presentation&lt;/a&gt; by Anne van Kesteren
does a good job of describing the situation.&lt;/p&gt;
&lt;h2 id=&quot;computation-on-secret-data&quot;&gt;Computation on Secret Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#computation-on-secret-data&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As discussed in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;part III&lt;/a&gt;,
the same origin policy allows cross-origin use of data (for instance, embedding an image from
another site) but forbids access to the data. In addition, it allows you to
operate on that data in a variety of ways that are intended to be safe because
they don&#39;t allow you to see the result. It should surprise nobody to
learn that these aren&#39;t actually safe.&lt;/p&gt;
&lt;h3 id=&quot;link-decoration&quot;&gt;Link Decoration &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#link-decoration&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s warm up with a simple example: Link coloring.
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/CSS&quot;&gt;CSS&lt;/a&gt; allows
Web pages to apply &lt;em&gt;styles&lt;/em&gt; (e.g., colors, underlining, etc.) to links
depending on whether they have been visited or not. This helps
the user know whether they need to click on a link or not.
For instance, this fragment of CSS will turn all links
red except those you have visited, which are blue.&lt;/p&gt;
&lt;pre class=&quot;language-css&quot;&gt;&lt;code class=&quot;language-css&quot;&gt;&lt;span class=&quot;token selector&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; red&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token selector&quot;&gt;a:visited&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; blue&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;br /&gt;
&lt;h4 id=&quot;the-basic-attack&quot;&gt;The Basic Attack &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#the-basic-attack&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This would all be fine except that it turns out that the Web &lt;em&gt;also&lt;/em&gt;
lets you inspect the color of elements in the DOM using
the &lt;code&gt;getComputedStyle()&lt;/code&gt; function. In 2002,
Andrew Clover &lt;a href=&quot;https://seclists.org/bugtraq/2002/Feb/271&quot;&gt;observed&lt;/a&gt;
that this combination of
features creates a trivial
attack in which the attacker puts a bunch of links on their
page to sites they think you might have visited and then inspects
the color to see which ones you actually have visited. Obviously, the
attacker only gets to learn about pages it actually knows about,
so in some ways this isn&#39;t as good as cookie-based tracking,
but the attacker can send you a very big page with a lot of links,
so it can extract quite a bit of information.
Moreover, unlike cookie-based tracking, this attack can be used
to learn whether you have visited sites which aren&#39;t cooperating
with the attacker, such as their competitors!&lt;/p&gt;
&lt;p&gt;This isn&#39;t just a theoretical issue. In 2010, &lt;a href=&quot;https://hovav.net/ucsd/papers/jjls10.html&quot;&gt;Jang,
Jhala, Lerner, and Shacham&lt;/a&gt;
scanned the Alexa top 50,000 sites and discovered a number
of sites doing history sniffing, including two companies
that provided it as a service. For example, they found that
the popular adult site Youporn used history sniffing to discover
whether people were visiting their competitor Pornhub and
third party ads on a number of sites checked to see if users had gone
to various car-related sites.&lt;/p&gt;
&lt;p&gt;The basic URL color attack is now fixed, though only as of
about 2010. The fixes turn out to be fairly complicated, as
described by David Baron in this &lt;a href=&quot;https://dbaron.org/mozilla/visited-privacy&quot;&gt;post&lt;/a&gt;
describing the fixes deployed in Firefox. The basic defense
is to have the browser lie about various CSS selectors
that let you query whether links were visited, by acting
as if they were unvisited. However, this isn&#39;t enough because
there are other CSS mechanisms that would let you (for instance)
perturb the layout of the page and thus observe whether it reflowed.
The complete fix requires also limiting the style
changes that CSS can apply based on whether a link is visited
to those which (hopefully) do not leak information.
Other browsers have followed suit, in part due to &lt;a href=&quot;https://petsymposium.org/2012/papers/hotpets12-9-ftc.pdf&quot;&gt;pressure
from the US Federal Trade Commission&lt;/a&gt; after Jang et al. published their work.&lt;/p&gt;
&lt;h4 id=&quot;side-channels&quot;&gt;Side Channels &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#side-channels&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Arguably this isn&#39;t even a side channel attack,
because we&#39;re using an official API: the problem is just
an unexpected result of combining two APIs. So, when
we remove those APIs the problem will be solved, right?
Of course not. Even without these APIs, the same data
turns out to be accessible via a number of side channels.
Many of these side channels work by observing that if
you change the appearance of a link, this can cause
the page to be repainted, which can be detected by the
attacker&#39;s script.
This means that if you have a link which is unvisited
and then change the URL to be one that is visited, it
causes a repaint, allowing you to differentiate visited
from unvisited links.
Initially, the repaint was &lt;em&gt;directly&lt;/em&gt; measurable in Firefox
with the &lt;code&gt;mozAfterPaint&lt;/code&gt; event, but it was
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=600025&quot;&gt;later&lt;/a&gt;
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=608030&quot;&gt;removed&lt;/a&gt;
to avoid exactly this kind of leak.&lt;/p&gt;
&lt;p&gt;However, even without an explicit signal, it&#39;s
still possible to detect repaints, as described
by &lt;a href=&quot;https://doczz.net/doc/8769089/pixel-perfect-timing-attacks-with-html5&quot;&gt;Paul Stone&lt;/a&gt;.
Normally repainting is fast, but if you can make
the repaint slower, then you can measure it. The trick
is to apply some CSS effects to the link (e.g., drop shadows)
that take time to compute. These effects aren&#39;t conditional
on whether the link is visited, so they are allowed, but
are slow to compute, thus allowing the attacker to measure
the time taken to repaint. You can make things even slower by
including multiple copies of the same link, thus
making the attack work better even with fast browsers.&lt;/p&gt;
&lt;p&gt;The hits just keep coming. In 2018, Smith et al. &lt;a href=&quot;https://cseweb.ucsd.edu/~dstefan/pubs/smith:2018:browser.pdf&quot;&gt;published&lt;/a&gt;
three new side channel attacks on browser history via
link styling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Via the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/CSS_Painting_API&quot;&gt;CSS Paint API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Via &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/CSS/transform&quot;&gt;CSS transforms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Via &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG&quot;&gt;SVG&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, the CSS Paint API allows you to register a JavaScript
&amp;quot;paintlet&amp;quot; which can draw the background image for a given element,
like a link. If you change the foreground element in certain
ways—including changing the color—then this requires the
paintlet to be re-run. The paintlet runs in a little sandbox
that can&#39;t talk to the outside world, so you shouldn&#39;t be able to directly
tell if it ran, but it turns out that you can measure how long
it takes to run, using code like the following (adapted from Smith
et al.)&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; target &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getElementById&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;target&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; start &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; performance&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;         &lt;span class=&quot;token comment&quot;&gt;// Get the current time&lt;/span&gt;&lt;br /&gt;target&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;href &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://example.com/&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; delta &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; performance&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; start&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; 
&lt;span class=&quot;token comment&quot;&gt;// Get the time after the change&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;delta &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; threshold&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token function&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Victim visited https://example.com/&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What makes this work is that when you change the DOM using JavaScript,
those changes happen &lt;em&gt;synchronously&lt;/em&gt;: the line of code changing the
&lt;code&gt;href&lt;/code&gt; field &lt;em&gt;blocks&lt;/em&gt; until the DOM has changed, and the next
line only executes after the change has happened. In this case,
if the link has been visited, then the repaint has to happen which
takes more time, and so you can measure it using this code.&lt;/p&gt;
&lt;p&gt;Because browsers are quite fast, it would ordinarily
be fairly difficult to measure the time difference, but Smith et al.
observe that it&#39;s possible to deliberately make the paintlet slow
by adding a loop in the paintlet code that takes extra time, which
makes the difference easier to measure.
This is a fairly simple technique for amplifying the size of a timing
signal, and in some cases you need something fancier. For instance,
later in this paper, Smith et al. describe a technique (due initially
to Stone) in which they
rapidly change a link back and forth (as above), thus forcing
the browser to do a lot of computation, and measure the frame
rate of the browser&#39;s renderer. Ordinarily the browser would
render about 60 frames per second, but if you give it too
much work to do, it will fall behind and this is detectable
from JS.&lt;/p&gt;
&lt;p&gt;Of course browsers fixed these issues (and the
CSS paint issue only happened in Chrome because other
browsers hadn&#39;t implemented CSS Paint, and Chrome eventually
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/PaintWorklet&quot;&gt;disabled&lt;/a&gt;
CSS Paint for links). However, we still see new attacks
on link history, such as &lt;a href=&quot;https://www.mozilla.org/en-US/security/advisories/mfsa2022-16/#CVE-2022-29916&quot;&gt;CVE-2022-29916&lt;/a&gt;,
fixed in the recently released Firefox 100, just as I was working on this
post.&lt;/p&gt;
&lt;h3 id=&quot;pixel-stealing&quot;&gt;Pixel Stealing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#pixel-stealing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Another example of the risks of allowing sites to compute on data from
other origins is what&#39;s known as &amp;quot;pixel stealing&amp;quot; attacks. Recall that
it&#39;s possible for site A to embed content from site B (e.g., in an IFRAME
or an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag), but it&#39;s not allowed to inspect the content.
However, site A &lt;em&gt;is&lt;/em&gt; allowed to apply &lt;em&gt;filters&lt;/em&gt; to that content
to change its appearance; they just can&#39;t see the output of the
filters. If this sounds like bad news, you&#39;re developing the right
intuition.&lt;/p&gt;
&lt;p&gt;A good example of what can go wrong here is provided by
Paul Stone in the same &lt;a href=&quot;https://doczz.net/doc/8769089/pixel-perfect-timing-attacks-with-html5&quot;&gt;white paper&lt;/a&gt;
where he disclosed timing-based measurements.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
The basic idea is that you
design an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Element/filter&quot;&gt;SVG Filter&lt;/a&gt;
which runs at different speeds on black and white pixels (based on the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/SVG/Element/feMorphology&quot;&gt;feMorphology&lt;/a&gt;
primitive).
You load the target content in an IFRAME&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
and then apply the filter to one pixel at a time, measuring the time
it takes to run (as before, we can use a bunch of techniques
like running the filter a lot of times and magnifying the
image so that each pixel is actually a lot of pixels in
order to make the time difference bigger). This lets you extract the
contents of the image one pixel at a time. Obviously, this isn&#39;t
super efficient, but as Stone observes, if you want to read text
out of a page, then you don&#39;t need that many bits because you
only need to read some of the pixels to distinguish characters.&lt;/p&gt;
&lt;p&gt;Browsers responded to these bug reports by rewriting the primitives
in question so that they were closer to constant time—or by moving them
to the graphics processor, where it was hoped they would be more constant
time (though &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=711043#c52&quot;&gt;see here&lt;/a&gt;)—but it shouldn&#39;t
surprise you that these are not the only cases where attackers can
compute on cross-origin content with data-dependent results. A great
example of this is a 2015 &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1068.1276&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;paper&lt;/a&gt;
by Andrysco, Kohlbrenner, Mowery, Jhala, Lerner, and Shacham
describing how to resurrect the SVG filter technique using a new
timing channel based on floating point numbers.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
If nothing else, this serves as evidence of how difficult it is to remove this
kind of timing channel.&lt;/p&gt;
&lt;h2 id=&quot;input&quot;&gt;Input &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#input&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The final class of attacks I want to discuss targets user input. The basic
observation here is that when people are typing into the browser or
moving the mouse, this takes time to process, which temporarily
stalls the processor. If you set up a loop in which you ask the
browser to increment a counter very frequently, and measure the
actual rate at which the counter increments, you find that it
increments slightly more slowly during periods where the user
has typed a keystroke, as shown in the following image from
a 2017 paper by &lt;a href=&quot;https://attacking.systems/web/files/keystroke_js.pdf&quot;&gt;Lipp et al.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/keystroke-timing.png&quot; alt=&quot;Keystroke Timing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Just knowing when someone is typing doesn&#39;t seem that useful, but
it turns out that by measuring the time &lt;em&gt;between&lt;/em&gt; keystrokes, it
is possible to learn a fair amount of information about what people
are typing. The basic intuition is that people don&#39;t type at
a constant rate and that some key combinations take
longer than others (consider the case where there are two keys typed
with the same finger). This kind of problem has received a fair amount of
study: in their original paper, Lipp et al. show how to determine
with some confidence which URL people are typing; in 2001,
&lt;a href=&quot;https://www.usenix.org/publications/library/proceedings/sec01/song.html&quot;&gt;Song et al.&lt;/a&gt;
showed that it was possible to narrow down the range of user
passwords in SSH from network traces; and there have been
several papers about using accelerometers to measure typing
on &lt;a href=&quot;https://www.usenix.org/legacy/events/hotsec11/tech/final_files/Cai.pdf&quot;&gt;mobile phones&lt;/a&gt;
or on adjacent keyboards using a mobile phone.&lt;/p&gt;
&lt;p&gt;Because there is a lot of redundancy in
the characters people type (for instance, in English, the
&amp;quot;q&amp;quot; is generally followed by &amp;quot;u&amp;quot; and not by &amp;quot;x&amp;quot;),
some character combinations are more likely than others.
This makes it possible to train a machine learning model that
estimates which characters are being typed based on the
available timing. The results aren&#39;t amazing, with accuracy
rates in the 70-80% range, but they&#39;re a lot better than
chance, and as &lt;a href=&quot;https://www.schneier.com/&quot;&gt;Bruce Schneier&lt;/a&gt;
observes, attacks only get better.&lt;/p&gt;
&lt;p&gt;One very interesting thing about this class of attacks is that
they aren&#39;t the result of deliberate browser decisions to mix
data across origins. Rather, they&#39;re the natural result of
some quite reasonable implementation decisions about how
to share computing resources between sites. This is bad news
because fixing them requires a lot of rethinking of the
design of the browser.&lt;/p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Side channel attacks in browsers are a big topic, but a few
common themes recur throughout the discussion.&lt;/p&gt;
&lt;h3 id=&quot;state-needs-to-be-partitioned&quot;&gt;State needs to be partitioned &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#state-needs-to-be-partitioned&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The main source of the various history sniffing attacks is that
there is some piece of state (e.g., cached data, history) that is
shared between site A and site B. As soon as you
are in this situation, you&#39;re going to have side channels
and individually removing them is likely to be very
expensive. It&#39;s now been recognized that the basic fix is to
&lt;a href=&quot;https://privacycg.github.io/storage-partitioning/&quot;&gt;partition state&lt;/a&gt;
by the top-level site. Unfortunately,
there are a number of cases where this breaks functionality
that people are used to, which is part of why it&#39;s taken
so long to do. Moreover, as is the case with keystroke timing,
there turn out to be resources which are unintentionally
shared and hard to partition.&lt;/p&gt;
&lt;h3 id=&quot;safe-computation-on-secret-data-is-hard&quot;&gt;Safe computation on secret data is hard &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#safe-computation-on-secret-data-is-hard&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I mentioned early on in this series, one of the key properties
of the Web is the ability to make mash-ups of content from
your site and from other sites, while still having them
isolated by the same origin policy. However, the modern Web
includes a lot of features that allow you not only to
&lt;em&gt;incorporate&lt;/em&gt; content from other origins but to &lt;em&gt;compute&lt;/em&gt; on
it. This is a very powerful mechanism but is also incredibly
hard to do safely because that computation has to be done in
a way whose observable behavior, such as running time, is identical
no matter what the data being computed on is. The lesson of the subnormal
floating point case is that this is extremely tricky to do and
depends on having very detailed knowledge of the processor
and the operating system, all of which might change in
some future version.&lt;/p&gt;
&lt;h3 id=&quot;high-resolution-timing-is-dangerous&quot;&gt;High resolution timing is dangerous &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#high-resolution-timing-is-dangerous&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A major building block of all of these attacks is the ability
to precisely measure the duration of events. The more precisely
you can measure events the smaller signals you can detect
and thus the more careful the implementation has to be to
suppress every difference between different code paths.
You can often improve attacks by amplifying one of the
code paths so that the timing difference is bigger and so
less precise timing works, but the consequence is that
attacks get slower and so it takes the attacker longer
to extract a given amount of information.&lt;/p&gt;
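&lt;p&gt;The tradeoff between timer precision and amplification can be made concrete with a little arithmetic. In this sketch the clock resolution (1000ns), the two code path durations (40ns and 45ns), and the function names are all made-up values chosen for illustration.&lt;/p&gt;

```javascript
// Hypothetical clock that only reports time in whole multiples of `resolutionNs`.
function coarseClock(trueNs, resolutionNs) {
  return Math.floor(trueNs / resolutionNs) * resolutionNs;
}

// Duration of `repeats` back-to-back runs of an operation taking `opNs` each,
// as seen through the coarse clock (the runs start at time 0).
function measuredDuration(opNs, repeats, resolutionNs) {
  return coarseClock(opNs * repeats, resolutionNs) - coarseClock(0, resolutionNs);
}

// One run of each path: both fit inside a single clock tick.
const singleFast = measuredDuration(40, 1, 1000);
const singleSlow = measuredDuration(45, 1, 1000);

// One thousand runs of each path: the 5ns gap has grown to 5000ns.
const amplifiedFast = measuredDuration(40, 1000, 1000);
const amplifiedSlow = measuredDuration(45, 1000, 1000);
```

&lt;p&gt;With a single run the two paths are indistinguishable, because both measurements round down to zero. After 1000 repetitions the measured times differ by several clock ticks, but each measurement now takes roughly 1000 times as long, which is exactly the slowdown described above.&lt;/p&gt;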
&lt;p&gt;There have been a number of attempts to provide systematic solutions
to the timing side channel problem, such as
&lt;a href=&quot;https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_kohlbrenner.pdf&quot;&gt;Fuzzyfox&lt;/a&gt;
by Kohlbrenner and Shacham. Techniques like this have the potential
to really improve resistance to side channel attacks, but at
a real performance cost and as far as I know no browser has
yet been willing to deploy them in production.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-systematic-solutions-and-microarchitectural-attacks&quot;&gt;Next Up: systematic solutions and microarchitectural attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#next-up%3A-systematic-solutions-and-microarchitectural-attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The history of side channel attacks in browsers—like many other
security stories—is one of repeated cycles
of attacks followed by ad hoc fixes for those specific attacks,
followed by new techniques that resurrect those attacks, which
themselves need to be fixed. The fundamental problem is that
the browser is simply too complicated a system
to analyze with any confidence. The best known techniques for
preventing this kind of attack depend on simplifying the problem
so that security depends on a relatively small number of assumptions
that are easier to verify and enforce. This is where techniques
like partitioning come in.&lt;/p&gt;
&lt;p&gt;This point was driven home in 2018 when it was discovered that
a number of assumptions about the behavior of common processors
were wrong, leading to a series of side channel attacks based
on exploiting common processor optimizations. Defending
against these attacks
has forced browsers to make fundamental architectural changes.
Those attacks and the changes they required will be the topic of the next post
in this series.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Ordinarily, this feature, called &amp;quot;null termination&amp;quot;,
is considered a misfeature in C but in this case it&#39;s a bit convenient. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
A variant of this particular password checking bug was
responsible for one of the very earliest side
channel attacks, on the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=TENEX_%28operating_system%29&amp;amp;id=1079239940&amp;amp;wpFormIdentifier=titleform&quot;&gt;TENEX&lt;/a&gt;
system. The attack, &lt;a href=&quot;https://www.sjoerdlangkemper.nl/2016/11/01/tenex-password-bug/&quot;&gt;described in detail&lt;/a&gt;
by Sjoerd Langkemper, took advantage of the fact that
TENEX had virtual memory, in which the operating
system could &lt;em&gt;page out&lt;/em&gt; some data from memory to
the disk and then bring it back in when needed.
The attacker can exploit this bug by arranging the password
so it crosses a page boundary with the second
page having been paged out. The attacker can then
learn the first mismatching character by
observing whether the password check function
tried to touch a page which had been paged out and
needed to be paged back in (a &amp;quot;page fault&amp;quot;). &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that this measurement itself loads the
data into cache, so repeated measurements will be fast, but the
attacker can set a cookie to detect this case or try loading multiple
resources. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Similar
attacks appear to have been discovered contemporaneously
by Kotcher, Pei, Jumde, and Jackson,
but their &lt;a href=&quot;https://dl.acm.org/doi/abs/10.1145/2508859.2516712&quot;&gt;paper&lt;/a&gt;
is behind a paywall, so this discussion focuses
on Stone&#39;s work. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You can
also use this for link-based history sniffing, btw. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It turns out that some processors have multiple representations
for floating point numbers and that computations with
one such representation (&amp;quot;subnormal&amp;quot; or &amp;quot;denormal&amp;quot;)  are
slower than those with the regular representation.
The attack involves applying a filter that translates
black pixels into zero (which is normal) and non-black
pixels into a subnormal value. If you then
compute with the results, the non-black pixels are slower,
which gives you the signal you need. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-side-channels/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Challenges in Building a Decentralized Web</title>
		<link href="https://educatedguesswork.org/posts/challenges-web-decentralization/"/>
		<updated>2022-04-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/challenges-web-decentralization/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
};
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;There&#39;s been a lot of interest lately in what&#39;s often termed the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Decentralized_web&amp;amp;oldid=1083941536&quot;&gt;Decentralized Web&lt;/a&gt; (dWeb),
though now it&#39;s quite common to hear the term &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Web3&amp;amp;oldid=1083462159&quot;&gt;Web3&lt;/a&gt;
used as well. Mapping out the precise distinctions between these terms—assuming that&#39;s
possible—is outside the scope of this post (though it seems that Web3 somehow
involves blockchains), but the common thread here seems to be replacing the existing
rather centralized Web ecosystem with one that is, well, less centralized.
This post looks at the challenges of actually building a system like this.&lt;/p&gt;
&lt;p&gt;The infrastructure of the Web is centralized in at least two major ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;There are relatively few major user-facing content distribution platforms
(Google, YouTube, Facebook, Twitter, TikTok, etc.) and they clearly have
outsized power over people&#39;s ability to get their message amplified.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even if you&#39;re willing to forgo posting on one of those content platforms,
the easiest way to build any large-scale system—and almost the only
economical way unless you are very well-funded—is to run it on
one of a relatively small number of infrastructure providers, such
as &lt;a href=&quot;https://aws.amazon.com/&quot;&gt;Amazon Web Services&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/gcp/&quot;&gt;Google Cloud Platform&lt;/a&gt;,
&lt;a href=&quot;https://www.cloudflare.com/&quot;&gt;Cloudflare&lt;/a&gt;, &lt;a href=&quot;https://www.fastly.com/&quot;&gt;Fastly&lt;/a&gt;, etc.,
who already have highly scalable geographically distributed systems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this context, decentralizing can mean anything from building
analogs to those specific content platforms that operate in a less centralized
fashion (e.g., &lt;a href=&quot;https://joinmastodon.org/&quot;&gt;Mastodon&lt;/a&gt; or
&lt;a href=&quot;https://diaspora.social/&quot;&gt;Diaspora&lt;/a&gt;) to rebuilding the entire
structure of the Web on a peer-to-peer platform like
&lt;a href=&quot;https://ipfs.io/&quot;&gt;IPFS&lt;/a&gt; or &lt;a href=&quot;https://beakerbrowser.com/&quot;&gt;Beaker&lt;/a&gt;.
Naturally, in the second case, you would also want to make it possible
to reproduce these content platforms—only better!—using
a mostly or fully peer-to-peer system; at least it shouldn&#39;t be
required to have a bunch of big servers somewhere to make it all work.
This second, more ambitious, project is the topic of this post.&lt;/p&gt;
&lt;h2 id=&quot;distributed-versus-decentralized&quot;&gt;Distributed Versus Decentralized &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#distributed-versus-decentralized&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;An important distinction to draw here is between systems which are &lt;em&gt;distributed&lt;/em&gt;
(also often called &lt;em&gt;federated&lt;/em&gt;) and those which are &lt;em&gt;decentralized&lt;/em&gt;
(often called &lt;em&gt;peer-to-peer&lt;/em&gt;). As an example, the Web is a distributed
system: it consists of lots of different sites operated by different
entities, but those sites run on servers and operating a site requires
running a server yourself or outsourcing that to someone else. Those servers have
to be prepared to handle the load for all your users, which means they
have to be somewhere with a lot of bandwidth, scale gracefully as more
users try to connect, etc.&lt;/p&gt;
&lt;p&gt;By contrast,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=BitTorrent&amp;amp;oldid=1083471967&quot;&gt;BitTorrent&lt;/a&gt;
is a decentralized system: it uses the resources of BitTorrent users
themselves to serve data, which means that you don&#39;t need a giant
server to publish data into the BitTorrent network, even if a lot of
other people want to download it. This has some obvious operational
advantages even in a world where bandwidth is cheap, but especially if
you want to publish something which others would prefer wasn&#39;t
published, perhaps because of government censorship or more frequently
for copyright reasons. If you run a server, it&#39;s pretty hard to
conceal that a million people just connected to download &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=John_Wick:_Chapter_3_%E2%80%93_Parabellum&amp;amp;oldid=1081953064&quot;&gt;John Wick:
Chapter 3 -
Parabellum&lt;/a&gt;
(a pretty solid outing by Keanu, btw), and you should expect the
copyright police to come after you (see here, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Kim_Dotcom&amp;amp;oldid=1083570626&quot;&gt;Kim Dotcom&lt;/a&gt;)
but if you just publish your
copy into the BitTorrent network, it&#39;s a lot harder to figure out who
it was, especially if 50 other people did the same.&lt;/p&gt;
&lt;p&gt;Note that it&#39;s possible to have mixed systems that are largely decentralized
but depend on centralized components. For instance, in a peer-to-peer system,
new peers often need to connect to some &amp;quot;introduction server&amp;quot; to help them
join the network; those servers need to be easy to find, and one way (though
not the only way) to
do that is to have them be operated centrally.&lt;/p&gt;
&lt;p&gt;Historically, peer-to-peer systems have seen deployment in relatively
limited domains, mostly those associated with
the aforementioned censorship-resistance use case.
However, there has certainly been plenty of interest in broader use
cases, up to and including displacing large pieces of the Web.
This is a very difficult problem, in part because this kind of
system is inherently less efficient and flexible than a centralized or federated
system. This post looks at the challenges involved in building such a
system. This isn&#39;t to say it&#39;s not also challenging to
build something like Twitter or Facebook in a more federated fashion,
but the problems are of a different scale (and perhaps the subject of
a different post).&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;peer-to-peer-versus-client%2Fserver&quot;&gt;Peer-to-Peer versus Client/Server &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#peer-to-peer-versus-client%2Fserver&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The opposite of peer-to-peer is &lt;em&gt;client/server&lt;/em&gt;, i.e., a system in
which the elements take on asymmetrical roles, with one element (often that belonging to the
user&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;)
being the &amp;quot;client&amp;quot; and the other element (often some
kind of shared resource associated with an organization) being the &amp;quot;server&amp;quot;.
This is, for instance, how the Web works, with the client being the
browser. By contrast, peer-to-peer systems are thought of as
symmetrical.&lt;/p&gt;
&lt;p&gt;In practice, however, the lines can be quite blurry. For instance,
it&#39;s common to have systems in which the same protocols are used to talk
between clients and servers and also between servers, with the second
mode more like a typical &amp;quot;peer-to-peer&amp;quot; configuration. For instance,
mail clients use SMTP to send e-mail but mail servers also use SMTP
to send e-mail to each other, with the sender taking on the &amp;quot;client&amp;quot;
role; obviously in this case, each &amp;quot;server&amp;quot; is both client and server,
depending on which direction the mail is flowing. Even in systems
which are nominally peer-to-peer, it&#39;s common to use protocols which
were designed for client/server applications (e.g., &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8446.html&quot;&gt;TLS&lt;/a&gt;),
in which case the nodes may take on client/server roles for those protocol
purposes even if the application above is symmetrical.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;basics-of-peer-to-peer-systems&quot;&gt;Basics of Peer-to-Peer Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#basics-of-peer-to-peer-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We all (hopefully) know how a client/server publishing system like the
Web works (if not, review my &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/&quot;&gt;intro
post&lt;/a&gt;), but how does a peer-to-peer
(henceforth P2P) publishing system work?  Let&#39;s start by discussing
the simplest case, which is just publishing opaque binary resources
(documents, movies, whatever). This section tries to describe
just enough basics of such a system to have the rest of this post make sense.&lt;/p&gt;
&lt;p&gt;In a client/server system, the resource to be published is stored
on the server, but in a P2P system, there are no servers, so the
resource is stored &amp;quot;in the network&amp;quot;. What this means operationally
is that it&#39;s stored on the computers of some subset of the users
who happen to be online at the moment. In order to make this work,
then, we need a set of rules (i.e., a protocol) that describes
which endpoints store a specific piece of content and how to find
them when you want to retrieve it. A common design here is what&#39;s
called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Distributed_hash_table&amp;amp;oldid=1076477001&quot;&gt;Distributed Hash Table&lt;/a&gt;, which is basically an abstraction in which every resource
has a &amp;quot;key&amp;quot; (i.e., an address) which is used to reference it and a &amp;quot;value&amp;quot; which is
its actual content. The key determines which node(s) are responsible
for storing the value and is used by other nodes to store and/or
retrieve it.&lt;/p&gt;
&lt;p&gt;As an intuition pump, consider the following toy DHT system. This is
an oversimplified version of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Chord_(peer-to-peer)&amp;amp;oldid=1082459600&quot;&gt;Chord&lt;/a&gt;,
one of the first DHTs, so let&#39;s call it &amp;quot;Note&amp;quot;. In Note, every
node in the system has a randomly generated identifier which is
just a number from $0$ to $2^{256}-1$ (sorry for the LaTeX notation,
newsletter folks). It&#39;s conventional to think of these being
organized in a circle, with the ids being assigned clockwise,
so that node $2^{256}-1$ is right next to (before) node $0$,
as shown in the following diagram:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/note-dht.drawio.png&quot; alt=&quot;note DHT ring&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Each node in the network (the &amp;quot;ring&amp;quot;) maintains a set of
connections to some other set of nodes in the ring
(the arrows are colored according to the node maintaining
the connection). I won&#39;t
go into detail about the algorithms here, except to say that
having that work efficiently is a lot of the science of making a DHT.
In Note, we&#39;ll just assume that each node has a connection to the next
node (i.e., the one with the next-highest identifier) and to
some other nodes further along the ring, as shown in the
figure above.&lt;/p&gt;
&lt;p&gt;In order to communicate with a node with id $i$,
a node sends a message to the node that it is connected
to with id $j$ that is closest to but not greater
than $i$ (i.e., such that if you went around the circle
clockwise, there would be no node that you were
connected to that was in between them). Node $j$ does
the same. When the message finally reaches a node that is connected
directly to $i$, that node delivers it.
For instance, if node &lt;strong&gt;0&lt;/strong&gt; wanted to send a message to node
&lt;strong&gt;c&lt;/strong&gt; it would send it to &lt;strong&gt;b&lt;/strong&gt; who would send it to &lt;strong&gt;c&lt;/strong&gt;.
When &lt;strong&gt;c&lt;/strong&gt; wants to reply, it sends it to node &lt;strong&gt;e&lt;/strong&gt; which
is connected to node &lt;strong&gt;0&lt;/strong&gt; and so sends it directly.
Note that this means that a request/response
pair takes an entire trip around the ring.&lt;/p&gt;
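&lt;p&gt;To make the forwarding rule concrete, here is a minimal sketch of it in Python.
The small integer ids and the &lt;code&gt;LINKS&lt;/code&gt; table are hypothetical stand-ins
for nodes &lt;strong&gt;0&lt;/strong&gt; and &lt;strong&gt;a&lt;/strong&gt; through &lt;strong&gt;e&lt;/strong&gt;; I chose them
to reproduce the example in the text rather than reading them off the figure, and the function
names are just for illustration.&lt;/p&gt;

```python
RING = 2 ** 256  # size of the identifier space

# Hypothetical ring: 0, a=10, b=20, c=30, d=40, e=50.
# Each node links to its successor and to one node further along.
LINKS = {
    0: [10, 20],
    10: [20, 30],
    20: [30, 40],
    30: [40, 50],
    40: [50, 0],
    50: [0, 10],
}


def clockwise_distance(a, b):
    """Distance travelled going clockwise around the ring from id a to id b."""
    return (b - a) % RING


def next_hop(neighbors, target):
    """Pick the neighbor closest to the target without going past it.

    A neighbor that overshoots the target has to go almost all the way
    around the ring to come back, so its clockwise distance is huge and
    it is never preferred over the successor link.
    """
    return min(neighbors, key=lambda n: clockwise_distance(n, target))


def route(links, source, target):
    """Return the node ids a message visits on its way from source to target."""
    path = [source]
    current = source
    while current != target:
        current = next_hop(links[current], target)
        path.append(current)
    return path
```

&lt;p&gt;With this table, &lt;code&gt;route(LINKS, 0, 30)&lt;/code&gt; goes through 20 (node &lt;strong&gt;b&lt;/strong&gt;) and
&lt;code&gt;route(LINKS, 30, 0)&lt;/code&gt; goes through 50 (node &lt;strong&gt;e&lt;/strong&gt;), so the request and
the response together make one trip around the ring.&lt;/p&gt;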
&lt;h3 id=&quot;storing-data&quot;&gt;Storing Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#storing-data&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So far we just have a communications system, but it&#39;s (relatively)
easy to turn it into a storage system: we give each piece of
data an address in the same namespace as the node identifiers and
each node is responsible for storing any data with an address that
falls between it and the previous node. So, for instance, in the
diagram below, node &lt;strong&gt;c&lt;/strong&gt; would be responsible for storing
the resource with address &lt;strong&gt;k&lt;/strong&gt; and node &lt;strong&gt;e&lt;/strong&gt; would be responsible
for storing the resource with address &lt;strong&gt;l&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/note-dht-storage.drawio.png&quot; alt=&quot;note DHT ring&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If node &lt;strong&gt;a&lt;/strong&gt; wants to store a value with address &lt;strong&gt;k&lt;/strong&gt;
it would craft a message to &lt;strong&gt;c&lt;/strong&gt; asking to store it. Similarly,
if node &lt;strong&gt;d&lt;/strong&gt; wants to retrieve it, it would send a message to &lt;strong&gt;c&lt;/strong&gt;.&lt;/p&gt;
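&lt;p&gt;The responsibility rule can be sketched in a few lines of Python. This is a toy
under two assumptions of mine: every node somehow knows the full list of node ids
(a real DHT would find the responsible node by routing), and an address exactly
equal to a node id belongs to that node. The numeric ids and addresses are hypothetical
stand-ins for the letters in the diagram.&lt;/p&gt;

```python
import bisect


def responsible_node(node_ids, address):
    """Return the id of the node that stores the given address.

    A node stores every address between the previous node and itself,
    so the responsible node is the first one at or after the address,
    wrapping around to the lowest id at the end of the ring.
    """
    ring = sorted(node_ids)
    index = bisect.bisect_left(ring, address)
    return ring[index % len(ring)]


# Hypothetical ring: 0, a=10, b=20, c=30, d=40, e=50.
NODES = [0, 10, 20, 30, 40, 50]
K = 25  # an address between b and c, like k in the text
L = 45  # an address between d and e, like l in the text
```

&lt;p&gt;Here &lt;code&gt;responsible_node(NODES, K)&lt;/code&gt; is 30 (node &lt;strong&gt;c&lt;/strong&gt;) and
&lt;code&gt;responsible_node(NODES, L)&lt;/code&gt; is 50 (node &lt;strong&gt;e&lt;/strong&gt;); an address past the
last node wraps around to node &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;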
&lt;p&gt;Of course, there are several obvious problems here. First, what
happens if node &lt;strong&gt;c&lt;/strong&gt; drops off the network? After all, it&#39;s somebody&#39;s
personal computer, so they might turn it off at any moment. The
natural answer to this is to &lt;em&gt;replicate&lt;/em&gt; the data to some other
set of nodes so that there is a suitably low probability that
they will all go offline at once. The precise replication strategy
is also a complicated topic that varies depending on the DHT, and we don&#39;t need to go into it here.&lt;/p&gt;
&lt;p&gt;Second, what if some value is both large and popular? In that case,
the node(s) storing it might suddenly have to transfer a lot of
data all at once. It&#39;s easy for this to totally saturate someone&#39;s
link, even if they have a fast Internet connection. The only real
fix is to distribute the load, which you can do in two ways.
First, you can shard the resource (e.g., break up your movie into
5 minute chunks) and then store each shard under a different address;
the effect is that different nodes will be responsible for
sending each chunk and so their share of the bandwidth is
correspondingly reduced. You can also try to make more nodes
responsible for popular content, which also spreads out the
load.&lt;/p&gt;
&lt;p&gt;Finally, if every message has to traverse several nodes in order
to be delivered, this increases the total load on the network
proportional to the path length (the number of nodes) as
well as decreasing performance due to latency. One way
to deal with that is to have the two communicating nodes establish
a direct connection for the bulk data transfer and just use the
DHT to get them in contact so they can do that. This significantly
reduces the overall load.&lt;/p&gt;
&lt;h3 id=&quot;naming-things&quot;&gt;Naming Things &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#naming-things&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the previous description, I&#39;ve handwaved how the addresses
for things are derived.&lt;/p&gt;
&lt;p&gt;One common design is to compute the address
from the content of the object, for instance by hashing it. This is
what&#39;s called &lt;em&gt;Content Addressable Storage (CAS)&lt;/em&gt; and is convenient in
a number of situations because it doesn&#39;t require any additional
content integrity mechanism in the DHT. If you know the hash of the object you
can retrieve it and then if the hash comes out wrong, you know there
has been a problem retrieving it.&lt;/p&gt;
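&lt;p&gt;As a rough illustration, here is a content-addressed store sketched in Python.
The choice of SHA-256 and the &lt;code&gt;ToyStore&lt;/code&gt; class are my assumptions for the
example, with a plain dictionary standing in for the DHT; real systems differ in the
hash function and encoding they use.&lt;/p&gt;

```python
import hashlib


def address_of(content):
    """Compute the address of an object as the hash of its bytes."""
    return hashlib.sha256(content).hexdigest()


class ToyStore:
    """Stand-in for the DHT: a dict from address to stored bytes."""

    def __init__(self):
        self._values = {}

    def put(self, content):
        """Store the content under its own hash and return that address."""
        address = address_of(content)
        self._values[address] = content
        return address

    def get(self, address):
        """Fetch the content and check it against the address we asked for."""
        content = self._values[address]
        if address_of(content) != address:
            raise ValueError("content does not match its address")
        return content
```

&lt;p&gt;The check in &lt;code&gt;get&lt;/code&gt; is the whole integrity story: the reader does not
have to trust whichever node returned the bytes, only the address it started with.&lt;/p&gt;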
&lt;p&gt;Of course, given that you need the object in order to compute its
hash, this kind of design means that you need some service to map
objects whose names you know (e.g., &amp;quot;John Wick&amp;quot;) onto their
hashes, so now we either have a centralized service that does that or
we need to build a peer-to-peer version of that service and
we&#39;re back where we started.&lt;/p&gt;
&lt;p&gt;Another common approach is to have names that are derived from
cryptographic keys. For instance, we might say that all of my
data is stored at the hash of my public key (again, maybe with
some suitable sharding system). When the data gets stored we would
require it to be signed and nodes would discard stored values whose
signatures didn&#39;t validate. This has a number of advantages, but one
critical one is that you can have the data at a given address &lt;em&gt;change&lt;/em&gt;
because the address is tied to the cryptographic key not the content.
For instance, supposing that what&#39;s being stored is my Web site;
I might want to change that and not want to have to publish a new
address. With an address tied to keys this is possible.&lt;/p&gt;
&lt;p&gt;Obviously, cryptographic keys don&#39;t make great identifiers either, because
they are hard to remember, but presumably
you would layer some kind of decentralized naming layer on top,
for instance one based on a &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/&quot;&gt;blockchain&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;security&quot;&gt;Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#security&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Any real system needs some way of ensuring the integrity of the
content. Unlike the Web, it&#39;s not enough to establish a TLS connection to the storing
node, because that&#39;s just someone&#39;s computer and it could lie (though you
still may want to for privacy reasons).
Instead, each object needs to be somehow integrity protected,
either by having its address be its hash or by being digitally signed.&lt;/p&gt;
&lt;p&gt;Aside from the integrity of the content, there&#39;s still a lot to go wrong here. For instance,
what happens if the responsible node claims that a given object (or a
node you are trying to route to) doesn&#39;t exist? Or what if a set of
nodes try to saturate the network with traffic via a DDoS attack?
How do you deal with people trying to store or retrieve more than their
&amp;quot;fair share&amp;quot; (whatever that is) of data?
There are various approaches people have talked about to try to
address these issues, but our operational experience with DHTs is at a
smaller scale than our operational experience with the Web,
and in a setting that was much more tolerant of failure
(Disney doesn&#39;t lose a lot of money if people suddenly can&#39;t
download Frozen from BitTorrent)
and so it&#39;s not clear that they can be made to be really secure
at scale.&lt;/p&gt;
&lt;h2 id=&quot;a-decentralized-web-publishing-system&quot;&gt;A Decentralized Web Publishing System &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#a-decentralized-web-publishing-system&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now that we have a way to store data and find it again, we have the
start of how one might imagine building a decentralized version of
the Web. As we did when &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/&quot;&gt;looking at how the Web works&lt;/a&gt; let&#39;s just
start with publishing static documents.&lt;/p&gt;
&lt;p&gt;Recall the structure of URIs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/URL-structure.drawio.png&quot; alt=&quot;URL Structure&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What we need to do is to map this structure onto resources in
our P2P storage system. So we might end up with a URL like
the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/URL-structure-note.drawio.png&quot; alt=&quot;URL Structure for a P2P system&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-origin&quot;&gt;The Origin &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#the-origin&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;A critical security requirement in this system is that
data associated with different authorities has different
origins (see &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/&quot;&gt;here&lt;/a&gt; for
background). If data published by multiple users has
&lt;strike&gt;different origins&lt;/strike&gt; the same origin [2022-04-25 -- EKR], then they could attack each other
via the browser, which is an obvious problem.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;note:&lt;/code&gt; at the start tells us that we need to retrieve
the data using Note and not via HTTP. In the middle
section, instead of the &amp;quot;host&amp;quot; field of an ordinary HTTPS URI,
which tells us where to retrieve the content, we
have an &amp;quot;authority&amp;quot; field which just tells us the identity
of the user whose key will be used to sign the data for the
URL. As above, I&#39;m assuming we have some way of mapping
user friendly identities to keys; some systems don&#39;t have that,
which seems pretty user-hostile, but feel free to just think of
the authority as being a key hash if you prefer.&lt;/p&gt;
&lt;p&gt;The resource itself is stored at an address given by &lt;code&gt;Hash(URL)&lt;/code&gt;
(this is a small but simple change from my description above),
and as above, is signed by the key associated with the authority.&lt;/p&gt;
&lt;p&gt;This is all pretty straightforward if you assume the existence
of the P2P system in the first place. In order to publish
something, I do a store into the DHT at the address indicated
by the URL and sign it with my key. I can then hand the
URL to people who can retrieve the data from the DHT by
computing the address and then verifying the signed resource.
Note that because the address is computed from the URL and
not from the content, it can be updated in place just by
doing a new store.&lt;/p&gt;
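&lt;p&gt;Here is a small Python sketch of the update-in-place property. The URL string,
the use of SHA-256, and the &lt;code&gt;ToyNetwork&lt;/code&gt; class are all hypothetical, and the
signature step is deliberately left out (it is only marked by a comment), so this shows
the addressing and nothing else.&lt;/p&gt;

```python
import hashlib


def address_for_url(url):
    """Derive the storage address from the URL alone, as in Hash(URL)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class ToyNetwork:
    """Stand-in for the DHT: a dict from address to stored value."""

    def __init__(self):
        self._values = {}

    def publish(self, url, content):
        # A real store would also carry a signature made with the
        # authority's key, which storing nodes and readers would verify.
        # That step is omitted in this sketch.
        self._values[address_for_url(url)] = content

    def fetch(self, url):
        return self._values[address_for_url(url)]


# Hypothetical URL; the real layout is whatever the scheme defines.
EXAMPLE_URL = "note://example-author/index.html"
```

&lt;p&gt;Publishing twice to &lt;code&gt;EXAMPLE_URL&lt;/code&gt; overwrites the value at the same address,
which is exactly what a content hash would not allow.&lt;/p&gt;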
&lt;p&gt;Taking a step back, this really does sort of deliver on the value
proposition I described above: anyone can publish a site into
the network without having to have a room full of computers
or pay Amazon/Google/Fastly, etc. And so if you don&#39;t look
too closely, it seems like mission accomplished and it&#39;s easy
to understand the enthusiasm. Unfortunately this system also has some pretty serious drawbacks.&lt;/p&gt;
&lt;h3 id=&quot;performance&quot;&gt;Performance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#performance&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Performance—in this case the time it takes a page to load—is
a major consideration for Web browsers and servers.
What mostly matters for Web performance is the time it takes to
retrieve each resource. This is different from, say, videoconferencing
or gaming, where latency (the time it takes your packets to
get to the other side) or jitter (variation in latency) really matter.
In the Web it&#39;s mostly about download speed.&lt;/p&gt;
&lt;h4 id=&quot;connections&quot;&gt;Connections &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#connections&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In order to understand the performance implications of a shift from
client/server to peer-to-peer it&#39;s necessary to understand a little
bit about how networking and data transfer works.  The Internet is a
&lt;em&gt;packet-switched&lt;/em&gt; network, which means that it carries individually
addressed messages that are on the order of 1000 bytes. Because Web
resources are generally larger than 1K, clients and servers transfer
data by establishing a &lt;em&gt;connection&lt;/em&gt;, which is a persistent association
on both sides that maps a set of packets into what looks like a stream
of data that each side can read and write to. The sender breaks the
file up into packets and sends them and the receiver is responsible
for reassembling them on receipt. Historically this was done by
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1083491738&quot;&gt;TCP&lt;/a&gt;
(though we are now seeing increased use of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1083353797&quot;&gt;QUIC&lt;/a&gt;,
which operates on similar principles, at least at the level we need to
talk about here).&lt;/p&gt;
&lt;p&gt;The figure below shows the beginning of an HTTPS connection using TCP
and TLS 1.3 for security.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/https-hs.png&quot; alt=&quot;HTTPS Connection Ladder Diagram&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;increasing-the-number-of-http-requests-on-a-connection&quot;&gt;Increasing the number of HTTP Requests on a Connection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#increasing-the-number-of-http-requests-on-a-connection&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;When HTTP was originally designed, you could only have one
request on a single connection. This was horribly inefficient
for the reasons I&#39;ve described here, and—in
large part due to the work of &lt;a href=&quot;http://jmogul.com/jeff.html&quot;&gt;Jeff Mogul&lt;/a&gt;—a
feature was added that allowed multiple requests to be issued
on the same connection. Unfortunately, those requests could
only be issued serially, which created a new bottleneck. In
response, browsers started creating multiple connections
in parallel to the same site, which let them make multiple
requests at once (as well as sometimes grab a larger fraction
of the available bandwidth, due to TCP dynamics). In 2015,
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc7540.html&quot;&gt;HTTP/2&lt;/a&gt;
added the ability to multiplex multiple requests on the same
TCP connection, with the responses being interleaved, but
still had the problem that a packet lost for response A
stalled every other response (a property called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Head-of-line_blocking&amp;amp;oldid=1083849253&quot;&gt;head-of-line blocking&lt;/a&gt;),
which didn&#39;t happen between multiple connections.
Finally, &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9000.html&quot;&gt;QUIC&lt;/a&gt;,
published in 2021, added multiplexing without head-of-line blocking,
even over a single QUIC connection.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As you can see, the first two round trips are entirely consumed with
setting up the connection. After two round trips, the client can
finally ask for the resource and it&#39;s another round trip before it
finally gets any data.  Depending on the network details, each round
trip can be anywhere from a few milliseconds to 200 milliseconds, so
it can be up to 600ms before the browser sees the first byte of
data. This is a big deal and over the past few years the IETF has
expended considerable effort to shave round trips from connection
setup time for the Web (with &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8446.html&quot;&gt;TLS
1.3&lt;/a&gt; and
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9000.html&quot;&gt;QUIC&lt;/a&gt;).&lt;/p&gt;
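&lt;p&gt;To make the arithmetic concrete, here is a minimal sketch of the round-trip count in the figure above. The function and constant names are mine, purely for illustration, and it assumes the simple case shown: one round trip for TCP, one for TLS 1.3, and one for the HTTP request itself.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Round trips before the first byte of the response arrives.
const TCP_HANDSHAKE_RTTS = 1;   // SYN / SYN-ACK
const TLS13_HANDSHAKE_RTTS = 1; // ClientHello / server handshake messages
const HTTP_REQUEST_RTTS = 1;    // request / first byte of response

function timeToFirstByteMs(rttMs) {
  const roundTrips = TCP_HANDSHAKE_RTTS + TLS13_HANDSHAKE_RTTS + HTTP_REQUEST_RTTS;
  return roundTrips * rttMs;
}

console.log(timeToFirstByteMs(5));   // 15
console.log(timeToFirstByteMs(200)); // 600
&lt;/code&gt;&lt;/pre&gt;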
&lt;p&gt;Once the connection has been established, you then need to deliver
the data, which doesn&#39;t happen all at once. As I mentioned before,
it gets broken up into a stream of packets which are sent to the
other side over time. This is where things get a little bit tricky
because neither the sender nor the receiver knows the capacity
of the network (i.e., how many bits/second it can carry) and if
the sender tries to send too fast, then the extra packets get
dropped. To avoid this, TCP (or QUIC) tries
to work out a safe sending rate by gradually sending faster
and faster until there are signs of congestion (e.g., packets
getting lost or delayed) and then backs off. Importantly,
this means that initially you won&#39;t be using the full capacity
of the network until the connection warms up (this is called
&amp;quot;slow start&amp;quot;), so the data transfer rate tends to get faster over
time until a steady state is reached.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
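&lt;p&gt;A toy model shows why this matters for small transfers. This is only a sketch: it assumes the sending window simply doubles every round trip up to a fixed cap and ignores loss and everything else a real congestion controller does. The function name and the numbers are made up for illustration.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Count the round trips needed to send totalPackets when the window
// starts at initialWindow and doubles each round trip, up to maxWindow.
function roundTripsToSend(totalPackets, initialWindow, maxWindow) {
  let windowSize = initialWindow;
  let remaining = totalPackets;
  let roundTrips = 0;
  while (remaining &gt; 0) {
    remaining -= Math.min(windowSize, remaining); // send one window of packets
    roundTrips += 1;
    windowSize = Math.min(windowSize * 2, maxWindow); // ramp up for the next round
  }
  return roundTrips;
}

// A cold connection needs twice as many round trips as a warmed-up one.
console.log(roundTripsToSend(100, 10, 80)); // 4
console.log(roundTripsToSend(100, 80, 80)); // 2
&lt;/code&gt;&lt;/pre&gt;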
&lt;p&gt;The implication of all this is that new connections are expensive
and you want to send as much data over a single connection
as you can. In fact, much of the evolution of HTTP over the
past 30 years has been finding ways to use fewer and fewer
connections for a single Web page.&lt;/p&gt;
&lt;h4 id=&quot;peer-to-peer-performance&quot;&gt;Peer-to-Peer Performance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#peer-to-peer-performance&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This brings us to the question of performance in peer-to-peer
systems. As I mentioned above, if you want to move significant amounts
of data, you really want to have the client connect directly to the
node which is storing the data. This presents several problems.&lt;/p&gt;
&lt;p&gt;First, we have the latency involved in just sending the first message
through the P2P network and back. This will generally be slower than a
direct message because it can&#39;t take a direct path.  Then, it&#39;s not
generally possible to simply initiate a connection directly to other people&#39;s
personal computers, as they are often behind network elements like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_address_translation&amp;amp;oldid=1083794290&quot;&gt;NATs&lt;/a&gt;
and
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Firewall_(computing)&amp;amp;oldid=1083940793&quot;&gt;Firewalls&lt;/a&gt;.
So-called &amp;quot;hole punching&amp;quot; protocols like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Interactive_Connectivity_Establishment&amp;amp;oldid=1041588442&quot;&gt;ICE&lt;/a&gt;
allow you to establish direct connections in many cases, but they
introduce additional latency (minimum one round trip, but often much
more). And once that&#39;s done you then still have to establish
an encrypted connection, so we&#39;re talking anywhere upward from 2 additional
round trips.
To make matters worse, there will be many cases
where the storing node is quite topologically far from you and
therefore has a long round trip time; big sites and CDNs deliberately
locate points of presence close to users, but this is a much harder
problem with P2P systems.
And of course, even once the connection has been established, we&#39;re still
in slow start.&lt;/p&gt;
&lt;p&gt;This is all kind of a bad fit for Web sites, which tend to consist of
a lot of small files. For example, the Google home page, which is
generally designed to be lightweight, currently consists of 36 separate
resources, with the largest being 811 KB. If each of these resources
is stored separately in the DHT, then you&#39;re going to be running
the inefficient setup phase of the protocol a lot and will almost
never be in the efficient data transfer phase. This is by contrast
to HTTP and QUIC, which try to keep the connection to the server open so
that they can amortize out the startup phase.&lt;/p&gt;
&lt;p&gt;It&#39;s obviously possible to bundle up some of the resources on a site
into a single object, but this has other problems. First, it&#39;s hard
on the browser cache because many of those objects will be reused
on subsequent loads. Second, it makes the connection to a single
node the rate limiting step in the download, which is bad if that
node—which, recall, is just someone else&#39;s computer—doesn&#39;t
have a good network connection or is temporarily overloaded.
The result is that we have a tension between what we want to do
to minimize individual fetch latency, which is to
send everything over a single connection, and what we want to
do in order to avoid bottlenecking on single elements, which is
to download from a lot of servers at once, like BitTorrent does.&lt;/p&gt;
&lt;p&gt;All of this is less of an issue in contexts like movie downloading,
where the object is big and so overall throughput is more important
than latency. In that case, you can parallelize your connections
and keep the pipe full. However, this isn&#39;t the situation with
the Web, where people really notice page load time. As far as I know,
building a large P2P network with comparable load-time performance to
the Web is a mostly unsolved problem.&lt;/p&gt;
&lt;h3 id=&quot;security-and-privacy&quot;&gt;Security and Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#security-and-privacy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even if we assume that the P2P network itself is secure in the
sense that attackers can&#39;t bring it down and the
data is signed, this system still has some concerning properties.&lt;/p&gt;
&lt;h4 id=&quot;privacy&quot;&gt;Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#privacy&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In any system like the Web, the node that serves data to the
client learns which data a given client is interested in,
at least to the level of the client&#39;s IP address. This isn&#39;t
an ideal situation in the current Web, hence IP address
concealment techniques like Tor, VPNs, Private Relay, etc.,
but at least it&#39;s &lt;em&gt;somewhat&lt;/em&gt; limited to identifiable entities
that you chose to interact with (though of course the ubiquitous
tracking in Web advertising makes the situation pretty bad).&lt;/p&gt;
&lt;p&gt;The situation with P2P systems is even worse: downloading
a piece of content means contacting a more or less random
computer on the Internet and telling it what you want. As
I noted above, you could route all the traffic through the
P2P network but only by seriously compromising privacy, so
realistically you&#39;re going to be sharing your IP address
with the node. Worse yet, in most cases the data is
going to be sharded over multiple nodes, which means that
a lot of different random people are seeing your browsing
behavior. Finally, in many networks it&#39;s possible for nodes
to influence which data they are responsible for, in
which case one might imagine entities who wished to do
surveillance trying to become responsible for particular
kinds of sensitive data and then recording who came to retrieve it;
indeed, it &lt;a href=&quot;https://www.theregister.com/2013/08/20/ip_address_search_shows_prenda_copyright_trolls_seeded_smut_then_sued/&quot;&gt;appears&lt;/a&gt; this is already happening with BitTorrent.&lt;/p&gt;
&lt;h4 id=&quot;access-control%E2%80%94putting-the-public-in-publishing&quot;&gt;Access control—putting the public in publishing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#access-control%E2%80%94putting-the-public-in-publishing&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Much of the Web is available to everyone, but it&#39;s also quite
common to have situations in which you want to restrict access
to a piece of data. This can be the site&#39;s data, such as
the paywalls operated by sites like the New York Times, or
the user&#39;s data, such as with Facebook or Gmail. These
are implemented in the obvious way, by having an access
control list on the server which states which users can
access each piece of data and refusing to serve data to
unauthorized users. This won&#39;t work in a P2P system, however,
in that there&#39;s no server to do the enforcement: the data
is just stored on people&#39;s computers and even if the site
published access control rules, the site can&#39;t trust
the storing node to follow them. It might even be controlled
by the attacker.&lt;/p&gt;
&lt;p&gt;The traditional answer to this problem is to encrypt
the content before it&#39;s stored in the DHT. Even if the data
in the DHT is public, that&#39;s just the ciphertext.
This actually works modestly well when the content
is the user&#39;s and they don&#39;t want to share it with anyone
because they can encrypt it to a key they know
and then just store it in the DHT. This could even be done
with existing APIs (e.g., &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Web_Crypto_API&quot;&gt;WebCrypto&lt;/a&gt;), and the key is stored
on the user&#39;s computer. It works a lot less well if they
want to share it with other people—especially with
read/write applications like Google Docs—because you
need cryptographic enforcement mechanisms for all of
the access rules. There has been some real work on this
with cryptographic file systems like
&lt;a href=&quot;https://hovav.net/ucsd/dist/xxfs.pdf&quot;&gt;SiRiUS&lt;/a&gt;
and &lt;a href=&quot;https://tahoe-lafs.org/trac/tahoe-lafs&quot;&gt;Tahoe-LAFS&lt;/a&gt;,
but it&#39;s a complicated problem and I&#39;m not aware
of any really large scale deployments.&lt;/p&gt;
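&lt;p&gt;For the single-user case, a minimal sketch using WebCrypto might look like the following. The &lt;code&gt;Map&lt;/code&gt; is a hypothetical stand-in for the DHT and the function names are my own; it assumes an environment where the WebCrypto &lt;code&gt;crypto&lt;/code&gt; global is available, as in current browsers and recent versions of Node.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical stand-in for the DHT: assume anyone can read what is stored here.
const dht = new Map();

async function storeEncrypted(key, name, plaintext) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh nonce per object
  const ciphertext = await crypto.subtle.encrypt(
    { name: &quot;AES-GCM&quot;, iv }, key, new TextEncoder().encode(plaintext));
  dht.set(name, { iv, ciphertext });
}

async function fetchDecrypted(key, name) {
  const { iv, ciphertext } = dht.get(name);
  const plaintext = await crypto.subtle.decrypt({ name: &quot;AES-GCM&quot;, iv }, key, ciphertext);
  return new TextDecoder().decode(plaintext);
}

async function demo() {
  // Non-extractable key: usable for encrypt/decrypt but not exportable by script.
  const key = await crypto.subtle.generateKey(
    { name: &quot;AES-GCM&quot;, length: 256 }, false, [&quot;encrypt&quot;, &quot;decrypt&quot;]);
  await storeEncrypted(key, &quot;notes.txt&quot;, &quot;my private notes&quot;);
  return fetchDecrypted(key, &quot;notes.txt&quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The DHT stand-in only ever holds the IV and the ciphertext. Nothing here addresses sharing: giving a second user access means getting them the key somehow, which is where the hard part starts.&lt;/p&gt;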
&lt;p&gt;The paywall problem is actually somewhat harder.
For instance, the New York Times could encrypt all its content
and then give every subscriber a key which could be used to
decrypt it, but given the number of subscribers, and that only
one has to leak the key,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
the chance
that that key will leak is essentially 100%.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Of course, people share NYT passwords too, but what makes
this problem harder is that the password then has to be
used on the NYT site and it&#39;s possible to detect misbehavior,
such as when 20 people use the same password. I&#39;m not
aware of any really good P2P-only solution here.&lt;/p&gt;
&lt;h2 id=&quot;non-static-content&quot;&gt;Non-Static Content &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#non-static-content&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Access control is actually a special case of a more general problem:
many if not most Web sites do more than simple publishing of static
content and those sites depend on server side processing that is hard to
replicate in a decentralized system.&lt;/p&gt;
&lt;h3 id=&quot;non-secret-computation&quot;&gt;Non-Secret Computation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#non-secret-computation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As a warm-up, let&#39;s take a comparatively easy problem, the shopping
site I described in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/&quot;&gt;part II&lt;/a&gt; of my
Web security model series. Effectively, this site has three
server-side functions that need to be replicated:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Product search&lt;/li&gt;
&lt;li&gt;Shopping cart maintenance&lt;/li&gt;
&lt;li&gt;Purchasing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second and third of these are actually reasonably straightforward:
the shopping cart can be stored entirely on the client or, alternately,
stored self-encrypted by the client in the P2P system, as described in
the previous section. The purchasing piece can be handled by some
kind of cryptocurrency (though things are more complicated if you
want to take credit cards).
However, product search is more difficult.
The obvious solution would just be to publish the entire product
catalog in the network, have the client download it, and do search
locally. This obviously has some pretty undesirable performance consequences:
consider how much data is in Amazon&#39;s catalog and how often it changes.&lt;/p&gt;
&lt;p&gt;Obviously, the way this works in the Web 2.0 world is that the
server just runs the computation and returns the result, and at
this point you usually hear someone propose some kind of distributed computation
system a la &lt;a href=&quot;https://ethereum.org/en/smart-contracts/&quot;&gt;Ethereum smart contracts&lt;/a&gt;
(though you probably don&#39;t want the outcome recorded on the blockchain).
In this case, instead of publishing a static resource, the site
would publish a program to be executed that returned the results
(often these programs are written in &lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Aside from the obvious problem that this still requires the node
executing the program to have all the data, it&#39;s hard for the
end-user client to determine that the node has executed the
program correctly. Even in a simple case like searching for matching
records: if those records are signed then the node can&#39;t substitute
their own values, but they can potentially conceal matching ones.
There are, of course, cryptographic techniques that potentially
make it possible to prove that the computation was correct, but they
are far from trivial. So, this doesn&#39;t have a really great solution.&lt;/p&gt;
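&lt;p&gt;Here is a sketch of the concealment problem using WebCrypto signatures. The catalog, the dishonest node, and all of the function names are hypothetical; the point is only that every signature check passes even though a matching record was withheld.&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;const ECDSA = { name: &quot;ECDSA&quot;, hash: &quot;SHA-256&quot; };
const encode = (record) =&gt; new TextEncoder().encode(JSON.stringify(record));

// The site signs every catalog record before publishing it.
async function signCatalog(privateKey, records) {
  return Promise.all(records.map(async (record) =&gt; ({
    record,
    signature: await crypto.subtle.sign(ECDSA, privateKey, encode(record)),
  })));
}

// A dishonest node: it returns genuine, correctly signed matches but hides some.
function dishonestSearch(signedCatalog, term) {
  const matches = signedCatalog.filter((entry) =&gt; entry.record.name.includes(term));
  return matches.slice(0, 1);
}

// The client can verify what it was given, but not what it was not given.
async function allSignaturesValid(publicKey, results) {
  const checks = await Promise.all(results.map((entry) =&gt;
    crypto.subtle.verify(ECDSA, publicKey, entry.signature, encode(entry.record))));
  return checks.every(Boolean);
}

async function demo() {
  const { publicKey, privateKey } = await crypto.subtle.generateKey(
    { name: &quot;ECDSA&quot;, namedCurve: &quot;P-256&quot; }, false, [&quot;sign&quot;, &quot;verify&quot;]);
  const catalog = await signCatalog(privateKey, [
    { name: &quot;red kettle&quot; }, { name: &quot;blue kettle&quot; }, { name: &quot;toaster&quot; }]);
  const results = dishonestSearch(catalog, &quot;kettle&quot;);
  // Two records match, one is returned, and verification still succeeds.
  return { returned: results.length, valid: await allSignaturesValid(publicKey, results) };
}
&lt;/code&gt;&lt;/pre&gt;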
&lt;h3 id=&quot;secret-information&quot;&gt;Secret Information &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#secret-information&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A shopping site is actually a relatively simple case because the
information is basically public—though in some cases the
site might not want their catalog to be public—but there
are a lot of cases where the site wants to compute with secret information.
There are two primary situations here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The site&#39;s secret information; for instance, Twitter&#39;s recommendation
algorithm is not public.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The user&#39;s secret information, for instance which other users
they have &amp;quot;swiped right&amp;quot; on in a dating app, or even just
users&#39; profile details.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In Web 2.0, the way this works is that the server knows the secret
information and uses it for the computation but doesn&#39;t reveal
it to the users. As with the search case, though, that doesn&#39;t
port easily to the P2P case because it&#39;s not safe to reveal the
information to random people&#39;s personal computers.&lt;/p&gt;
&lt;p&gt;There are, of course, cryptographic mechanisms for computing specific
functions with encrypted data. For instance,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_set_intersection&amp;amp;oldid=1081416156&quot;&gt;Private Set Intersection&lt;/a&gt;
techniques make it possible to determine whether Alice and Bob
both swiped right on each other and only tell them if they
both did, but they&#39;re complicated and more importantly task specific,
so you need a solution for each application, and sometimes that
means inventing new cryptography (to be clear, this is far from
all that is required to implement a secure P2P dating system!).&lt;/p&gt;
&lt;p&gt;This is actually a general problem with cryptographic replacements
for computations performed on &amp;quot;trusted&amp;quot; servers. The positive
side of cryptographic approaches is that they can provide
strong security guarantees, but the negative side is that essentially
each new computation task requires some new cryptography,
which makes changes very slow and expensive. By contrast, if you&#39;re
doing computation on a server, then changing your computations
is just a matter of writing new code and loading it onto the server.
The obvious downside is that people have to trust the server,
but clearly a lot of people are willing to do that.&lt;/p&gt;
&lt;h3 id=&quot;hybrid-architectures&quot;&gt;Hybrid Architectures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#hybrid-architectures&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One idea that is sometimes floated for addressing this kind of
functional issue is to have a hybrid architecture.
For instance, one might imagine implementing the shopping site by
having the static content of the catalog served via the P2P network
but having a server which handled the searches and returned pointers
to the relevant sections of the catalog. You could even encrypt each
individual catalog chunk so that it was hard for a competitor to see
your entire catalog. You could even imagine building a dating site
with—handwaving alert!—some combination of P2P and server technology, with the logic for
determining which profiles you could see and which to match you with
implemented on the server, but the (encrypted) profiles distributed
P2P.&lt;/p&gt;
&lt;p&gt;At this point, though, you have a pretty substantial server component
that is in the critical path of your site and so you&#39;re mostly using the P2P
network as a kind of not-very-fast CDN (see, for instance,
&lt;a href=&quot;https://www.youtube.com/watch?v=PnBIIdmKO9o&quot;&gt;PeerCDN&lt;/a&gt;). This gives
up most of the benefits of having your system decentralized in the
first place: you still have the problem of hosting your server
somewhere, which probably means some cloud service, and at that point
why not just use a CDN for your static content anyway? Similarly,
if you&#39;re worried about censorship, then you need to worry about
your server being censored, which makes your site unusable even
if the P2P piece still works.&lt;/p&gt;
&lt;h2 id=&quot;closing-thoughts&quot;&gt;Closing Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#closing-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s easy to see the appeal of a more decentralized Web: who wants
to have a bunch of faceless mega-corporations deciding what you can
or cannot say? And there certainly are plenty of jurisdictions that
censor people&#39;s access to the Web and to information more generally.
It&#39;s easy to look at the success of P2P content
distribution systems—albeit to a great extent for distributing
content for which other people hold the copyrights—and come
to the conclusion that it&#39;s a solution to the Web centralization
problem.&lt;/p&gt;
&lt;p&gt;Unfortunately, for the reasons described above, I don&#39;t think that&#39;s
really the right conclusion. While the Web sort of superficially
resembles a content distribution system, it&#39;s actually something
quite different, with both a far broader variety of use cases
and much tighter security and performance requirements.
It&#39;s probably possible to rebuild some simpler systems on a P2P
substrate, but the Web as a whole is a different story, and even
systems that appear simple are often quite complex internally.
Of course,
the Web has had almost 30 years to grow into what it is,
and it&#39;s possible that there are technological improvements
that would let us build a decentralized system with similar properties,
but I don&#39;t think this is something we really understand
how to do today.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Though see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=X_Window_System&amp;amp;oldid=1079491346&quot;&gt;X&lt;/a&gt;
in which these roles are sort of reversed. &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Interestingly, within certain limits latency doesn&#39;t
have that much impact on how fast you can send the
data because the rate control algorithms can adjust
for latency. &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Allan Schiffman used to call this a &amp;quot;distributed single
point of failure&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Or, as the
nerds say, &amp;quot;unity&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/challenges-web-decentralization/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part IV: Cross-Origin Resource Sharing (CORS)</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-cors/"/>
		<updated>2022-04-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-cors/</id>
		<content type="html">&lt;p&gt;This is part IV of my series on the Web security model (parts
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising&quot;&gt;outtake&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;III&lt;/a&gt;).
In this post, I cover &lt;em&gt;cross-origin resource sharing (CORS)&lt;/em&gt;,
a mechanism for reading data from a different site.&lt;/p&gt;
&lt;p&gt;As discussed in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;part III&lt;/a&gt;, the Web
security model allows sites to import content from another site but
generally isolates that content from the importing site. For instance,
&lt;code&gt;example.com&lt;/code&gt; can pull in an image from &lt;code&gt;example.net&lt;/code&gt; and display it to
the user, but it can&#39;t access the contents of the image. This is
a necessary security requirement because it prevents attackers
from exploiting ambient authority to access sensitive data
but it also prevents legitimate uses for cross-origin data,
such as a cross-origin API.&lt;/p&gt;
&lt;h2 id=&quot;cross-origin-apis&quot;&gt;Cross-Origin APIs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#cross-origin-apis&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the case where there is a Web service that has an API,
like &lt;a href=&quot;https://www.mediawiki.org/wiki/API:Query&quot;&gt;Wikipedia&lt;/a&gt;
or &lt;a href=&quot;https://wiki.mozilla.org/Bugzilla:REST_API&quot;&gt;Bugzilla&lt;/a&gt;,
and you want to write a Web application which takes advantage
of that API. For instance, suppose I have a little Web
service which lets you get the weather at a specific location
indicated by ZIP code. This service might have an API endpoint at
the following URL.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;https://weather.example/temperature?94303
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the response being a JSON structure:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;temperature&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;25&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token property&quot;&gt;&quot;units&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;C&quot;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A Web site could access this API and display the local temperature
using the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API&quot;&gt;fetch() API&lt;/a&gt;
like so, with the zip code being 94303 (Palo Alto).&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token function&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;https://weather.example/temperature?94303&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;then&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;then&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Temperature is &quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;temperature &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot; degrees &quot;&lt;/span&gt; &lt;span class=&quot;token 
operator&quot;&gt;+&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;units&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;catch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Error &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obviously, a real application would do something more interesting, but
I&#39;m just giving an example here; as with many things Web, the
platform capability is simple but the
complexity is in the application logic.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;server-to-server-apis&quot;&gt;Server-to-Server APIs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#server-to-server-apis&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s mostly possible to replace all of these client-side APIs
with server-to-server APIs in which the API-using Web site
talks directly to the Web service. This is a pretty common
pattern on the Web: the user authorizes site A to
perform operations on its behalf on site B (typically
using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=OAuth&amp;amp;oldid=1083104041&quot;&gt;OAuth&lt;/a&gt;)
and then site A sends the data to the client.
This is, for instance, how GitHub
integrations work.&lt;/p&gt;
&lt;p&gt;However, there are plenty of situations where it&#39;s more efficient to
send the data directly to the client, especially if there is a lot of
data.  Note that from the perspective of site B it&#39;s not really
safer to have the data sent to a Web page served off of site A than it
is to send it to site A directly, because the JS is of course
under control of site A and can always just send it back to
A.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This all works fine if the site that is consuming the temperature
API is the same as the one hosting it, but what if it&#39;s not? There
are a number of ways this can happen:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The sites are operated by the same entity, but the site
is built as a Web app that runs in the browser and consumes
data from the API. The app might be downloaded from one
server and the API be on another server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The sites are operated by different entities, for instance
if the Web service is public, as in my temperature example.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, if the sites are different, then this
request violates the same-origin
policy, as described in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;part III&lt;/a&gt;.
If I try to do this, the browser will generate an error (on
Firefox, &lt;code&gt;TypeError: NetworkError when attempting to fetch resource&lt;/code&gt;)
triggering the &lt;code&gt;catch&lt;/code&gt; clause
in the code above.&lt;/p&gt;
&lt;p&gt;This restriction exists for a good reason. Even though this
particular application seems safe because the temperature API is public, others might not be.
The problem is that (1) the Web threat
model assumes that any site can be malicious and (2) requests from the
browser carry the ambient authority of the client. If you allow
an attacker to use the ambient authority of the client, you are asking
for problems. For example, Gmail is a &amp;quot;single page app&amp;quot; in which
the server loads a JS program onto the browser and then that program
uses Web APIs to read your messages. If other Web sites could do that,
it would obviously be bad!&lt;/p&gt;
&lt;p&gt;Instead of restricting what you can do with the cross-origin
requests, you might think that browsers could get away with
just removing cookies whenever you use cross-origin &lt;code&gt;fetch()&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; This is
only a partial solution though, because cookies are not the only
kind of ambient authority. A particularly important case is
where the victim browser is able to connect to network resources
that the attacker cannot reach directly, for instance if the
browser is on the same local network as the server and there
is a firewall preventing external access, but the server
doesn&#39;t use cookies for access control. In this case, if
an attacker could do cross-origin &lt;code&gt;fetch()&lt;/code&gt; then they
might be able to steal data from the server even if the
browser strips cookies.&lt;/p&gt;
&lt;p&gt;Even with the same-origin policy it is still possible to attack machines behind
the firewall under certain conditions. For instance, if
they are not using HTTPS, then it is possible to mount
something called a &lt;a href=&quot;https://crypto.stanford.edu/dns/dns-rebinding.pdf&quot;&gt;DNS rebinding&lt;/a&gt;
attack in which the attacker loads their page and then
changes their DNS so that
their site (e.g., &lt;code&gt;attacker.example&lt;/code&gt;) points
to the server behind the firewall. This causes the
browser to think that the behind-the-firewall server
is actually the attacker&#39;s server and hence same-origin
to the attacker&#39;s site (another reason to use HTTPS).&lt;/p&gt;
&lt;p&gt;What we need here is a controlled way of allowing cross-origin requests
that ensures they can&#39;t be used for attack.&lt;/p&gt;
&lt;h2 id=&quot;jsonp&quot;&gt;JSONP &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#jsonp&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It turns out that even before CORS, the Web platform already had a mechanism that lets
you make cross-origin requests; it&#39;s just super-hacky. You may recall from &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/posts/web-security-model-origin/#what-about-javascript%3F&quot;&gt;part
III&lt;/a&gt; that
JavaScript executes in the context of the loading page, even when it&#39;s
loaded from another origin. This means that you can simulate a Web
services API by having the main Web page load a script from the
Web services site. That script then inserts the data into the context of the loading
Web page.&lt;/p&gt;
&lt;p&gt;In order to make this work, you need to do two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Instead of using &lt;code&gt;fetch()&lt;/code&gt; the API-using page needs to
use &lt;code&gt;&amp;lt;script src=&amp;quot;&amp;quot;&amp;gt;&lt;/code&gt; to load the API endpoint from the server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instead of returning JSON, the server needs to return actual
JavaScript which then inserts the data into the page.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For instance, the API-using page might do:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;script&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;temperatureReady&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Temperature is &quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;temperature &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot; degrees &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;units&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;script&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;script src&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token 
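string&quot;&gt;&quot;https://weather.example/temperature?94303&amp;amp;callback=temperatureReady&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;script&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;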
&lt;p&gt;And then the Web service API would return:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token function&quot;&gt;temperatureReady&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;temperature&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;25&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;units&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;C&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code just calls the &lt;code&gt;temperatureReady()&lt;/code&gt; function that already exists in
the page (the way the Web service knows which function to call is that it&#39;s
passed in a query parameter in the URL) with the data as the argument to the function.
Because the script runs in the context
of the page, this is permitted and the result is that the data gets
imported into the page as well. Mission accomplished!&lt;/p&gt;
&lt;p&gt;Note that in the real world the API-using page wouldn&#39;t just statically
include the script. Rather, when you wanted to make an API call,
JS on the page would dynamically insert the script tag (remember
that JS can manipulate the DOM), inserting whatever URL was necessary
to make the correct API call.&lt;/p&gt;
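&lt;p&gt;A minimal sketch of that dynamic insertion might look like the following. The helper name &lt;code&gt;jsonpRequest&lt;/code&gt;, the &lt;code&gt;zip&lt;/code&gt; query parameter, and the &lt;code&gt;showTemperature&lt;/code&gt; function are all hypothetical, and the document and global object are passed in explicitly rather than assumed.&lt;/p&gt;

```javascript
// Sketch of a JSONP call made by inserting a script tag on demand.
// doc and scope are passed in (normally document and window) so the
// helper has no hidden globals. All names here are illustrative.
function jsonpRequest(apiUrl, callbackName, onData, doc, scope) {
    const script = doc.createElement("script");

    // The script returned by the service will call this function.
    scope[callbackName] = function (data) {
        delete scope[callbackName]; // one-shot callback
        script.remove();            // the tag is no longer needed
        onData(data);
    };

    // Tell the service which function to call via a query parameter.
    const url = new URL(apiUrl);
    url.searchParams.set("callback", callbackName);
    script.src = url.toString();
    doc.head.appendChild(script);
}

// In a page this might be used as:
// jsonpRequest("https://weather.example/temperature?zip=94303",
//              "temperatureReady", showTemperature, document, window);
```

&lt;p&gt;Removing the callback and the script tag after the call is a design choice in this sketch, not something JSONP itself requires.&lt;/p&gt;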
&lt;p&gt;This idiom, &lt;a href=&quot;https://web.archive.org/web/20091204053053/http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/&quot;&gt;invented (or at least popularized) by Bob Ippolito&lt;/a&gt;,
is conventionally called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=JSONP&amp;amp;oldid=1062080645&quot;&gt;JSONP&lt;/a&gt;,
because it&#39;s commonly used to wrap APIs which use JSON-formatted data
and that JSON data is &amp;quot;padded&amp;quot; by wrapping it to make it valid JavaScript
(otherwise it will be rejected by the browser as JSON is not well-formed
JavaScript). However, there is no rule that the JavaScript returned by the
site has to have embedded JSON in it. For instance it could return
XML and invoke the XML parser, or just return a bare value such
as the temperature as an integer. The API contract just requires
that the JS served by the server calls the callback function that
the API-using page indicates; as long as it does that everything
will work.&lt;/p&gt;
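&lt;p&gt;On the server side, the padding step could be sketched roughly as below. The function name &lt;code&gt;buildJsonpResponse&lt;/code&gt; is hypothetical, and the restriction of the callback to a plain identifier is an extra precaution assumed here rather than something the JSONP convention mandates.&lt;/p&gt;

```javascript
// Hypothetical server-side helper: wraps JSON data in a call to the
// callback named by the API-using page (the "padding" in JSONP).
function buildJsonpResponse(callbackName, data) {
    // Accept only a plain identifier so the query parameter cannot be
    // used to smuggle arbitrary script into the response.
    if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
        throw new Error("invalid callback name");
    }
    return callbackName + "(" + JSON.stringify(data) + ");";
}

const body = buildJsonpResponse("temperatureReady", { temperature: "25", units: "C" });
console.log(body); // temperatureReady({"temperature":"25","units":"C"});
```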
&lt;h3 id=&quot;attacks-by-the-api-server&quot;&gt;Attacks by the API Server &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#attacks-by-the-api-server&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Moreover, nothing restricts the Web services server
from doing other things besides calling the indicated callback:
it can do anything it wants, including changing the DOM in any
way it pleases, stealing the user&#39;s cookies, or making
API calls to the Web site that the page was served off of.
In other words, a naive use of JSONP requires large amounts of
trust in the Web service you are using; this is obviously not ideal.&lt;/p&gt;
&lt;p&gt;It&#39;s possible to address these issues by adding a &lt;em&gt;third&lt;/em&gt; origin into
the mix, as shown in the diagram below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/JSONP-iframe.png&quot; alt=&quot;JSONP with an IFRAME&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The idea here is that instead of loading the JavaScript directly
from the API server into your page, you instead load it into
an IFRAME which is hosted on a second origin that you control
(e.g., &lt;code&gt;proxy.example.com&lt;/code&gt;). That IFRAME ends up with
the data but because it&#39;s cross-origin to your site it
can&#39;t impact your site, and thus
it is safer to load potentially malicious JS into it.
You then use the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage&quot;&gt;postMessage() API&lt;/a&gt;
to talk to the IFRAME to get the data in and out. Effectively,
this creates a little proxy which protects you against the Web services
API JS.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
I&#39;ve actually never seen this trick written down (readers: if you&#39;re
aware of a published description, please send me pointers) but I&#39;m pretty confident
it will work.&lt;/p&gt;
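&lt;p&gt;As a rough sketch of the main page&#39;s side of this proxy, the listener below only accepts data that arrives from the proxy origin. The &lt;code&gt;https&lt;/code&gt; scheme on &lt;code&gt;proxy.example.com&lt;/code&gt;, the helper name, and the &lt;code&gt;handleTemperature&lt;/code&gt; function are assumptions for illustration.&lt;/p&gt;

```javascript
// Assumed origin of the proxy IFRAME (scheme is an assumption).
const PROXY_ORIGIN = "https://proxy.example.com";

// Returns a "message" event handler for the main page that only
// accepts data posted by the proxy IFRAME.
function makeProxyListener(onData) {
    return function (event) {
        if (event.origin !== PROXY_ORIGIN) {
            return; // ignore messages from any other origin
        }
        onData(event.data);
    };
}

// In the main page:
//   window.addEventListener("message", makeProxyListener(handleTemperature));
// In the IFRAME, the JSONP callback would forward the data with:
//   window.parent.postMessage(data, "https://example.com");
```

&lt;p&gt;Passing an explicit target origin to &lt;code&gt;postMessage()&lt;/code&gt; and checking &lt;code&gt;event.origin&lt;/code&gt; on receipt keeps the data from leaking to, or being spoofed by, some other frame.&lt;/p&gt;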
&lt;p&gt;Of course, this is all a bit clunky, but it works (to quote Spinal Tap, &amp;quot;it&#39;s such a fine line between stupid
and clever.&amp;quot;). If you wanted to do cross-origin
queries before CORS you didn&#39;t have a lot of options.&lt;/p&gt;
&lt;h3 id=&quot;attacks-by-the-api-client&quot;&gt;Attacks by the API Client &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#attacks-by-the-api-client&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Maybe the API-using site trusts the Web service site or uses
something like the proxy technique above to protect itself, but that
just gets us back to where we were without JSONP, with the need to
find some way to protect the Web service from the API client.&lt;/p&gt;
&lt;p&gt;There are actually two related problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Preventing the API client from reading data it shouldn&#39;t
from the service.&lt;/li&gt;
&lt;li&gt;Preventing the API client from causing unwanted side effects
on the service.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The way to think about both of these is that the attacker is abusing
the user&#39;s authority to talk to the Web service, and so is
able to cause the Web service to do things on behalf of the user.
It&#39;s important to understand that the server is trusting the browser
to follow the rules; if the browser behaves incorrectly then
all bets are off. The reason this is (mostly) OK is that the
threat model is that the attacker is attempting to abuse
the user&#39;s access to the service. Nothing stops the user from extracting the
cookies themselves and making any requests they want.
The server has to have its own access control checks
that prevent abuse by the user.&lt;/p&gt;
&lt;p&gt;The basic defense here is to ensure that the client site which is
making the request is authorized to do so. A common pattern is for
the service to require you to authorize that site, with
a dialog like the one below. Note: this dialog is actually for a different kind of
access where CircleCI talks directly to GitHub,
but the idea is the same and how would you know if I didn&#39;t tell you?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/circle-ci-auth.png&quot; alt=&quot;Circle CI auth box&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you approve access for site &lt;code&gt;circleci.com&lt;/code&gt;, then the Web service
(in this case GitHub)
would add an access control entry to your account indicating that
the other site (in this case &lt;code&gt;circleci.com&lt;/code&gt;) could make requests on your behalf. Of course, it then
has to actually enforce those rules, which is where
things get a little bit tricky. This is done using either
the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer&quot;&gt;Referer&lt;/a&gt;
header or the newer &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Origin&quot;&gt;Origin&lt;/a&gt;
header to determine which site is making the request. The service then
looks that up against the access control list to determine whether to
allow the request or not. Neither of these headers can normally
be controlled by the attacker (they are on the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name&quot;&gt;forbidden header list&lt;/a&gt; of headers which
JS cannot modify)
and therefore can be trusted by the server (remember that you&#39;re
worried about attacks by a site, not by the user, who can of
course make their browser do whatever they want).&lt;/p&gt;
&lt;p&gt;The major drawback of using &lt;code&gt;Referer&lt;/code&gt; or &lt;code&gt;Origin&lt;/code&gt; in this
way is that they are &lt;a href=&quot;https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html#checking-the-referer-header&quot;&gt;sometimes missing and the checks can be
tricky to get right&lt;/a&gt;,
in which case you will inadvertently deny service to
a legitimate client. As far as I can tell, however, they fail
&amp;quot;safe&amp;quot; in that if you implement them correctly
you won&#39;t accidentally give access to someone who should not have
access.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
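&lt;p&gt;A minimal sketch of such a check, assuming the service keeps a per-user set of approved origins and sees request headers as a plain object with lowercase names (both assumptions), might look like this. It denies the request when neither header is usable, which is the fail-safe behavior described above.&lt;/p&gt;

```javascript
// Work out which origin is making the request, preferring Origin and
// falling back to the origin part of Referer.
function requestingOrigin(headers) {
    if (headers["origin"]) {
        return headers["origin"];
    }
    if (headers["referer"]) {
        try {
            return new URL(headers["referer"]).origin;
        } catch (e) {
            return null; // unparseable Referer: treat as missing
        }
    }
    return null;
}

// approvedOrigins is a hypothetical Set of origins the user authorized.
function isAuthorized(headers, approvedOrigins) {
    const origin = requestingOrigin(headers);
    if (origin === null) {
        return false; // missing headers: deny rather than guess
    }
    return approvedOrigins.has(origin); // exact match, never a substring test
}
```

&lt;p&gt;Comparing whole origins rather than doing prefix or substring matching avoids accepting a look-alike host such as &lt;code&gt;circleci.com.attacker.example&lt;/code&gt;.&lt;/p&gt;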
&lt;p&gt;From one perspective, JSONP solves our problem: it lets us make
cross-origin API requests. In principle, we probably could build
everything we want with JSONP, but in practice it&#39;s a seriously
clunky mechanism (especially the part where we
inject &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tags into the DOM) that takes a huge amount of care to use correctly,
and has big risks if used incorrectly. A lot of that can be hidden
with libraries but we still know it&#39;s there.
With that said, many
big sites (e.g., Google, Twitter, LinkedIn, etc.) deployed JSONP
APIs which just shows how useful a capability it is. What we needed
was a mechanism that did much the same thing but was simpler and
safer. This brings us to CORS.&lt;/p&gt;
&lt;h2 id=&quot;cors&quot;&gt;CORS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#cors&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind CORS is that it allows the site from which the
resource is being retrieved to make limited exceptions to the
same-origin policy.&lt;/p&gt;
&lt;h3 id=&quot;simple-requests&quot;&gt;Simple Requests &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#simple-requests&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The simplest version of CORS allows the API-using site to
read back the results of its cross-origin requests, which, you&#39;ll
recall, is normally forbidden. In order to allow this, the server
sends back an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin&quot;&gt;&lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt;&lt;/a&gt;
header listing the origin that is allowed to read back the data.
There are two main options here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;*&lt;/code&gt; indicating that any origin is permitted&lt;/li&gt;
&lt;li&gt;An actual origin, such as &lt;code&gt;https://example.com&lt;/code&gt; indicating that only that origin is permitted&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, here is a successful CORS request, in which
&lt;code&gt;example.com&lt;/code&gt; serves a page that makes a &lt;code&gt;fetch()&lt;/code&gt; request to
&lt;code&gt;service.example&lt;/code&gt;.  In this case, the service wants to allow the
request so it sends an appropriate &lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt;
header, with the result that the browser delivers the data to the
JS.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/cross-origin-with-cors.drawio.png&quot; alt=&quot;CORS Simple example&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Sites use the &lt;code&gt;*&lt;/code&gt; value when they don&#39;t care who can read
their data—effectively for public data—and an actual origin if they want to restrict it
to certain origins (or to authenticated users, as described below).
You&#39;re only allowed to specify a single origin, so as a practical
matter the server needs to look at the client&#39;s &lt;code&gt;Origin&lt;/code&gt; header
and provide something matching in response. This is already useful
as it allows for effectively public data, and it mostly doesn&#39;t
enhance the attacker&#39;s capabilities as in most cases the attacker
can just connect directly to the server and retrieve the data
(with the exception of topological controls as described &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#it&#39;s-not-just-cookies&quot;&gt;above&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Where things get interesting is if the client provides a cookie,
because that cookie is (likely) tied to the user&#39;s authentication
and therefore is not something that an attacking Web site could
get unless they had compromised the user&#39;s credentials. Allowing
cross-origin reads in these circumstances is more dangerous and
CORS requires the service to add another header,
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials&quot;&gt;&lt;code&gt;Access-Control-Allow-Credentials&lt;/code&gt;&lt;/a&gt;,
in order for the data to be readable. By default, cross-origin
requests don&#39;t include a cookie, which means that if the
server sets a cookie for some other reason (this is quite common)
and no authentication
is required, things will still work even if the server doesn&#39;t
set this header.&lt;/p&gt;
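&lt;p&gt;To make the server&#39;s side of this concrete, here is a rough sketch of how a service might choose its response headers by looking at the request&#39;s &lt;code&gt;Origin&lt;/code&gt;. The function name and the allow-list are hypothetical, and the &lt;code&gt;Vary: Origin&lt;/code&gt; header is an addition assumed here so that caches don&#39;t reuse a response across origins.&lt;/p&gt;

```javascript
// Pick CORS response headers for a simple request.
// allowedOrigins is a hypothetical Set of origins the service trusts.
function corsResponseHeaders(requestOrigin, allowedOrigins, allowCredentials) {
    const headers = {};
    if (!requestOrigin || !allowedOrigins.has(requestOrigin)) {
        // No CORS headers: the browser will not hand the response to JS.
        return headers;
    }
    // Only one origin may be listed, so echo back the one that asked.
    headers["Access-Control-Allow-Origin"] = requestOrigin;
    headers["Vary"] = "Origin";
    if (allowCredentials) {
        // Needed before the browser exposes a response to a request
        // that carried cookies.
        headers["Access-Control-Allow-Credentials"] = "true";
    }
    return headers;
}
```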
&lt;h3 id=&quot;non-simple-requests&quot;&gt;Non-Simple Requests &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#non-simple-requests&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This all works fine for situations where the security property
you need to enforce is that the client can&#39;t read data
from the server, but what about cases where what you&#39;re
concerned about is not the site reading back the data but that
the request itself is dangerous even if the client can&#39;t
read back the response (for instance, the request might delete some
of the user&#39;s data)?&lt;/p&gt;
&lt;p&gt;For this category of requests, CORS requires what&#39;s called
a &amp;quot;preflight&amp;quot;, which is basically an HTTP request in which
the browser asks &amp;quot;Is it OK if I were to make this request?&amp;quot;,
and then only makes the request if the server says &amp;quot;yes&amp;quot;,
as shown in the diagram below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/cross-origin-with-preflight.drawio.png&quot; alt=&quot;CORS with pre-flight&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the preflight uses the &lt;code&gt;OPTIONS&lt;/code&gt; method. Because
&lt;code&gt;OPTIONS&lt;/code&gt; is not used for ordinary HTTP requests, this
prevents side effects from the preflight itself.&lt;/p&gt;
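&lt;p&gt;A server&#39;s answer to a preflight could be sketched as follows. The &lt;code&gt;Access-Control-Request-Method&lt;/code&gt; request header and the &lt;code&gt;Access-Control-Allow-Methods&lt;/code&gt; and &lt;code&gt;Access-Control-Allow-Headers&lt;/code&gt; response headers are part of the CORS protocol; the &lt;code&gt;policy&lt;/code&gt; object, the function name, and the choice of status codes are assumptions for illustration.&lt;/p&gt;

```javascript
// Answer an OPTIONS preflight. requestHeaders uses lowercase names;
// policy is a hypothetical object: { origins: Set, methods: Set, headers: Array }.
function handlePreflight(requestHeaders, policy) {
    const origin = requestHeaders["origin"];
    const method = requestHeaders["access-control-request-method"];
    if (!policy.origins.has(origin) || !policy.methods.has(method)) {
        // Any answer without the CORS headers makes the browser
        // abandon the real request; 403 is just one choice.
        return { status: 403, headers: {} };
    }
    return {
        status: 204,
        headers: {
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Methods": Array.from(policy.methods).join(", "),
            "Access-Control-Allow-Headers": policy.headers.join(", "),
        },
    };
}
```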
&lt;p&gt;So, what requests need preflighting? Those which meet any of
the following conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using an HTTP method other than &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;HEAD&lt;/code&gt;, or &lt;code&gt;POST&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Using non-automatic values for any headers other than
&lt;code&gt;Accept&lt;/code&gt;, &lt;code&gt;Accept-Language&lt;/code&gt;, &lt;code&gt;Content-Language&lt;/code&gt;, &lt;code&gt;Content-Type&lt;/code&gt;, &lt;code&gt;Range&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Having a &lt;code&gt;Content-Type&lt;/code&gt; media type other than &lt;code&gt;application/x-www-form-urlencoded&lt;/code&gt;, &lt;code&gt;multipart/form-data&lt;/code&gt; or &lt;code&gt;text/plain&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Having any event listeners registered for the upload&lt;/li&gt;
&lt;li&gt;Using a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream&quot;&gt;&lt;code&gt;ReadableStream&lt;/code&gt;&lt;/a&gt; on the request&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a sort of odd list, isn&#39;t it? Take the method for example.
Can&#39;t you do plenty of damage using the &lt;code&gt;POST&lt;/code&gt; method? And why can
you do &lt;code&gt;POST&lt;/code&gt; and not &lt;code&gt;PUT&lt;/code&gt;, for instance? For many of these
properties, the answer is that these are the capabilities that JavaScript
already had pre-CORS. For example, if you have an HTML form, you can generate
an HTTP request with any of these methods and the allowed media
types. I haven&#39;t checked the other restrictions in detail, but I believe they
map onto similar &amp;quot;you can already do it&amp;quot; contours: for instance, HTML
forms let the site upload stuff, but if you can track the progress of the
upload, then you can see if the server processed some part of it and
then took some action (for instance, rejected it). This
would let you learn some information about the behavior of the
server in response to this request, which you otherwise would not be permitted to do.&lt;/p&gt;
&lt;p&gt;In other words, simple requests are (approximately) those you could do without
CORS, which means that they are safe to do with CORS, as long as the server
agrees to the JS having access to the data. However, if you couldn&#39;t have
done it without CORS the client needs to do a preflight.&lt;/p&gt;
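&lt;p&gt;The rules above can be approximated in a few lines. This is a simplified sketch, not the browser&#39;s actual algorithm (which also restricts header values in ways not shown here); the shape of the &lt;code&gt;req&lt;/code&gt; object and the function name are hypothetical.&lt;/p&gt;

```javascript
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SIMPLE_HEADERS = new Set([
    "accept", "accept-language", "content-language", "content-type", "range",
]);
const SIMPLE_TYPES = new Set([
    "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
]);

// req is a hypothetical description of a request:
// { method, headers, hasUploadListeners, usesReadableStream }
// where headers holds only the values set by the script.
function needsPreflight(req) {
    if (!SIMPLE_METHODS.has(req.method.toUpperCase())) {
        return true;
    }
    for (const [name, value] of Object.entries(req.headers || {})) {
        const lower = name.toLowerCase();
        if (!SIMPLE_HEADERS.has(lower)) {
            return true;
        }
        if (lower === "content-type") {
            // Ignore parameters such as "; charset=utf-8".
            const mediaType = value.split(";")[0].trim().toLowerCase();
            if (!SIMPLE_TYPES.has(mediaType)) {
                return true;
            }
        }
    }
    return Boolean(req.hasUploadListeners || req.usesReadableStream);
}
```

&lt;p&gt;For example, under this sketch a &lt;code&gt;POST&lt;/code&gt; with &lt;code&gt;Content-Type: application/json&lt;/code&gt; needs a preflight, while the same &lt;code&gt;POST&lt;/code&gt; with &lt;code&gt;text/plain&lt;/code&gt; does not, because an HTML form could already have sent the latter.&lt;/p&gt;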
&lt;h3 id=&quot;failing-safe&quot;&gt;Failing Safe &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#failing-safe&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One thing that&#39;s key to note here is that the server has to opt in
to any of the new CORS behavior. For simple requests, if the server
doesn&#39;t respond with the appropriate header, then the response
won&#39;t be available to the JS, as shown in the example below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/cross-origin-without-cors.drawio.png&quot; alt=&quot;Non-CORS example&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For non-simple requests, if the server
doesn&#39;t accept the preflight, then the request never happens at all.
Because not sending these headers is just the existing pre-CORS behavior,
this means that CORS fails safe: if you have a server which you
didn&#39;t update then the browser just falls back to the pre-CORS behavior.
This is a really critical property when rolling out a new Web feature:
we don&#39;t want that feature to be a threat to existing sites.&lt;/p&gt;
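&lt;p&gt;From the browser&#39;s point of view, the fail-safe decision for a simple request might be sketched like this. It is an illustration only: the function name and the plain-object headers (lowercase names) are assumptions, and real browsers implement considerably more detail.&lt;/p&gt;

```javascript
// Decide whether page JS may read a cross-origin response.
function scriptMayRead(pageOrigin, responseHeaders, sentCredentials) {
    const allow = responseHeaders["access-control-allow-origin"];
    if (allow === undefined) {
        return false; // server never opted in: same as pre-CORS behavior
    }
    if (sentCredentials) {
        // With cookies attached, the wildcard is not accepted and the
        // server must also send Access-Control-Allow-Credentials: true.
        if (allow !== pageOrigin) {
            return false;
        }
        return responseHeaders["access-control-allow-credentials"] === "true";
    }
    return allow === "*" || allow === pageOrigin;
}
```

&lt;p&gt;The first branch is the important one for deployment: a server that was never updated sends no header, so it gets exactly the protection it had before.&lt;/p&gt;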
&lt;h2 id=&quot;the-web&#39;s-design-values&quot;&gt;The Web&#39;s Design Values &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#the-web&#39;s-design-values&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Pulling back, the story of CORS is a good example of how the Web
platform evolves.&lt;/p&gt;
&lt;h3 id=&quot;don&#39;t-break-anything&quot;&gt;Don&#39;t Break Anything &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#don&#39;t-break-anything&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As detailed in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin&quot;&gt;part III&lt;/a&gt;, the basic
structure of the same-origin policy and the capabilities it gives
sites was well in place before we really understood the security implications. This means that sites
had come to depend on those properties and that made them really hard to
change. Because those properties were hard to change, sites had to
build defenses under the assumption that browsers weren&#39;t going
to change their behavior, hence compatible hacks like anti-CSRF tokens
rather than more principled solutions like &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite#lax&quot;&gt;SameSite Cookies&lt;/a&gt;
that depended on the browser changing.&lt;/p&gt;
&lt;p&gt;Conversely, when we are rolling out a new feature, it&#39;s critically
important that it not create a new security threat for the Web.
In particular, sites depend on the existing browser behavior, so
you can&#39;t change that in a way that would make existing behavior
unsafe.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
However, this means that it&#39;s generally safe to deploy new functionality as
long as it stays within the existing assumptions that sites have
made about browser behavior, which is how you get to the design of
CORS.&lt;/p&gt;
&lt;h3 id=&quot;paving-the-cowpaths&quot;&gt;Paving the Cowpaths &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#paving-the-cowpaths&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If there&#39;s any consistent pattern in the Web, it&#39;s that if there is
something people want to do and there is a way to do it—no matter how hacky—people will
find that way and use it; hence JSONP (see also, &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#notifications&quot;&gt;long poll&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/jeff-goldblum.jpg&quot; alt=&quot;Jeff Goldblum&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Much of the job of evolving the Web platform consists of looking
at what people do with the Web in a hacky way and designing better
mechanisms that (1) do what people want, (2) are convenient,
or at least &lt;em&gt;more&lt;/em&gt; convenient than whatever they are doing now, and
(3) don&#39;t create new risks. If this is done right, the new
mechanism will gradually replace the old hacky one and the
Web gets a little better.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-side-channels&quot;&gt;Next Up: Side Channels &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#next-up%3A-side-channels&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Everything I&#39;ve written so far assumed that browsers actually
do enforce the guarantees that they are supposed to enforce.
Unfortunately, this turns out to be a lot harder to do than
you might think. In particular, there are a number
of situations where attackers can use side channels
(e.g., timing) to learn information that they can&#39;t learn
directly. I&#39;ll be covering that in the next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Removing them from any cross-origin load would break cases
where sites load cross-origin images and the like. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I believe it&#39;s also possible for the Web Service to know
that it will be loaded inside an IFRAME and thus dispense
with the extra site, but I&#39;m not 100% sure. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&lt;code&gt;Referer&lt;/code&gt; checking is also a common defense-in-depth measure against &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cross-site_request_forgery&amp;amp;oldid=1078022726&quot;&gt;Cross-Site Request Forgery (CSRF)&lt;/a&gt; attacks, but it&#39;s not entirely sufficient
because of the way HTTP handles redirects. Specifically, if a victim site redirects
a page to an attacker site and the attacker then redirects back to the victim
site to mount a CSRF, the &lt;code&gt;Referer&lt;/code&gt; header will be the victim site,
which creates an attack vector. This is not really an issue for JSONP
because if you load JS off an attacker site, you already have much bigger
problems than CSRF. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API&quot;&gt;WebSockets&lt;/a&gt;
was delayed for some time after
Huang, Chen, Barth, Jackson, and I found a low-incidence
&lt;a href=&quot;https://ptolemy.berkeley.edu/projects/truststc/pubs/840.html&quot;&gt;risk&lt;/a&gt;
from deploying it as-is and the WG had to add a defense called
&amp;quot;masking&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-cors/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Lake Sonoma 50 Race Report</title>
		<link href="https://educatedguesswork.org/posts/lake-sonoma-50/"/>
		<updated>2022-04-12T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/lake-sonoma-50/</id>
		<content type="html">&lt;p&gt;Last weekend I raced the &lt;a href=&quot;https://lakesonoma50.com/&quot;&gt;Lake Sonoma 50 mile&lt;/a&gt;
up in Northern California.
In ultra circles, Sonoma is well known for being very runnable,
which—in the ultra context—means that there aren&#39;t a lot of long or steep hills and it
mostly consists of dirt fire roads and smooth non-technical single-track
(i.e., one person wide) trails, so you can plausibly run
almost the whole thing if you are strong. This is by contrast
to some other races I&#39;ve done like &lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73&quot;&gt;Bigfoot 73&lt;/a&gt;,
which were steeper and had more difficult footing, so as a practical
matter you were going to be doing a lot of hiking.&lt;/p&gt;
&lt;p&gt;There&#39;s almost nothing in Sonoma that I couldn&#39;t have run on its
own or in a 25 mile event, but it has around 10,500 ft (3000m) of elevation
gain (and also 10,500 ft of loss because it&#39;s an out and back course), which
means that it&#39;s full of rolling hills and small creek crossings and
you&#39;re almost never running on the flats. To do well you have to have
good fitness and the discipline to keep the right pace, so it seemed like a
good opportunity to test out my early season fitness. I put my name
into the lottery and got waitlisted, but then apparently a lot of
people decided not to do it, as they cleared the waitlist and then
re-opened entries to everyone. This gave my training partner
&lt;a href=&quot;https://chris-wood.github.io/&quot;&gt;Chris&lt;/a&gt; a chance to sign up and we ran
most of the race together.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sonoma-50-map.png&quot; alt=&quot;Lake Sonoma 50 Map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/sonoma-50-elevation.png&quot; alt=&quot;Lake Sonoma 50 elevation profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Screenshots from &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;My plan here was to run the first 25-30 miles at &amp;quot;long run&amp;quot; pace,
which is basically what people would call an &amp;quot;easy&amp;quot; effort level
(for me this ranges from
about 8:00/mile on the flats to 12:00/mile on a very hilly
course) and then try to maintain it for the second half, which is
of course progressively harder as the fatigue builds up.
I had been doing my long runs on comparable courses at about
11:00/mile, so I was hoping for low 9 hrs (50 miles at 11:00 is 9:10).
This didn&#39;t entirely work out and I definitely slowed down throughout
the race, coming in at 9:44:09, which was good enough for
47th (out of 252 finishers, 310 starters). This is about the
40th percentile of my expectations.
In retrospect, having seen the course, low 9 hours seems too aggressive, but
I do think I could have done &amp;lt;9:30 if I had paced things better.  On the
other hand this is quite a bit faster than my previous 50 PR,
which was on the easier &lt;a href=&quot;https://www.scenaperformance.com/events/dick-collins-firetrails/&quot;&gt;Firetrails
50&lt;/a&gt;.&lt;/p&gt;
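&lt;p&gt;The pace arithmetic above is easy to check. What follows is a minimal Python sketch, not anything used for the race itself: the function names are hypothetical, and the average pace assumes the nominal 50 mile distance rather than the measured one.&lt;/p&gt;

```python
def finish_time(miles, pace_min, pace_sec=0):
    """Return (hours, minutes, seconds) for covering `miles` at a fixed pace per mile."""
    total_seconds = round(miles * (pace_min * 60 + pace_sec))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


def average_pace(miles, hours, minutes, seconds):
    """Return (minutes, seconds) per mile for a given finish time over `miles`."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    pace_seconds = round(total_seconds / miles)
    return divmod(pace_seconds, 60)


# Goal: 50 miles at 11:00/mile.
print(finish_time(50, 11))
# Actual finish of 9:44:09, assuming the nominal 50 mile distance.
print(average_pace(50, 9, 44, 9))
```

&lt;p&gt;The first call reproduces the 9:10 target; the second gives the average pace implied by the actual finish time under the 50 mile assumption.&lt;/p&gt;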
&lt;h2 id=&quot;pre-race&quot;&gt;Pre-Race &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#pre-race&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Sonoma logistics are pretty easy.  It&#39;s only a few hours away and
Chris and I drove up the afternoon before and stayed in Healdsburg about
20 miles from the race start. We
managed to pick up our race packets (including our race numbers) that
afternoon so it was possible to prep everything the night before and
then just show up at the race start. Regrettably we got there just as
main parking closed so had to drive about a quarter mile to overflow
parking (up a hill, which was really not amazing to walk up afterwards). Got to
the start in plenty of time to use the bathroom (twice!) and take a
pre-race photo (not online yet) with Chris, my friend
&lt;a href=&quot;https://brbrunning.com/&quot;&gt;Lisa&lt;/a&gt;, and some of her friends, who
were doing their first 50.&lt;/p&gt;
&lt;p&gt;It was about 45-50 at the start so I got a bit cold standing around
for 25 min, but of course the day warmed up soon enough and I&#39;d
rather be cold at the start than really hot later in the day.&lt;/p&gt;
&lt;h2 id=&quot;start-to-island-view-%5B4.26-mi%2C-%2B725%2F-988-ft%5D&quot;&gt;Start to Island View [4.26 mi, +725/-988 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#start-to-island-view-%5B4.26-mi%2C-%2B725%2F-988-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first 2.4 mi or so are on the road, so even easy distance pace is
fairly fast. This was good because we started out a bit too far back
in the pack and ended up gradually working our way up through the pack
by the time we hit the singletrack and the sharp downhill. It wasn&#39;t
too congested at this point and we mostly just settled into a pace
with the other people in our general pace range. Generally, I&#39;m a little
faster than average on the flats and uphill and slower on downhill, so there
was some yoyoing, but we tried not to do too much passing unless
it was a real problem, because we&#39;d just get passed right back.&lt;/p&gt;
&lt;p&gt;We rolled through Island View at a really hot pace (&amp;lt;10:00/mile)
and were still feeling good. It&#39;s water only on the way out so we didn&#39;t even
bother to stop.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;drinks-and-gels&quot;&gt;Drinks and Gels &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#drinks-and-gels&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If you&#39;re gonna run for 10 hours you&#39;re going to need to eat
some stuff. Each race serves different stuff at their aid
stations but generally there will be at minimum some kind
of sports drink (basically carbohydrates + electrolytes) and some kind
of &amp;quot;gel&amp;quot;, which is basically a carbohydrate paste. There
are a lot of different companies that make this stuff and
each one has a different mix of macronutrients and different
flavors, so it&#39;s very possible you&#39;ll like one just fine
and find another disgusting. My drink preference is
&lt;a href=&quot;https://tailwindnutrition.com/&quot;&gt;Tailwind&lt;/a&gt;, which is pretty
common but not ubiquitous; I&#39;m less picky about gels.
Before a race I usually
figure out what they are serving and try it out beforehand
to see if I can stomach it (literally). In this case,
Sonoma was serving &lt;a href=&quot;https://guenergy.com/products/roctane-energy-drink-mix&quot;&gt;Gu&lt;/a&gt;
&lt;a href=&quot;https://guenergy.com/products/roctane-energy-gel&quot;&gt;Roctane&lt;/a&gt;
which I&#39;ve had before and like OK.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;island-view-to-warm-springs-%5B6.97-mi%2C-%2B1%2C421%2F-1%2C447-ft%5D&quot;&gt;Island View to Warm Springs [6.97 mi, +1,421/-1,447 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#island-view-to-warm-springs-%5B6.97-mi%2C-%2B1%2C421%2F-1%2C447-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section is pretty much all single-track rollers, and we were
still feeling strong. We ended up in a paceline behind a group of
women who were all working together and given that the pace seemed
about right, we just sat behind them through the next aid station.
As before, the basic pattern is we&#39;d pull back a bit on the downhills
but then catch up on the uphills and flats. During this section
we were running the uphills until we were caught up; they were
hiking some of the uphills so we would hike behind them to the top
of the climb, then repeat.&lt;/p&gt;
&lt;p&gt;We were still going really fast into Warm Springs, though even at this
point it was starting to feel warmer. Had a little bit of a glitch at
the aid station because they were (at least I thought) only serving
the Strawberry Lemonade Roctane, which is caffeinated and I didn&#39;t
want to start on caffeine this early. I was down to 200 or so ml
Tailwind at this point so I just filled up with water and then
had a gel + water, which should be roughly equivalent to 250ml
Tailwind.&lt;/p&gt;
&lt;h2 id=&quot;warm-springs-to-wulfow-%5B5.05-mi%2C-%2B1%2C138%2F-909-ft%5D&quot;&gt;Warm Springs to Wulfow [5.05 mi, +1,138/-909 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#warm-springs-to-wulfow-%5B5.05-mi%2C-%2B1%2C138%2F-909-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We were a bit slower coming out of the aid station but quickly
caught back up with the pack we had been running with. This section
was on average more up than down and you can see our pace starting
to fall off a bit to 11:30/mi (10:05/mi GAP) but it still looks
pretty good. This section was still quite smooth and I was still
feeling strong. Because of the Tailwind issue, I was consuming more like 200cal/hr
than my target of 300 cal/hr here but otherwise things were pretty fine.
Wulfow is water only, so I just refilled on water and (I think)
grabbed a gel, as it was only 2 miles to Madrone.&lt;/p&gt;
&lt;h2 id=&quot;wulfow-to-madrone-%5B2.06-mi%2C-%2B302%2F-331-ft%5D&quot;&gt;Wulfow to Madrone [2.06 mi, +302/-331 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#wulfow-to-madrone-%5B2.06-mi%2C-%2B302%2F-331-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This time we got out of the aid station ahead of the pack, but there
is a sharp downhill right after, so our previous pack was on our heels pretty
quickly. There didn&#39;t seem to be too much interest in passing us, so I
just led almost all the way to Madrone. Towards the very end it
opened up into uphill fire road and so things got a little
jumbled. This is actually the steepest climb, but it was early enough
in the day that it didn&#39;t feel too bad.&lt;/p&gt;
&lt;p&gt;Madrone had decaf Roctane so I was able to completely fill my
bottles. At this point it was starting to get a fair bit warmer, so I
was starting to drink some fluid at the aid station and then fill my
bottles.&lt;/p&gt;
&lt;h2 id=&quot;madrone-to-no-name-%5B5.86-mi%2C-%2B1312%2F-1066-ft%5D&quot;&gt;Madrone to No Name [5.86 mi, +1312/-1066 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#madrone-to-no-name-%5B5.86-mi%2C-%2B1312%2F-1066-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The pack sort of separated at this point and Chris and I found
ourselves pretty alone for the big descent out of Madrone.
This is when we started to see the first people coming the
other way, which meant they were about 7-8 miles ahead of us at this
point.&lt;/p&gt;
&lt;p&gt;We knew that there was a big climb and then the lollipop around the halfway mark, so we were
just kind of anticipating the climb, and it was a relief when we
finally got there. It&#39;s just a long trudge up that and we naturally
hiked. It&#39;s fire road so we just passed some people and got passed
by others. We were still seeing a substantial number of people
going the other way, but we also knew we were ahead of the
main body of people. It was definitely a relief to get into
the lollipop, though, because then you&#39;re no longer having people
pass you going the other way (except for a short out and
back to the aid station).&lt;/p&gt;
&lt;p&gt;We rolled into No Name at 4:24, which was pretty far ahead of schedule
and I was starting to have visions of a sub-9 finish (4:25 * 2 = 8:50,
right?). I stopped at the bathroom and drank a bunch of fluid as I was
definitely starting to feel hot and dehydrated. I also was able to
grab my drop bags which had extra Tailwind bottles, so I could be back
on Tailwind for the next few hours. Also grabbed my buff and had some
ice put in it. This aid station stop was pretty long, 5:14, but we
were still out right at 4:29, so ahead of plan.&lt;/p&gt;
&lt;h2 id=&quot;no-name-to-madrone-%5B5.22-mi%2C-%2B933%2F-1230-ft%5D&quot;&gt;No Name to Madrone [5.22 mi, +933/-1230 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#no-name-to-madrone-%5B5.22-mi%2C-%2B933%2F-1230-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Chris and I did this section pretty much on our own again, and
it was slower than it should have been. The rollers from the
lollipop to the big descent were starting to get to me and
the descent was steep enough that we mostly just jogged
down it without taking it too fast, which did nothing for our
pace. Then it&#39;s some rollers and the climb back up to Madrone,
which we hiked.&lt;/p&gt;
&lt;p&gt;At Madrone I had the opposite problem from before, which is that
I wanted caffeine but they didn&#39;t have either caffeinated Roctane
or Coke, so I ended up just grabbing a caffeinated Gu, which
has only 35 mg of caffeine.&lt;/p&gt;
&lt;h2 id=&quot;madrone-to-wulfow-%5B2.09-mi%2C-%2B348%2F-315-ft%5D&quot;&gt;Madrone to Wulfow [2.09 mi, +348/-315 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#madrone-to-wulfow-%5B2.09-mi%2C-%2B348%2F-315-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This section is where we really noticeably started to slow down.
As opposed to before, we were hiking any significant uphill,
rather than just when we were behind someone or it was really steep.
My theory here is I was starting to get tired and that I wouldn&#39;t
be moving much faster—if at all faster—if I was running,
so I was conserving energy a bit. At this point I was definitely
starting to feel pretty hot and dehydrated, and also maybe
a little stomach discomfort from drinking a lot of water at Madrone.
Wulfow was water and gels but unfortunately no salt, and I was running
out of my own salt tabs. Can&#39;t remember if I grabbed another
caffeinated Gu here.&lt;/p&gt;
&lt;h2 id=&quot;wulfow-to-warm-springs-%5B5.07-mi%2C-%2B919%2F-1125-ft%5D&quot;&gt;Wulfow to Warm Springs [5.07 mi, +919/-1125 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#wulfow-to-warm-springs-%5B5.07-mi%2C-%2B919%2F-1125-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This was probably the hardest section for me, both in how I felt and in
terms of my pace, which was the worst of the race, both in absolute terms and in &lt;em&gt;grade
adjusted pace (GAP)&lt;/em&gt;.
As above, I was running out of salt and just generally starting to
feel kind of beat. We were hiking anything that was even modestly
uphill and even so it was tough. Was just generally feeling kind
of wobbly and the log bridge that was a little iffy on the way
out felt downright scary. However, I also started to notice that
I was gapping Chris more and more on the uphills, though he&#39;d mostly
catch up on the downhills. This isn&#39;t too unexpected as I&#39;m a stronger
hiker, but it was the first time it was really happening.&lt;/p&gt;
&lt;p&gt;Fortunately, this section was a little shorter than we expected, so
we managed to get into Warm Springs OK. This was the longest aid station
stop at 5:23, mostly because we were messing around with drinks, etc.
This was the last set of drop bags and so I had another Tailwind
bottle. They also had Coke so I pulled out my third bottle and ended
up with one Coke, one Tailwind, and water (?). Had to
wait a bit for Chris to leave this aid station as he was still
getting ready to go.&lt;/p&gt;
&lt;h2 id=&quot;warm-springs-to-island-view-%5B7.09-mi%2C-%2B1417%2F-1470-ft%5D&quot;&gt;Warm Springs to Island View [7.09 mi, +1417/-1470 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#warm-springs-to-island-view-%5B7.09-mi%2C-%2B1417%2F-1470-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Started to feel better in this section, probably due to the
caffeine, and I shifted out of &amp;quot;hike when it won&#39;t be much slower&amp;quot;
mode into &amp;quot;run whenever you can&amp;quot; mode. About 3 miles in I noticed
that I was really starting to gap Chris and so he gave me the car
keys and I went ahead on my own, trying to push the pace as much
as I felt comfortable with, consistent with still having 9ish
miles to go. You can see this in the pace, which was faster than
the previous two segments and with the GAP being quite a bit
better. At this point I was starting to really pass a lot of
people, including finally catching the last of the women from
the pack we were running with.&lt;/p&gt;
&lt;p&gt;Still was pretty glad to see the turn off down to Island, as
that meant I was &amp;lt;5 to go. Hit the aid station and was frankly
a little disoriented and spent some time filling up on Coke
and trying to figure out which gels had caffeine even though
I had Coke in my bottles. Left the aid station right as Chris rolled
in.&lt;/p&gt;
&lt;h2 id=&quot;island-view-to-finish-%5B4.66-mi%2C-%2B1010%2F-676-ft%5D&quot;&gt;Island View to Finish [4.66 mi, +1010/-676 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#island-view-to-finish-%5B4.66-mi%2C-%2B1010%2F-676-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Hiked the hill out of Island View and then really tried to get
into the vibe of &amp;quot;fast finish&amp;quot;, given that I had less than 5
miles to go and I&#39;ve done plenty of fast finish
runs where you run the last few miles harder. Was still a bit unstable on my feet and tripped a bunch
of times. Stayed up but it made me cautious. Was really just
feeling like I needed to get to the big climb out and then into
the final rollers. Hiked that
part and then just tried to push through to the finish.
Spent the last two miles chasing the two guys in front of
me and felt like I closed on them a bit but never quite enough
to catch them.&lt;/p&gt;
&lt;p&gt;Right leg started to cramp a bit in the last mile or so but just
toughed it out and it went away. Was able to finish strong, and it&#39;s
nice to be under the round number of 9:45 (9:44:09).  Chris came in at
9:46:69, so he must not have lost much if anything on me the last
segment.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A bit of a mixed result. On the one hand, I think it&#39;s
clear I went in with too high expectations about what I could do here;
I don&#39;t think sub-9 or even 9:15 was in reach, at least on this
day. It wasn&#39;t crazy hot, but it did get to 80ish and I hadn&#39;t done
any heat training. Sunday was a lot cooler and I think that might have
shaved 10-15 min off my final time.&lt;/p&gt;
&lt;p&gt;My pacing was a bit off here. I think if I hadn&#39;t gone out as hard and
gotten to halfway in more like 4:40, I would have had a decent shot at
9:30 even on this day. I also wonder whether it would have been better
to push more in the third quarter. I lost a lot of time there and
clearly I was able to pick up the pace when I needed to in the fourth
quarter. I&#39;m not sure how much longer I could have sustained that, but
maybe it would have been better to go more evenly in the last half. I
did know that the rollers would be tiring but I don&#39;t think I
anticipated how tiring they would be in the second half and how
tempting it would be to hike.&lt;/p&gt;
&lt;p&gt;I more or less hit my nutrition plan. My target was to drink half a bottle of
Tailwind or Roctane every 3 miles (as a proxy for every half hour) and
100 cals of gel or bar every 6 miles, for a total of ~300 cal/hr.
I mostly managed this except where I got thrown off by aid station
logistics and then towards the end when I was subbing in Coke.  I
rotated my gels reasonably well so I never got too tired of anything
and was glad to have &lt;a href=&quot;https://myspringenergy.com/collections/spring-energy-products/products/canaberry&quot;&gt;Spring gels&lt;/a&gt; so it wasn&#39;t quite so much all space
food. I had some &lt;a href=&quot;https://www.maurten.com/products/gel-100-box-us&quot;&gt;Maurten gels&lt;/a&gt; in my drop bag but opted not to use them
because of not trying new stuff on race day.&lt;/p&gt;
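&lt;p&gt;The nutrition plan can be sanity-checked with a little arithmetic. This is a rough sketch under stated assumptions: the function name is hypothetical, and the calories per bottle is not a number given here but simply what the ~300 cal/hr total implies once the gels are subtracted.&lt;/p&gt;

```python
def implied_bottle_calories(target_cal_per_hour, gel_cal, bottles_per_hour, gels_per_hour):
    """Calories one full bottle must supply for the plan to hit the hourly target."""
    drink_cal_per_hour = target_cal_per_hour - gel_cal * gels_per_hour
    return drink_cal_per_hour / bottles_per_hour


# Plan from the post, with 3 miles standing in for half an hour (6 miles per hour).
BOTTLES_PER_HOUR = 0.5 * 2   # half a bottle every 3 miles
GELS_PER_HOUR = 1            # one 100 cal gel or bar every 6 miles

# Implied calories per full bottle of drink mix (an inferred figure, not a label value).
print(implied_bottle_calories(300, 100, BOTTLES_PER_HOUR, GELS_PER_HOUR))
```

&lt;p&gt;In other words, the plan only adds up to ~300 cal/hr if each bottle is mixed to roughly 200 calories.&lt;/p&gt;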
&lt;p&gt;Wore my &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/s-lab-pulsar.html#color=25785&quot;&gt;Salomon Pulsars&lt;/a&gt; the whole way and I have mixed feelings
here. On the one hand they&#39;re super light, but the platform is really
narrow and they&#39;re more for toe strikers and the traction isn&#39;t
great so I slipped a bunch of times that I don&#39;t think I would have
in (say) the Sense Pro/4s, which are my usual race shoe. Also,
you&#39;re really not going that fast so having a super lightweight
shoe seems less important than it would be on a shorter race; you&#39;re
not going to be going all-out. This will
probably be my last race with them as the new Salomon shoes are out
soon and the Pulsars are definitely too light for UTMB.&lt;/p&gt;
&lt;p&gt;My going in expectations aside, this is arguably a pretty good
result. Top 25% of finishers and almost top 15% of starters is better
than I&#39;ve finished in a long time. I was top 3rd at SOB and just
barely top half at Bigfoot, so this seems like an indicator that this
is actually a comparatively better performance than usual, even if the
time isn&#39;t quite what I was hoping for.&lt;/p&gt;
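&lt;p&gt;For anyone who wants to check the placement percentages, here is a small Python sketch using the finisher and starter counts quoted earlier. The helper name is hypothetical.&lt;/p&gt;

```python
def top_fraction(place, field_size):
    """Return the finishing place as a percentage of the field (smaller is better)."""
    return 100 * place / field_size


PLACE = 47        # overall finishing place
FINISHERS = 252   # runners who finished
STARTERS = 310    # runners who started

print(f"{top_fraction(PLACE, FINISHERS):.1f}% of finishers")
print(f"{top_fraction(PLACE, STARTERS):.1f}% of starters")
```

&lt;p&gt;Both figures land inside the bounds quoted above, with the finisher percentage comfortably under 25%.&lt;/p&gt;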
&lt;h2 id=&quot;results-summary&quot;&gt;Results Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/lake-sonoma-50/#results-summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finish Time: 9:44:09
&lt;br /&gt;
Actual distance: 48.4 miles
&lt;br /&gt;
Finish Place: 47th overall, 37th male, 310 starters&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Segment&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Distance&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Elevation&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Time&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Pace&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;GAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Island View&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4.26 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+725/-988 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;39:29&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:16/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8:25/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Warm Springs&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6.97 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,421/-1,447 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:12:27&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:23/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:17/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:30&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Wulfow&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.05 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,138/-909 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;58:03&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:30/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:05/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;0:26&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Madrone&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2.06 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+302/-331 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;21:48&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:36/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:52/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:49&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;No Name&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.86 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,312/-1,066 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:08:18&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:39/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:54/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5:14&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Madrone&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.22 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+988/-1,230 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:03:12&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:06/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:36/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:09&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Wulfow&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2.08 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+348/-315 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;26:42&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:51/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:45/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:02&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Warm Springs&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.07 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+919/-1,125 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:07:13&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:16/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:56/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5:23&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Island View&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.09 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,417/-1,470 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:29:26&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:37/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:08/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:44&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Finish&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4.66 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,010/-676 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;57:12&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:16/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:32/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</content>
	</entry>
	
	<entry>
		<title>End-to-End Encryption and Messaging Interoperability</title>
		<link href="https://educatedguesswork.org/posts/messaging-e2e/"/>
		<updated>2022-04-07T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/messaging-e2e/</id>
		<content type="html">&lt;p&gt;The &lt;a href=&quot;https://www.europarl.europa.eu/news/en/press-room/20220315IPR25504/deal-on-digital-markets-act-ensuring-fair-competition-and-more-choice-for-users&quot;&gt;news&lt;/a&gt; that the EU
will &lt;a href=&quot;https://www.ianbrown.tech/wp-content/uploads/2022/03/Final-DMA-interoperability-text.pdf&quot;&gt;require that messaging companies provide
interoperability&lt;/a&gt;
has gotten a lot of attention, both positive
(&lt;a href=&quot;https://matrix.org/blog/2022/03/25/interoperability-without-sacrificing-privacy-matrix-and-the-dma&quot;&gt;matrix.org&lt;/a&gt;)
and negative (&lt;a href=&quot;https://twitter.com/alexstamos/status/1507145126006587411&quot;&gt;Alex
Stamos&lt;/a&gt;,
&lt;a href=&quot;https://alecmuffett.com/article/16037&quot;&gt;Alec Muffett&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/SteveBellovin/status/1507375010054348805&quot;&gt;Steve
Bellovin&lt;/a&gt;),
as detailed in this
&lt;a href=&quot;https://www.wired.com/story/dma-interoperability-messaging-imessage-whatsapp/&quot;&gt;Wired&lt;/a&gt;
article (see also this &lt;a href=&quot;https://www.internetsociety.org/wp-content/uploads/2022/03/ISOC-EU-DMA-interoperability-encrypted-messaging-20220311.pdf&quot;&gt;ISOC&lt;/a&gt;
white paper). At a high level,
I&#39;m more positive on the idea of interoperability for messaging systems
than some others are, but it&#39;s certainly not a trivial problem and
at least some of the EU timelines seem pretty unreasonable. Read on
for more.&lt;/p&gt;
&lt;h2 id=&quot;critiques&quot;&gt;Critiques &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critiques&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At a high level, there seem to be three broad critiques of messaging system
interoperability:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It will weaken security, for instance by requiring decryption
and re-encryption at system boundaries or by creating
confusion about user identities.&lt;/li&gt;
&lt;li&gt;It will hold back innovation by forcing messages to be
sent using only features that are common to all systems.&lt;/li&gt;
&lt;li&gt;It will make abuse (especially spam) worse.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&#39;s useful to keep these in mind throughout the rest of the discussion.&lt;/p&gt;
&lt;p&gt;Before covering messaging, however, it&#39;s helpful to look at an existing
system that has had interoperability for a long time, where we can see the
resulting dynamics: e-mail.&lt;/p&gt;
&lt;h2 id=&quot;an-interoperable-system%3A-e-mail&quot;&gt;An Interoperable System: E-mail &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#an-interoperable-system%3A-e-mail&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;E-mail has the
opposite problem from messaging: where messaging consists of a number
of independent islands of encrypted messaging with no way to talk
between them, email is a globally interoperable system that—despite
a number of attempts—doesn&#39;t have anything like universal encryption.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;E-mail operates on a hub-and-spoke model in which every user is
associated with a given mail domain, represented by a domain
name (e.g., &lt;code&gt;example.com&lt;/code&gt;) as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/email.drawio.png&quot; alt=&quot;Email architecture&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;telephone-addressing&quot;&gt;Telephone Addressing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#telephone-addressing&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Telephone numbers actually are &lt;em&gt;hierarchically structured&lt;/em&gt; but don&#39;t map 1-1 with providers.&lt;/p&gt;
&lt;p&gt;The basic structure of a phone number is given by the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=E.164&amp;amp;oldid=1073189249&quot;&gt;E.164 standard&lt;/a&gt; and consists of a country code followed by a subscriber number,
with the structure of the subscriber number being defined by the country
code. For instance, in the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=North_American_Numbering_Plan&amp;amp;oldid=1075584876&quot;&gt;North American Numbering Plan&lt;/a&gt;, identified by country code 1, numbers
look like: &lt;code&gt;415.555.1111&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Digits&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Numbering plan area (aka area code)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;415&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Central office prefix&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;555&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Line number, denoting subscriber&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1111&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
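&lt;p&gt;As a rough illustration of the table above, here is a minimal Python sketch that splits a ten-digit NANP number into its three fields. The names &lt;code&gt;NanpNumber&lt;/code&gt; and &lt;code&gt;parse_nanp&lt;/code&gt; are made up for this example, and it ignores the finer validity rules of the real numbering plan.&lt;/p&gt;

```python
import re
from typing import NamedTuple


class NanpNumber(NamedTuple):
    """Hypothetical container for the three NANP fields in the table."""
    area_code: str       # numbering plan area, 3 digits
    central_office: str  # central office prefix, 3 digits
    line_number: str     # line number, 4 digits


def parse_nanp(number: str) -> NanpNumber:
    """Split a number such as '415.555.1111' or '+1 415 555 1111'."""
    digits = re.sub(r"\D", "", number)   # drop separators and the '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]              # strip country code 1
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return NanpNumber(digits[:3], digits[3:6], digits[6:])
```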
&lt;p&gt;I don&#39;t know too much about the non-North American setting, so the remainder of
this aside is about North America.
Until 1984, North American telephony was basically monopolized by
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bell_System&amp;amp;oldid=1080354068&quot;&gt;Bell System&lt;/a&gt;. In
that system, the number hierarchy was geographic, with the area codes
and central office prefixes corresponding to geographic regions and
specific switches and the line number corresponding to lines on a given
switch. However, with the advent of local number competition following
the breakup of the Bell System and then mobile telephony, things started
to get more complicated.&lt;/p&gt;
&lt;p&gt;Initially, central offices were controlled by a single carrier and
so the phone number could be used straightforwardly for routing.
However, subsequently the US required carriers
to provide &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Local_number_portability&amp;amp;oldid=1077125532&quot;&gt;Local Number Portability&lt;/a&gt;, which allowed you to take your number from carrier to carrier.
Thus, even if you were originally assigned a number out of Verizon&#39;s
block, you could &amp;quot;port&amp;quot; it to T-Mobile, which means that this kind of hierarchical
routing no longer works. Instead, there&#39;s basically a giant—well,
not so giant, given that there are only 10 billion possible numbers—database
that indicates which carrier has responsibility for each number.&lt;/p&gt;
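&lt;p&gt;To make the routing consequence concrete, here is a simplified sketch in Python. It assumes a per-number table of ported numbers that overrides the original block assignment; the table contents and the function name &lt;code&gt;carrier_for&lt;/code&gt; are invented for illustration and do not reflect how the real portability database is organized.&lt;/p&gt;

```python
# Hypothetical data: which carrier originally owned each (area code, prefix)
# block, and which individual numbers have since been ported elsewhere.
BLOCK_OWNER = {("415", "555"): "Verizon"}
PORTED = {"4155551111": "T-Mobile"}


def carrier_for(digits: str) -> str:
    """Return the carrier responsible for a 10-digit number."""
    # Per-number entries win, which is why prefix-based routing alone
    # no longer suffices once numbers can be ported.
    if digits in PORTED:
        return PORTED[digits]
    return BLOCK_OWNER[(digits[:3], digits[3:6])]
```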
&lt;/div&gt;
&lt;p&gt;E-mail addresses are hierarchically assigned, which means that if your
mail service is &lt;code&gt;example.com&lt;/code&gt;, then your address will end in
&lt;code&gt;@example.com&lt;/code&gt;, as in &lt;code&gt;alice@example.com&lt;/code&gt;.
It&#39;s helpful to work through an example here. For instance, here is
what happens when Alice (&lt;code&gt;alice@hotmail.com&lt;/code&gt;) wants to send a message to Bob (&lt;code&gt;bob@gmail.com&lt;/code&gt;):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First, she transmits the message to her mail server
over a protocol called the &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Simple_Mail_Transfer_Protocol&amp;amp;oldid=1079015503&quot;&gt;Simple Mail Transfer Protocol (SMTP)&lt;/a&gt;&lt;/em&gt;,
along with the addressing information for &lt;code&gt;bob@gmail.com&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The sending mail server looks up the receiving
domain name—in this case &lt;code&gt;gmail.com&lt;/code&gt;—in the DNS
to get the server associated with it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; It then connects to that server—again over
SMTP—and transfers the message, along with the
addressing information &lt;code&gt;bob@gmail.com&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Assuming that &lt;code&gt;bob@gmail.com&lt;/code&gt; is actually a valid user on
the receiving server, that server stores the message somewhere
(on disk, in a database, whatever) and waits for Bob to
come pick it up.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, Bob connects to his mail server (historically over
a protocol called &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Message_Access_Protocol&amp;amp;oldid=1071482084&quot;&gt;Internet Message Access Protocol (IMAP)&lt;/a&gt;&lt;/em&gt;)
and retrieves any new messages.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
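&lt;p&gt;The four steps above can be sketched as a toy in-memory simulation. Nothing here speaks real SMTP, DNS, or IMAP: the server names, the &lt;code&gt;MAIL_SERVERS&lt;/code&gt; table standing in for the DNS, and the functions &lt;code&gt;submit&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical stand-in for the DNS: mail domain to receiving server name.
MAIL_SERVERS = {"gmail.com": "mx.gmail.example", "hotmail.com": "mx.hotmail.example"}

# Hypothetical per-server storage: server name, then user address, then messages.
MAILBOXES = defaultdict(lambda: defaultdict(list))
VALID_USERS = {"bob@gmail.com", "alice@hotmail.com"}


def submit(sender: str, recipient: str, body: str) -> str:
    """Steps 1-3: submit to the sending server, which relays and stores."""
    domain = recipient.rsplit("@", 1)[1]     # the receiving domain
    server = MAIL_SERVERS[domain]            # step 2: the "DNS" lookup
    if recipient not in VALID_USERS:         # step 3: is this a real user?
        raise LookupError(f"no such user: {recipient}")
    MAILBOXES[server][recipient].append((sender, body))
    return server


def fetch(user: str) -> list:
    """Step 4: the recipient retrieves any stored messages."""
    server = MAIL_SERVERS[user.rsplit("@", 1)[1]]
    return list(MAILBOXES[server][user])
```

&lt;p&gt;Calling &lt;code&gt;submit(&quot;alice@hotmail.com&quot;, &quot;bob@gmail.com&quot;, &quot;hi&quot;)&lt;/code&gt; and then &lt;code&gt;fetch(&quot;bob@gmail.com&quot;)&lt;/code&gt; returns the stored message. Note that the sender only ever needs the domain part of the address to find the right server.&lt;/p&gt;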
&lt;p&gt;This structure has a number of important properties:&lt;/p&gt;
&lt;h4 id=&quot;addresses&quot;&gt;Addresses &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#addresses&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Because addresses are &lt;em&gt;scoped&lt;/em&gt; by the mail domain they are
associated with, it&#39;s possible to immediately know where
a given message should be delivered just by looking at the
&lt;em&gt;right-hand side (RHS)&lt;/em&gt; of the address, namely the stuff
after the &lt;code&gt;@&lt;/code&gt;-sign. That tells you which domain an
address is associated with. This is in contrast to addresses
on most popular services (e.g., Twitter), which are &lt;em&gt;unqualified&lt;/em&gt;:
if all I have is the identifier &lt;code&gt;ekr____&lt;/code&gt; I don&#39;t know if
that corresponds to Twitter, GitHub, or LinkedIn.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Conversely, the fact that names are hierarchical means that
two people can have the same &lt;em&gt;left-hand side (LHS)&lt;/em&gt; as long as the RHS is
different (and vice versa). So, &lt;code&gt;bob@gmail.com&lt;/code&gt; and &lt;code&gt;bob@hotmail.com&lt;/code&gt;
are totally distinct addresses and quite likely belong to different
people. This is of course true with Twitter handles and the
like, but because they are &lt;em&gt;unqualified&lt;/em&gt;, the bare address
isn&#39;t enough to tell you who is who. This becomes a real issue
when you want to import identities from another namespace,
for example, when your address for messaging is actually your
telephone number.&lt;/p&gt;
&lt;p&gt;Finally, it means that the semantics of the LHS
are opaque to the other end. For instance, if you had your
own mail domain (for instance &lt;code&gt;your-lastname.name&lt;/code&gt;) you
might have every address that ends in &lt;code&gt;@your-lastname.name&lt;/code&gt;
delivered into the same mailbox. Another example is that
Gmail allows you to create new addresses by adding a plus sign
and a tag to the end of the LHS of your actual address, so &lt;code&gt;example@gmail.com&lt;/code&gt;
and &lt;code&gt;example+newsletter@gmail.com&lt;/code&gt; go to the same place.
This is a useful trick to let you sort your email by giving
different addresses to each sender.&lt;/p&gt;
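&lt;p&gt;Here is a small Python sketch of the LHS/RHS split and the plus-sign trick described above. The function names are hypothetical, and the plus-tag handling only models what a receiving domain might choose to do; a sender should treat the LHS as opaque.&lt;/p&gt;

```python
def split_address(address: str) -> tuple:
    """Return (LHS, RHS), splitting at the last '@'."""
    lhs, rhs = address.rsplit("@", 1)
    return lhs, rhs


def base_mailbox(address: str) -> str:
    """Drop a '+tag' suffix from the LHS, as the owning domain might."""
    lhs, rhs = split_address(address)
    return lhs.split("+", 1)[0] + "@" + rhs
```

&lt;p&gt;Routing only ever needs the second element of &lt;code&gt;split_address&lt;/code&gt;; the mapping in &lt;code&gt;base_mailbox&lt;/code&gt; is purely a local policy of the receiving service.&lt;/p&gt;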
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;hosted-domains&quot;&gt;Hosted Domains &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#hosted-domains&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Although mail is scoped by domain, as a practical matter
many domains are actually hosted by the same service.
For instance, Gmail allows you to host your &amp;quot;custom domain&amp;quot;
on Gmail (that is how &lt;code&gt;rtfm.com&lt;/code&gt; works), but your
address can still have your domain in it rather than
&lt;code&gt;gmail.com&lt;/code&gt;. It&#39;s also possible to have your mail
delivered to service A and have most of your accounts
there but send mail from service B. This is useful if you
want to send bulk email using a service like &lt;a href=&quot;https://www.mailgun.com/&quot;&gt;Mailgun&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;h4 id=&quot;interoperability&quot;&gt;Interoperability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#interoperability&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Because SMTP and IMAP are standardized, any mail endpoint
can talk to any other mail endpoint. If you own &lt;code&gt;example.com&lt;/code&gt;
and want to send and receive mail there, all you have to do
is stand up a server—or more likely, use an existing
hosting server—set up the right DNS records, and
you&#39;re good to go. Similarly, most mail services will provide IMAP
service and so you can use any number of clients
(the built in mail client on your Mac, &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mozilla_Thunderbird&amp;amp;oldid=1074344985&quot;&gt;Thunderbird&lt;/a&gt;, etc.) to
read your mail.&lt;/p&gt;
&lt;p&gt;Conversely, nothing says that a mail system has
to have a separate client at all. For instance, instead
of having people use IMAP to read their email you can just
put up a Web front end that accesses it directly and, tada,
you have Gmail. Or, as is common, you can both have a Web interface
&lt;em&gt;and&lt;/em&gt; an IMAP interface. As long as you properly speak SMTP, everything
will work fine and the other end doesn&#39;t even need to know how
you have everything set up; it&#39;s just a matter of having the
right protocol interfaces. In particular, it doesn&#39;t matter to
the receiver how the sender talks to their mail server
and it doesn&#39;t matter to the sender how the receiver
talks to their mail server. All that&#39;s required is that
the servers speak SMTP to each other.&lt;/p&gt;
&lt;p&gt;This is in contrast to most messaging systems, which are basically
silos that don&#39;t interoperate with each other.&lt;/p&gt;
&lt;h3 id=&quot;extensibility&quot;&gt;Extensibility &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#extensibility&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The cost of interoperable protocols is a limited range of
format extensibility. The format of the emails is standardized using a
format called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=MIME&amp;amp;oldid=1080291535&quot;&gt;MIME&lt;/a&gt;,
and if you send a compliant MIME message the receiver should be able
to process it, at least to figure out what the type of
the message is.&lt;/p&gt;
&lt;p&gt;Identifying the type of the message is only the first
step. Suppose that you want to introduce a new
mail feature, say &lt;a href=&quot;https://apps.apple.com/us/app/memoji/id1526384700&quot;&gt;memoji&lt;/a&gt;
in emails. Even if you write a new standard for it and Alice
adds it to her email client, what happens if Bob hasn&#39;t upgraded?
Ideally, the client would get some clear message that something
was wrong, and yet would still see the part that was
interpretable, but this doesn&#39;t always work.
Depending on exactly how the new feature is designed, it either
might not work properly—for instance, the memoji might
be replaced with  some unknown character like �—
(for a long time, emails from Outlook would &lt;a href=&quot;https://www.bleepingcomputer.com/news/microsoft/after-seven-years-microsoft-is-finally-fixing-the-j-email-bug/&quot;&gt;render
the :) emoji as &amp;quot;J&amp;quot; on non-Outlook systems&lt;/a&gt;)
or the message might just not be readable at all (though hopefully
you wouldn&#39;t design a feature like that).
At the end of the day, this kind of mismatch can create
a pretty degraded experience and change the meaning of the message.&lt;/p&gt;
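&lt;p&gt;As a sketch of the graceful-degradation case, the following uses Python&#39;s standard &lt;code&gt;email&lt;/code&gt; package to build a MIME message containing a part of an unfamiliar type and then renders it the way an older client might. The content type &lt;code&gt;application/x-memoji&lt;/code&gt; and the &lt;code&gt;KNOWN_TYPES&lt;/code&gt; set are invented for the example and do not correspond to any real standard or client.&lt;/p&gt;

```python
from email.message import EmailMessage

# Content types this hypothetical older client knows how to display.
KNOWN_TYPES = {"text/plain", "text/html"}


def build_message() -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "alice@hotmail.com"
    msg["To"] = "bob@gmail.com"
    msg["Subject"] = "hello"
    msg.set_content("Hi Bob!")
    # "application/x-memoji" is a made-up type standing in for a new feature.
    msg.add_attachment(b"\x00\x01", maintype="application", subtype="x-memoji")
    return msg


def render(msg: EmailMessage) -> list:
    """Show what the client understands and flag what it does not."""
    shown = []
    for part in msg.walk():
        if part.is_multipart():
            continue                      # container, not content
        ctype = part.get_content_type()
        if ctype in KNOWN_TYPES:
            shown.append(part.get_content().strip())
        else:
            shown.append(f"[unsupported content: {ctype}]")
    return shown
```

&lt;p&gt;Because the type label is standardized, the old client can at least tell the user that something was left out, rather than silently showing a garbled message.&lt;/p&gt;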
&lt;p&gt;The converse of this property however, is that
email &lt;em&gt;processing&lt;/em&gt; is highly extensible. Because mail formats
are open and standardized, any client that speaks the
protocol will work. I gave the example of Webmail before,
but this also means that if you want to
use a mail client which offers some new feature—automatic
email summarization say—that&#39;s your business.
By contrast, most messaging systems are closed and so
you&#39;re limited to the features supported by the official
client.&lt;/p&gt;
&lt;h3 id=&quot;security%3F&quot;&gt;Security? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#security%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Like many things on the Internet, the e-mail system was designed
before modern encryption and so initially everything was in
the clear. This allowed for a broad range of attacks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Anyone on the connection between you and the mail server
or between mail servers could read or modify your messages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Senders weren&#39;t authenticated and so it was trivial to
forge messages that appeared to come from someone else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your mail server was compromised, then it could read
your messages in transit or change them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Some of these issues have been gradually sort-of addressed
with partial solutions such as TLS encrypting the traffic
between you and the mail server, TLS encrypting
the traffic between the mail servers, and server-based
signing mechanisms like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DomainKeys_Identified_Mail&amp;amp;oldid=1080414793&quot;&gt;DKIM&lt;/a&gt;. However, they&#39;re incompletely
applied (for instance, the client-server connection
is generally strongly authenticated but the server-server
connection often is not) and still don&#39;t provide any protection
against a malicious or compromised mail server. For that
you need &lt;em&gt;end-to-end encryption&lt;/em&gt; (E2EE), in which the
messages are encrypted (and authenticated) between the
sending and receiving endpoints.&lt;/p&gt;
&lt;p&gt;There have been quite a few attempts to provide end-to-end encryption
for e-mail (PGP, S/MIME, etc.) but I think it&#39;s fair to describe them
as having largely failed. This isn&#39;t to say that there isn&#39;t any encrypted
mail but it&#39;s a fairly small fraction of overall traffic. The
reasons for the failure of encrypted email are complicated, but
there were a number of deployment problems that most likely
contributed.&lt;/p&gt;
&lt;h4 id=&quot;key-management&quot;&gt;Key Management &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#key-management&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Like any cryptographic system, encrypted email depends on
knowing the cryptographic keys of the people you are talking to.
In e-mail, you use keys in two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You sign your messages in order to authenticate them&lt;/li&gt;
&lt;li&gt;People who want to send you secure messages need to encrypt them to your
key.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&#39;s technically possible to just start sending people messages with
unauthenticated
keys, for instance by signing all of your messages and expecting
people to remember that this is your key (this is often called &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Trust_on_first_use&amp;amp;oldid=1052198040&quot;&gt;trust
on first use
(TOFU)&lt;/a&gt;&lt;/em&gt;).
Once they have received a message from you, they can use your key to
encrypt the return message. Obviously, TOFU is susceptible
to attack if the attacker is the first person to send you
a message pretending to be someone else, which makes the system
less than ideal, especially for interactions with people you don&#39;t
talk to frequently.  If my bank sends me a signed message, then I want
to know it&#39;s my bank right away. It&#39;s also a problem if you want to
send an encrypted message to someone you have never talked to
before. What you really want is some system that lets you find out
what people&#39;s keys are, which means solving two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You need to somehow associate your key(s) with
your email address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You need some way to look up people&#39;s keys so that
you can send them encrypted messages.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
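&lt;p&gt;The trust-on-first-use behavior described above, along with its weakness, can be sketched in a few lines of Python. &lt;code&gt;TofuKeyStore&lt;/code&gt; is a hypothetical class, and keys are represented as opaque byte strings rather than real public keys.&lt;/p&gt;

```python
class TofuKeyStore:
    """Pins the first key seen for each address (hypothetical class)."""

    def __init__(self):
        self._pinned = {}

    def observe(self, address: str, key: bytes) -> str:
        """Record a key seen on a signed message and classify it."""
        if address not in self._pinned:
            self._pinned[address] = key   # first use: trusted blindly
            return "pinned"
        if self._pinned[address] == key:
            return "match"
        return "MISMATCH"                 # possible attack, or a new key
```

&lt;p&gt;If an attacker&#39;s key is the first one observed for an address, it gets pinned, and the legitimate key is then the one that shows up as a mismatch.&lt;/p&gt;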
&lt;p&gt;Deploying the infrastructure for both of these has proven to be
quite challenging. The basic problem is that there was
never a good way to automatically issue the credentials.
This meant that people had to go to a lot of effort to
get credentials, which of course meant that most
people didn&#39;t get them. On the other side of the equation,
there was never really a great way to discover
people&#39;s credentials, which meant that you couldn&#39;t
send encrypted email to new people. It&#39;s in principle
possible to build mechanisms for this (&lt;a href=&quot;https://datatracker.ietf.org/doc/rfc8555/&quot;&gt;ACME&lt;/a&gt;
and &lt;a href=&quot;https://webfinger.net/&quot;&gt;WebFinger&lt;/a&gt;
respectively are examples of the kind of thing I&#39;m talking
about), but we have the usual deployment
&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#network-effects&quot;&gt;network effect&lt;/a&gt; problems.&lt;/p&gt;
&lt;h4 id=&quot;confusing-semantics&quot;&gt;Confusing Semantics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#confusing-semantics&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In addition to the keying problems, the fact that email encryption was
added after the fact to an established system has resulted in some
confusing semantics.&lt;/p&gt;
&lt;p&gt;For example, the major extension point in e-mail is via the message
&lt;em&gt;body&lt;/em&gt;. As noted above, the bodies use an extensible message format
called MIME. However the message subject line isn&#39;t extensible.
This means that the subject line that appears in
the email is neither encrypted nor authenticated. It&#39;s of course
possible to have an inner subject line inside the encryption envelope,
but it&#39;s an obvious challenge for users to understand that they can
trust the body but not the subject.&lt;/p&gt;
&lt;p&gt;Second, because some messages are protected and some are not,
you need some way to indicate to the user which are which.
This kind of indicator is a notorious source of confusion,
especially in a situation where most messages are
unprotected, because you don&#39;t want a big scary warning for
nearly every message. But this also reduces the incentive for people
to use secure e-mail, especially to send signed
e-mail: if recipients don&#39;t notice or care whether
messages are signed, then signing them doesn&#39;t add
a lot of value, as an attacker can just impersonate you
with the recipient being none the wiser.&lt;/p&gt;
&lt;h4 id=&quot;network-effects&quot;&gt;Network Effects &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#network-effects&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;All of this should be a familiar story to EG readers: you
have a situation where it&#39;s inconvenient for people to do
something—in this case, deploy encryption—and
there&#39;s not much benefit to doing it. In these cases, you get the expected result which is
limited or minimal deployment. By contrast, most modern messaging systems
were either built with E2EE from the start or underwent
some mass upgrade that enabled it for everyone, rather
than relying on people to do it themselves.&lt;/p&gt;
&lt;h2 id=&quot;messaging-systems&quot;&gt;Messaging Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#messaging-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Modern messaging systems have addressed these issues by making
encryption both mandatory and automatic. This is comparatively
easy because the messaging service is (usually) vertically integrated:
all—or nearly all—users have clients which are provided
by the service operator and can be updated as desired. The
service operator also provides message routing and identity.
This kind of uniform integrated system has a number of operational
advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The service can automatically issue credentials based on the
user&#39;s account information, thus ensuring that every user
has a credential. They can also run a directory which makes
it easy for any client to learn the credentials for every
other client.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the service wants to add a new feature it can automatically
upgrade everyone&#39;s client to support it. This means that they
don&#39;t need to deal with massive heterogeneity of client functionality
for very long, and can eventually just refuse to support older
clients.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Spam and other kinds of abuse are easier to handle because
all messages are authenticated by a user in the system. Of
course, even with a single central point where all
messages are handled, end-to-end encryption makes content
filtering more difficult.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
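&lt;p&gt;The first of these advantages, automatic credential issuance plus a directory, can be sketched as follows. This is only a schematic: &lt;code&gt;MessagingService&lt;/code&gt; is a hypothetical class, and random bytes stand in for what would really be public keys generated on the user&#39;s device.&lt;/p&gt;

```python
import secrets


class MessagingService:
    """Hypothetical vertically integrated service: accounts plus directory."""

    def __init__(self):
        self._directory = {}

    def register(self, phone_number: str) -> bytes:
        # Stand-in for real key generation: random bytes, not a key pair.
        credential = secrets.token_bytes(32)
        self._directory[phone_number] = credential
        return credential

    def lookup(self, phone_number: str) -> bytes:
        """Any client can learn any other user's credential."""
        return self._directory[phone_number]
```

&lt;p&gt;Since registration and lookup live in the same system, every account has a credential by construction, which is exactly what e-mail never managed to arrange.&lt;/p&gt;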
&lt;p&gt;Of course, many of these advantages depend on having a closed system:
if a significant fraction of people use third party clients to talk to
such a system then you can no longer update the clients whenever
you want to, which makes central extensibility much more difficult.
In other words, you&#39;re trading away control and extensibility for users
in exchange for control and extensibility for the system operator. This is in
stark contrast to the design of the Web, which is dominated by
the principle of end-user control as documented in
the &lt;a href=&quot;https://www.w3.org/TR/html-design-principles/#priority-of-constituencies&quot;&gt;HTML Priority of Constituencies&lt;/a&gt;
and the &lt;a href=&quot;https://webvision.mozilla.org/full/#usercontrol&quot;&gt;Mozilla Web Vision&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Another consequence of a closed system is a lack of universal connectivity:
with e-mail—or telephony—you can contact anyone no matter
which service provider they are on. In fact, you don&#39;t even have to
think about it: you just e-mail (or dial). Messaging, however, is different:
if I want to send a message to someone on WhatsApp, I need to have
a WhatsApp account myself. And because people choose different messaging
systems, this means that it&#39;s now common to have accounts on a variety
of messaging systems (I myself use three regular messaging systems, plus
countless Slacks).&lt;/p&gt;
&lt;p&gt;All of this creates a set of market dynamics dominated by network
effects
(&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Metcalfe%27s_law&amp;amp;oldid=1071685522&quot;&gt;Metcalfe&#39;s Law&lt;/a&gt;)
and getting big: if you have a lot of users, then people have
a strong incentive to join so they can talk to their friends. Conversely,
if you are a new entrant into the market it is hard to break in
because your early users don&#39;t have that many people to talk to.
This is probably why we see a lot of regional variation in which
apps are popular, because people want to use whatever app their
friends use. Unsurprisingly, this produces some fairly lopsided
market numbers, with Meta controlling two of the top three
messaging platforms (WhatsApp and Facebook Messenger):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.messengerpeople.com/wp-content/uploads/2021/05/most-popular-global-mobile-messaging-apps-2021.png&quot; alt=&quot;Messaging platforms&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This brings us to the topic of interoperability: if it were possible
for anyone to start a new messenger app that could still talk to
WhatsApp and Messenger users, then this would remove a big barrier
to entry into the market. I don&#39;t want to sound too optimistic here:
even in a nominally open system like e-mail, we still see a huge
&lt;a href=&quot;https://blog.shuttlecloud.com/the-most-popular-email-providers-in-the-u-s-a/&quot;&gt;amount of market concentration&lt;/a&gt;
on the big mail systems like Gmail, Outlook, and Yahoo. This isn&#39;t
too surprising: it&#39;s a lot of work to run a good mail system
and so we&#39;d expect well-funded players to dominate. However,
it&#39;s also quite possible to use one of the smaller services
like &lt;a href=&quot;https://www.fastmail.com/&quot;&gt;Fastmail&lt;/a&gt;, &lt;a href=&quot;https://protonmail.com/&quot;&gt;ProtonMail&lt;/a&gt;,
or &lt;a href=&quot;https://www.dreamhost.com/&quot;&gt;DreamHost&lt;/a&gt; or even run your own server,
whereas there&#39;s really no way to run your own WhatsApp server.&lt;/p&gt;
&lt;h2 id=&quot;technical-interoperability-for-messenging&quot;&gt;Technical Interoperability for Messaging &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#technical-interoperability-for-messenging&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The details of what the DMA will actually require are extraordinarily
sketchy; as I understand it they would need to be filled out
by some regulatory agency. However, broadly speaking, there seem to be two options for providing
interoperability, as &lt;a href=&quot;https://www.internetsociety.org/wp-content/uploads/2022/03/ISOC-EU-DMA-interoperability-encrypted-messaging-20220311.pdf&quot;&gt;laid out by ISOC&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Require services to offer stable APIs.&lt;/li&gt;
&lt;li&gt;Require services to actually interoperate over a standardized
protocol.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These require a bit of unpacking.&lt;/p&gt;
&lt;h3 id=&quot;stable-apis&quot;&gt;Stable APIs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#stable-apis&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The idea behind a stable API is that the service would design and publish interfaces
that others could use. There are actually two ways to offer stable APIs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;To &lt;em&gt;clients&lt;/em&gt;, allowing someone else&#39;s messenger
client to work with your service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To &lt;em&gt;services&lt;/em&gt;, allowing someone else&#39;s messenger service to gateway
messages in and out of your service.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first of these is actually a familiar concept in instant
messaging: because there was never a single standardized protocol,
it was fairly common to have messaging clients, such as
&lt;a href=&quot;https://www.trillian.im/&quot;&gt;Trillian&lt;/a&gt;,
which would speak multiple protocols but provide a unified interface
to the user that hid the details. This isn&#39;t really a conceptual
change in the architecture of the system as it would still be
a monolithic identifier space and the clients would still have
to conform to whatever rules the service laid out; indeed, some
services have open source clients, and so this is already possible
for them, though of course third party clients might not
get upgraded when the official clients do, potentially
resulting in stability problems.
The main result would be some decreased flexibility
for the service because they would need to get users of the API
to update when they wanted to change something that affected
interoperability. However, as a practical matter, this probably
wouldn&#39;t have that much of an impact on interoperability
and market concentration because most people will just use the
official client, and people who don&#39;t will be annoyed when
the service changes something and breaks them.&lt;/p&gt;
&lt;p&gt;The second version is less familiar, but the idea is presumably that
WhatsApp would have some published API that would allow
ekrMessage (TM pending!) to gateway messages into and out of
WhatsApp. As with e-mail, each side would handle messages
according to its own rules, with the gateway just
transiting messages between the systems.
This comes with two main problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How do you handle identities? For instance, if ekrMessage
and WhatsApp both use phone numbers for identities, how
do you know which messages stay on WhatsApp and which go
to ekrMessage?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How do you manage different encryption protocols? Currently,
each messenger has their own encryption protocol; while many
of these are built along similar lines, they&#39;re not necessarily
identical. Making this work either requires gatewaying at
the provider—thus breaking end-to-end encryption, which
is extremely undesirable from a security perspective—or
having each client speak multiple encryption protocols,
as in the multi-protocol client case.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
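&lt;p&gt;The identity problem in the first bullet can be illustrated with a short Python sketch. The registration data is invented, and the &lt;code&gt;number@service&lt;/code&gt; form of qualified identifier is an assumption borrowed from e-mail addressing, not something any of these services actually uses.&lt;/p&gt;

```python
# Hypothetical registrations: both services key users by phone number.
REGISTERED = {
    "WhatsApp": {"+14155551111", "+14155552222"},
    "ekrMessage": {"+14155552222", "+14155553333"},
}


def candidate_services(phone_number: str) -> list:
    """Services on which a bare phone number could be delivered."""
    return sorted(s for s, users in REGISTERED.items() if phone_number in users)


def route(identifier: str) -> str:
    """Route 'number@service' directly; a bare number must be unambiguous."""
    if "@" in identifier:
        return identifier.rsplit("@", 1)[1]
    services = candidate_services(identifier)
    if len(services) != 1:
        raise LookupError(f"{identifier} matches {services}")
    return services[0]
```

&lt;p&gt;A number registered on only one service routes cleanly, but a number registered on both cannot be routed without some extra qualifier or a user-set preference.&lt;/p&gt;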
&lt;p&gt;Of course, this would all be a lot easier if there was some
standardized protocol that everyone spoke, as with e-mail.
Note: the difference between a stable API and a standardized protocol isn&#39;t
really technical so much as social and depends on whether there
is some standard or just a document published by the service.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;standardized-protocol&quot;&gt;Standardized Protocol &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#standardized-protocol&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Having a standardized protocol is not an
all-or-nothing proposition: there are actually a number of levels at which one might
have standardization, with the other levels potentially not
being standardized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Key establishment and message encryption&lt;/li&gt;
&lt;li&gt;User identity&lt;/li&gt;
&lt;li&gt;Message transport&lt;/li&gt;
&lt;li&gt;Message contents and features&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I go into these in some more detail below.&lt;/p&gt;
&lt;h4 id=&quot;key-establishment-and-message-encryption&quot;&gt;Key Establishment and Message Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#key-establishment-and-message-encryption&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The basic structure of most messaging encryption systems is that
you have an identity (e.g., your phone number) which is tied
to a cryptographic key or keys. When Alice and Bob want to exchange messages,
there is some protocol that lets them use their keys to establish a pairwise
(or groupwise in the case of more than two people) cryptographic
key which they then use to encrypt messages.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Obviously, if Alice and Bob don&#39;t speak the same protocol,
then they will not be able to establish pairwise keys and will
not be able to encrypt messages end-to-end, so this is probably
the most important place for everyone to use a common protocol.&lt;/p&gt;
&lt;p&gt;Fortunately, while there are technical differences between the various
protocols in use, they&#39;re similar enough that it would
probably not be prohibitive for everyone to converge on
a common protocol: a number of the existing messaging
systems are based on the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Signal_Protocol&amp;amp;oldid=1062140450&quot;&gt;Signal protocol&lt;/a&gt;
or one of its variants such as &lt;a href=&quot;https://wire.com/en/blog/axolotl-proteus-encryption-protocols/&quot;&gt;Proteus&lt;/a&gt;
or &lt;a href=&quot;https://gitlab.matrix.org/matrix-org/olm/blob/master/docs/megolm.md&quot;&gt;Megolm&lt;/a&gt;,
and the IETF is currently in the final stages of standardizing
a protocol called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Messaging_Layer_Security&amp;amp;oldid=1076231420&quot;&gt;Messaging Layer Security (MLS)&lt;/a&gt; which contains a number of similar concepts but is
intended to be more optimized for group communication. It&#39;s too
soon to know how much adoption MLS will get, but the WG has
had participation from a number of messaging services such as
Facebook Messenger, Matrix, Wickr, and Wire (full disclosure: I
have also been heavily involved in this effort). It would be a big
lift for companies to change out their protocols, but, because
right now they&#39;re noninteroperable silos, it&#39;s still
technically feasible.&lt;/p&gt;
&lt;h4 id=&quot;identity&quot;&gt;Identity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#identity&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As I said above, we need to have some notion of user identity. Identity
is used for two purposes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;By the end-user clients (in an end-to-end system) to
establish the keys to use to encrypt a message.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;By the service to know how to route messages.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both of these require identifying other people you want
to exchange messages with.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;imessage&quot;&gt;iMessage &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#imessage&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;iMessage is actually quite an interesting case because the
Apple client is actually two clients in one, containing
both an SMS client for talking to non-Apple users (the
green bubble) and
an iMessage client for talking to Apple users (the blue
bubble). iMessages are sent over the Internet (&amp;quot;over the top&amp;quot;) and are
end-to-end encrypted. SMS messages are sent over the
phone network and are not. However, both categories
of users have the same type of addresses in the form
of phone numbers (iMessage also supports
email addresses) and Apple automatically detects the
capabilities of the message recipient and sends a message
of the appropriate type.&lt;/p&gt;
&lt;p&gt;iMessage might be one of the strongest cases for the benefits
of interoperability because it already &lt;em&gt;interoperates&lt;/em&gt;
with Android devices, just in the clear over SMS. If iMessage
was forced to interoperate and Android played along, then
a large fraction of traffic would suddenly be encrypted.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At a high level, there are two main identity architectures we can have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Hierarchical naming in which a given identity indicates
which service it is attached to, as in e-mail.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A shared namespace in which a given identity could be
attached to any service (like phone numbers).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With messaging, the situation is even more complicated because
multiple messaging services use the same identifier (e.g.,
WhatsApp and iMessage both use phone numbers) so that means
that even in an interoperable system, we&#39;d need to find some way
to manage that case, which seems like a real open question
(though of course we already have that problem now when you
tell someone &amp;quot;I&#39;m 1.415.555.1111 on WhatsApp&amp;quot;, so in the
worst case scenario, we could just punt the problem to the user).
We also have the potential problem that &lt;code&gt;alice&lt;/code&gt; on
system A may be a different person from &lt;code&gt;alice&lt;/code&gt; on system B;
this shouldn&#39;t happen with phone numbers because they are uniquely
assigned but it happens all the time with user-chosen handles.&lt;/p&gt;
&lt;p&gt;The hierarchical design is obviously easier to manage, but it
may be quite hard to retrofit to the existing non-hierarchical
system.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
One possible approach is to have a hierarchical system under
the hood but have UIs present unqualified namespaces,
e.g., &amp;quot;Connect with 1.415.555.1111 on WhatsApp&amp;quot; in the UI
turns into &amp;quot;Connect with &lt;code&gt;1.415.555.1111@whatsapp.com&lt;/code&gt; at
the protocol layer.&amp;quot;
This is likely to work OK if there are a small number of
messaging systems but less well if there are hundreds
because the UI gets too cluttered. It&#39;s also possible to have a kind
of hybrid UI like existing e-mail systems do for their
accounts where you have a chooser for the common systems
and then people can enter something freeform:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/mail-chooser.png&quot; alt=&quot;Email account chooser&quot; /&gt;&lt;/p&gt;
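&lt;p&gt;As a rough sketch of the &quot;hierarchical under the hood&quot; idea, the UI can keep the
handle and the chosen service as separate fields and only combine them at the
protocol layer. The &lt;code&gt;SERVICE_DOMAINS&lt;/code&gt; table, the &lt;code&gt;qualify&lt;/code&gt; function,
and the &lt;code&gt;ekrmessage.example&lt;/code&gt; domain are hypothetical names used for illustration,
not part of any real system:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical mapping from the service name shown in the chooser
# to the domain used at the protocol layer.
SERVICE_DOMAINS = {
    &quot;WhatsApp&quot;: &quot;whatsapp.com&quot;,
    &quot;ekrMessage&quot;: &quot;ekrmessage.example&quot;,
}


def qualify(handle, service=None):
    &quot;&quot;&quot;Turn a UI-level handle into a hierarchical protocol-level identity.&quot;&quot;&quot;
    if &quot;@&quot; in handle:
        # Freeform entry: the user already typed a fully qualified identity.
        return handle
    # Chooser entry: append the domain for the selected service.
    return handle + &quot;@&quot; + SERVICE_DOMAINS[service]


print(qualify(&quot;1.415.555.1111&quot;, &quot;WhatsApp&quot;))
print(qualify(&quot;alice@smallservice.example&quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The chooser covers the common systems and the freeform path covers everything
else, which is roughly the split the e-mail account UI above makes.&lt;/p&gt;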
&lt;p&gt;This brings us to the question of how users learn other
users&#39; keying material.
In a fully distributed/federated world like e-mail, you&#39;d need
some sort of analog to the WebPKI in which there was a set of
agreed-upon roots of trust and those roots then somehow were
able to attest to identities in a uniform manner, no matter
which messaging service people used. This is in contrast to the
current situation where each service runs its own disconnected
identity service. If there
is a totally shared namespace, then this has a lot of the same
problems as the WebPKI in which anyone can attest to any name,
but if the names are arranged hierarchically—even if
that&#39;s not visible to the user—then we could potentially
dodge some of those problems, as only WhatsApp would be able
to attest to names for &lt;code&gt;@whatsapp.com&lt;/code&gt;, etc.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s also possible that one could do something less universal:
if there are only a modest number of messaging services, and you
have to make special arrangements to federate between services,
then each service could continue to maintain its own identity
system and just publish documentation about how it
works, forcing the other systems to implement
that. The likely outcome here would be that the big gatekeeper
systems would each have something and if you wanted to talk
to them, you would need to both consume and publish that, which
is a burden on the smaller systems, but perhaps a bearable one
(the tricky part is when Alice has accounts on WhatsApp and iMessage
and wants to talk to someone on ekrMessage: which credentials
does she use for the ekrMessage user?).&lt;/p&gt;
&lt;h4 id=&quot;message-transport&quot;&gt;Message Transport &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#message-transport&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Once we have established keys and are sending messages, we still need some
way to transport them. There have been attempts to design standardized
protocols for this, in particular &lt;a href=&quot;https://xmpp.org/&quot;&gt;XMPP&lt;/a&gt;
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SIMPLE_(instant_messaging_protocol)&amp;amp;oldid=1074023895&quot;&gt;SIMPLE (which is not)&lt;/a&gt;,
but neither has seen the kind of adoption that would make it the
obvious choice here.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;As with identity, while it would be convenient to offer something standardized,
it&#39;s probably not a dealbreaker not to have it, as long as services
are required to offer interoperable APIs for message sending
and delivery. The good news here is that unlike the cryptographic
pieces, those APIs can largely be handled by the messaging
service, rather than the client, so my ekrMessage client just
needs to know that a given message is destined for someone on
WhatsApp and it can route it there.&lt;/p&gt;
&lt;h4 id=&quot;message-contents-and-features&quot;&gt;Message Contents and Features &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#message-contents-and-features&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;All of the above is just concerned with getting messages from point
A to point B, but what people actually care about is the messages
themselves. In order for messaging to work properly, when the
messages finally get to the recipient, they need to be readable,
which won&#39;t work if (say) system A uses ASCII messages
and system B encodes them as images. Moreover, if system B
wants to add some new feature, it&#39;s a problem if system
A doesn&#39;t have it (&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critiques&quot;&gt;critique 2&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;As noted above, this is a sort-of solved problem in e-mail
in that you can send MIME-encoded messages that describe their
contents. But of course, describing the contents doesn&#39;t
help if someone sends me a message of type &lt;code&gt;image/avif&lt;/code&gt;
and I don&#39;t know how to parse that. The conventional solution
here is to have
some common format that it&#39;s assumed that everyone can read
(in e-mail this is 7-bit ASCII text). The sender then sends
&lt;em&gt;two&lt;/em&gt; copies of the content bundled in the same message: (1) the &amp;quot;basic&amp;quot;
version that everyone should be able to read and (2) the &amp;quot;enhanced&amp;quot;
version that only newer clients can read.&lt;/p&gt;
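&lt;p&gt;For concreteness, here is a minimal sketch of the two-copies approach using Python&#39;s
standard &lt;code&gt;email&lt;/code&gt; package, which builds a &lt;code&gt;multipart/alternative&lt;/code&gt;
message. The subject and message text are made up, and HTML stands in for whatever the
&quot;enhanced&quot; format happens to be:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from email.message import EmailMessage

msg = EmailMessage()
msg[&quot;Subject&quot;] = &quot;Hello&quot;

# The &quot;basic&quot; version that every client is assumed to be able to read.
msg.set_content(&quot;Hello, Bob!&quot;)

# The &quot;enhanced&quot; version, bundled into the same message as an alternative.
msg.add_alternative(&quot;&amp;lt;p&amp;gt;Hello, &amp;lt;b&amp;gt;Bob&amp;lt;/b&amp;gt;!&amp;lt;/p&amp;gt;&quot;, subtype=&quot;html&quot;)

# The message is now a container holding both renderings.
print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A client that doesn&#39;t understand the enhanced part just displays the plain text one.&lt;/p&gt;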
&lt;p&gt;This is a workable, if not ideal, solution, but actually it&#39;s
probably possible to do quite a bit better. The reason is that
unlike e-mail, where you send messages to people based
solely on their address, in order to send someone an encrypted
message you need their key. When people publish their keys, they
can also publish other capabilities such as the various media
types they understand, which gives senders some information about
what messages are safe to send (Rohan Mahy has
described such a &lt;a href=&quot;https://www.ietf.org/id/draft-mahy-mls-content-neg-00.html&quot;&gt;mechanism&lt;/a&gt;
for MLS).
Unfortunately, it&#39;s still possible to get into trouble with
larger groups with mixed capabilities, where you probably
end up having to send a lowest common denominator version.
This isn&#39;t ideal for ordinary features, but is potentially
more problematic for security features, as discussed below.&lt;/p&gt;
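&lt;p&gt;The group problem can be sketched in a few lines. This is only an illustration:
the &lt;code&gt;CAPABILITIES&lt;/code&gt; table and the &lt;code&gt;safe_types&lt;/code&gt; function are hypothetical,
and a real mechanism would carry this information alongside each user&#39;s published keys
rather than in a local table:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical media types that each user advertises along with their key.
CAPABILITIES = {
    &quot;alice&quot;: {&quot;text/plain&quot;, &quot;image/jpeg&quot;, &quot;image/avif&quot;},
    &quot;bob&quot;: {&quot;text/plain&quot;, &quot;image/jpeg&quot;},
    &quot;carol&quot;: {&quot;text/plain&quot;},
}


def safe_types(recipients):
    &quot;&quot;&quot;Return the media types that every recipient says it understands.&quot;&quot;&quot;
    return set.intersection(*(CAPABILITIES[name] for name in recipients))


# One-to-one: the sender can use anything the recipient advertises.
print(sorted(safe_types([&quot;alice&quot;])))

# Mixed group: the safe set shrinks to the lowest common denominator.
print(sorted(safe_types([&quot;alice&quot;, &quot;bob&quot;, &quot;carol&quot;])))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each member added to the group can only shrink the intersection, never grow it.&lt;/p&gt;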
&lt;p&gt;As should be clear from the discussion above, any form of
interoperability places some limits on the freedom of each service to
change their offerings whenever they want. Some of these
costs—like using a standardized encryption protocol—are
relatively modest, but others may be larger. It&#39;s certainly a lot more
work to detect the capabilities of every client and carefully craft
messages which will work for all of them than it is to just generate
messages for one client type which you know works.&lt;/p&gt;
&lt;h2 id=&quot;security-implications-of-interoperability&quot;&gt;Security Implications of Interoperability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#security-implications-of-interoperability&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As discussed above, if connecting service A and
service B requires some kind of bridge that decrypts and reencrypts
messages, then this has a pretty negative impact on security (&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critiques&quot;&gt;critique 1&lt;/a&gt;).
However, it&#39;s also possible to have interoperable end-to-end encryption;
I would also argue that with sufficient care it&#39;s even possible to design
an identity infrastructure that doesn&#39;t badly weaken the system as
a whole. However, that isn&#39;t to say that there are no security
implications of requiring interoperability.&lt;/p&gt;
&lt;p&gt;First, even if you have a common protocol, there may be differences
in application semantics. For example, when WhatsApp detects
that a recipient has changed their keys and so a message is
undecryptable, it &lt;a href=&quot;https://www.schneier.com/blog/archives/2017/01/whatsapp_securi.html&quot;&gt;automatically re-sends the message&lt;/a&gt;.
This is a usability feature but is a difference from Signal, which
does not automatically re-send—even though they use the same protocol as WhatsApp—because Signal is concerned that the new key might be compromised. This is an application
behavior and it&#39;s of course
harder to frame the security guarantees of a system where there
is more than one kind of client; in this case, the security decision
is made by the sender, but in other cases it might not be.&lt;/p&gt;
&lt;p&gt;One case where it isn&#39;t is &quot;disappearing messages&quot;, which some messaging
systems support and which get automatically deleted
after a certain time. This is not a cryptographic feature but
rather a client side feature and depends on the receiving client
complying with the sender&#39;s request to delete the message. Obviously,
if the remote client doesn&#39;t comply, then it&#39;s not going to work.
I&#39;m less sympathetic to this case because this kind of feature
is mostly an example of hope-based security: even in a closed
system you have &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software&quot;&gt;no way of knowing what software is running on the
receiver&#39;s computer&lt;/a&gt;; it could have been
hacked or they could have a reverse-engineered non-compliant
system (the virtue of standards is that they allow for
interoperability without reverse engineering).
Even if that&#39;s not the case, nothing stops them from
taking a photo of the screen, or, depending on the system,
a screenshot. This seems like a case where the recipient can
advertise its capabilities and you just have to trust them.&lt;/p&gt;
&lt;p&gt;There might also be new security features that would not
end up in whatever new standardized protocol was settled on,
such as metadata protection or post-quantum security. This isn&#39;t
ideal, of course, but standardized protocols do evolve, and it&#39;s
possible for messaging services to use private protocol extensions
for groups that just consist of their users on new clients, so
this doesn&#39;t seem like a fatal objection.&lt;/p&gt;
&lt;p&gt;Probably the most serious problem is spam and abuse (&lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critiques&quot;&gt;critique 3&lt;/a&gt;). As I
mentioned earlier, this is a much easier problem if you
have relationships with all the users and don&#39;t need to
accept messages from arbitrary counterparties. End-to-end
encryption also presents a problem here because it means
you can&#39;t do content filtering centrally. I&#39;m not sure how serious
this would actually be in practice: a lot of what makes
email spam work is that you have to accept email from
non-contacts, which is somewhat less of an issue in
messaging systems, but this still seems like a
problem that needs more work.&lt;/p&gt;
&lt;h2 id=&quot;critique-recap&quot;&gt;Critique Recap &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critique-recap&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s probably useful to recap the critiques from the &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#critiques&quot;&gt;beginning&lt;/a&gt; of
this post. I don&#39;t think they are entirely without merit, but I also believe
that interoperability would have real benefits that need to be weighed
against these concerns.&lt;/p&gt;
&lt;h4 id=&quot;interoperability-will-weaken-security&quot;&gt;Interoperability will weaken security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#interoperability-will-weaken-security&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s certainly true that there are ways to implement interoperability
which would have a very negative impact on security. However, as I
argue above, I think it&#39;s also possible to implement interoperability
in ways which would minimize those impacts, in particular by maintaining
end-to-end encryption across system boundaries. Clearly, the resulting
system would be more complex, which is bad for security, but having
a common system would provide a single target for analysis and improvement,
which is good.&lt;/p&gt;
&lt;p&gt;It&#39;s also important to look at the non-technical picture here: right now users
largely choose their messaging systems based on who they want to talk to
and get whatever security properties those systems have. Interoperability
would allow people to choose systems based on security properties—for
instance that they have &lt;a href=&quot;https://github.com/google/keytransparency/&quot;&gt;key transparency&lt;/a&gt;
and &lt;a href=&quot;https://reproducible-builds.org/&quot;&gt;reproducible builds&lt;/a&gt;—while
still talking to people who have made other choices. Of course, those
mixed conversations tend to have the security properties of the weaker
system, but at least it would be easy to also talk to people who had
made stronger choices. In addition, we see many cases today where people fall
back to unencrypted channels in order to interoperate (e.g., iMessage falling back to SMS),
which would be improved by end-to-end interoperability.&lt;/p&gt;
&lt;h4 id=&quot;interoperability-will-hold-back-innovation&quot;&gt;Interoperability will hold back innovation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#interoperability-will-hold-back-innovation&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Here too, the situation is complicated. On the one hand, it&#39;s clearly true that
messaging services would be less free to innovate than if they were totally
vertically integrated (although they would still retain substantial freedom).
On the other hand, there would be more room for innovation on the clients
themselves, something which is currently very difficult. It&#39;s worth noting
that the Web is one giant mostly interoperable system which is still
experiencing plenty of innovation, so I don&#39;t think it&#39;s a foregone
conclusion that interoperable systems can&#39;t innovate; you just need
mechanisms to manage compatibility and change.&lt;/p&gt;
&lt;h4 id=&quot;interoperability-will-make-abuse-worse&quot;&gt;Interoperability will make abuse worse &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#interoperability-will-make-abuse-worse&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It does seem likely that interoperability will make abuse worse: if you
have to accept messages from basically anyone then reputation and
similar systems become harder, and e-mail abuse (especially spam) is
a serious problem. However, we already see abuse even in monolithic systems,
so it&#39;s also clear that being closed isn&#39;t a panacea.
Moreover, messaging is fundamentally different from e-mail in a number
of important ways (we&#39;ll have authentication from the start, which
was a huge problem in e-mail, there is much less expectation that you&#39;ll
just accept messages from anyone, etc.) so it&#39;s not clear how much
worse interoperability will make things.&lt;/p&gt;
&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As the extremely long writeup above should indicate, this is far
from an easy problem. We have a giant installed base of software
that doesn&#39;t interoperate and changing that would be difficult
even if the big players wanted to. Famously, Facebook
has been &lt;a href=&quot;https://screenrant.com/whatsapp-cross-chat-facebook-messenger-instagram-optional-interoperability/&quot;&gt;trying to get Messenger and WhatsApp to interoperate
in an end-to-end secure fashion for years&lt;/a&gt;,
and it seems likely that they&#39;re going to be a lot less excited about
interoperating with others. However, that&#39;s a separate question
from whether it&#39;s actually technically possible to do, which,
as the analysis above suggests, I think it is.
With that said, this is also a much harder problem than
the EU guidelines seem to contemplate: for instance,
they require that basic 1-1 messaging be
available within three months, and group messaging within
two years. Given that the MLS standardization process
is just about complete after &lt;a href=&quot;https://datatracker.ietf.org/wg/mls/history/&quot;&gt;four years&lt;/a&gt;,
two years seems pretty aggressive, and three months seems
fairly implausible.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that email frequently has &lt;em&gt;transport&lt;/em&gt; encryption where
messages are encrypted between users and mail servers
and between mail servers, but they are generally in the
clear on the mail server. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;What it looks up is the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=MX_record&amp;amp;oldid=1037761196&quot;&gt;mail exchanger (MX) record&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And Alice doesn&#39;t even need to know that much. For instance,
if Gmail suddenly decided to support domains
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/&quot;&gt;rooted in the blockchain&lt;/a&gt;,
this would just work transparently for Alice, because
only Gmail needs to know which server handles &lt;code&gt;example.eth&lt;/code&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Of course, users don&#39;t always upgrade instantaneously, so it&#39;s
possible to have some heterogeneity, but it&#39;s typically fairly
short term, especially because the service provider can
force you to update to continue using the service. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note: The difference between
&amp;quot;APIs&amp;quot; and &amp;quot;protocols&amp;quot; is largely a matter of terminology:
protocols are just the rules for what goes over the network,
but things that run over HTTP are often called &amp;quot;APIs&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In many protocols, that pairwise key is itself changed
(&amp;quot;ratcheted&amp;quot;) frequently. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, am I the only person who thinks that
the proliferation of these non-hierarchical namespaces
is a huge regression? I&#39;d much rather be &lt;code&gt;ekr@rtfm.com&lt;/code&gt;
everywhere than &lt;code&gt;ekr&lt;/code&gt; on Github and &lt;code&gt;ekr____&lt;/code&gt; on
Twitter. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There
are also questions about key transparency and the like,
but they&#39;re largely downstream of these bigger architectural
questions. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Google chat used to offer an XMPP interface but no longer does. &lt;a href=&quot;https://educatedguesswork.org/posts/messaging-e2e/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What&#39;s with the www prefix in www.example.com?</title>
		<link href="https://educatedguesswork.org/posts/www-prefix/"/>
		<updated>2022-03-28T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/www-prefix/</id>
		<content type="html">&lt;p&gt;You might have noticed that it&#39;s common for sites to have a domain
name like &lt;code&gt;www.example.com&lt;/code&gt; and a URL like
&lt;code&gt;https://www.example.com&lt;/code&gt;. You might wonder what the
&lt;code&gt;www&lt;/code&gt; is doing here. You&#39;re most likely loading this from a Web browser,
so surely the browser knows you&#39;re on the Web. Why does it
need the &lt;code&gt;www&lt;/code&gt; prefix? The answer, like many things on the
Internet, is that it was the quickest way to get to a
result without having to change anything and now we&#39;re at
a local minimum which is hard to change.&lt;/p&gt;
&lt;h3 id=&quot;protocol-separation&quot;&gt;Protocol Separation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/www-prefix/#protocol-separation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the early days of the Internet, it seemed like sites would
be running a number of user-facing services (email, Web, gopher,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_News_Transfer_Protocol&amp;amp;oldid=1071621299&quot;&gt;NNTP&lt;/a&gt;,
etc.) It quickly became apparent that even though it was
technically possible to multiplex them on different &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Port_(computer_networking)&amp;amp;oldid=1072747579&quot;&gt;TCP ports&lt;/a&gt;, you didn&#39;t actually
want to run them all on the same machine, for several
reasons.&lt;/p&gt;
&lt;p&gt;First, you may not want them to be managed by the same person. The bigger
your system gets, the more you want division of labor, and, for instance,
you might not want your mail administrator to have access to your Web
server.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/www-prefix/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Second,
you might want to use multiple machines to manage load, initially by
separating each service onto its own machine and then potentially
later by having multiple Web servers. Load is generally
more of an issue for Web than it is for other services, principally
because it&#39;s possible to get &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Slashdot_effect&amp;amp;oldid=1070282247&quot;&gt;flash crowds&lt;/a&gt;
that suddenly dramatically increase the load on your Web server.
For obvious reasons, you don&#39;t want a flash crowd that slows
your Web server to a crawl to also bring down your mail server,
which you may be using to coordinate fixing your Web server.&lt;/p&gt;
&lt;p&gt;Unfortunately, in those early days, the DNS had no way to say that if you had
the name &lt;code&gt;example.com&lt;/code&gt; you should connect to machine A for Web and
machine B for NNTP.  Recall from an &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;earlier
post&lt;/a&gt; that a domain
name is just an index into a distributed database, with the primary
value in the database being the IP address associated with the name.
This means that Web and NNTP for &lt;code&gt;example.com&lt;/code&gt; have to point to
the same IP address and hence the same machine. As you have
probably guessed by now, the solution is to give each service
a different domain name, e.g., &lt;code&gt;www.example.com&lt;/code&gt; for Web,
&lt;code&gt;nntp.example.com&lt;/code&gt; for NNTP, etc. This allows you to configure
a separate machine for each service with its own IP address.
This also allows them to
be in totally different data centers or even operated by
different hosting providers.&lt;/p&gt;
&lt;p&gt;Interestingly, it &lt;em&gt;was&lt;/em&gt; possible to say that you should deliver mail
for (say) &lt;code&gt;example.com&lt;/code&gt; to &lt;code&gt;mail.mailserver.example&lt;/code&gt; via
something called an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=MX_record&amp;amp;oldid=1037761196&quot;&gt;MX
record&lt;/a&gt;;
this allowed someone else to run a mailserver on your behalf. However,
there was no generic mechanism to do so for other protocols.
There are now several such mechanisms, starting with
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SRV_record&amp;amp;oldid=1072199092&quot;&gt;SRV record&lt;/a&gt;
and now including the &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-dnsop-svcb-https-03.html&quot;&gt;HTTPS record&lt;/a&gt;.
However, the SRV record never got wide deployment—to the best of my
knowledge, no browser supports it—and the HTTPS record is new.
The problem with deploying any such record is that there are a significant
number of browsers which don&#39;t support it, so if you want to
steer Web traffic and other traffic to different places, you need
to keep doing &lt;code&gt;www&lt;/code&gt;.&lt;/p&gt;
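&lt;p&gt;To make the asymmetry concrete, here is a toy model of the lookups involved. This
is not how a real resolver works: the &lt;code&gt;RECORDS&lt;/code&gt; table, the
&lt;code&gt;address_for&lt;/code&gt; function, and the IP addresses are all made up for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Toy DNS table keyed by (name, record type). All values are illustrative.
RECORDS = {
    (&quot;example.com&quot;, &quot;MX&quot;): &quot;mail.mailserver.example&quot;,
    (&quot;www.example.com&quot;, &quot;A&quot;): &quot;192.0.2.20&quot;,
    (&quot;nntp.example.com&quot;, &quot;A&quot;): &quot;192.0.2.30&quot;,
}

# Conventional per-service name prefixes.
PREFIXES = {&quot;web&quot;: &quot;www&quot;, &quot;news&quot;: &quot;nntp&quot;}


def address_for(service, domain):
    &quot;&quot;&quot;Return where a client for the given service should connect.&quot;&quot;&quot;
    if service == &quot;mail&quot;:
        # Mail has its own record type, so the bare domain is enough.
        return RECORDS[(domain, &quot;MX&quot;)]
    # Other services only have address records, so the service
    # has to be encoded in the name itself.
    return RECORDS[(PREFIXES[service] + &quot;.&quot; + domain, &quot;A&quot;)]


print(address_for(&quot;mail&quot;, &quot;example.com&quot;))
print(address_for(&quot;web&quot;, &quot;example.com&quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Mail gets to use the bare domain because the record type carries the service;
everything else has to carry it in the name.&lt;/p&gt;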
&lt;h3 id=&quot;cname-and-the-apex-zone&quot;&gt;CNAME and the Apex Zone &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/www-prefix/#cname-and-the-apex-zone&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, at this point, there are mostly only two domain
names that users regularly come into contact with:
email (e.g., &lt;code&gt;ekr@example.com&lt;/code&gt;) and Web (&lt;code&gt;https://example.com&lt;/code&gt;).
As I mentioned above, it &lt;em&gt;is&lt;/em&gt; possible to run email and
Web on different machines without the &lt;code&gt;www&lt;/code&gt; prefix. So, why
does the prefix persist?&lt;/p&gt;
&lt;p&gt;In part this is just inertia, but it&#39;s also partly a result of another
shortcoming of the DNS which is that it&#39;s not possible to have a
&lt;a href=&quot;https://www.isc.org/blogs/cname-at-the-apex-of-a-zone/&quot;&gt;CNAME at the apex of a zone&lt;/a&gt;.
Suppose that I want to have my web site hosted by &lt;code&gt;cdn.example&lt;/code&gt;. The
natural way to do this is with a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=CNAME_record&amp;amp;oldid=1068715692&quot;&gt;CNAME record&lt;/a&gt;, which is basically
an indication that the real (canonical) name of a domain is what&#39;s
in the record. So, for instance, consider the following CNAME record:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;www.example.com -&amp;gt; www.example.com.cdn.example
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This would tell anyone that if they wanted to know about
&lt;code&gt;www.example.com&lt;/code&gt; they should go look up the records for
&lt;code&gt;www.example.com.cdn.example&lt;/code&gt;. This works well because
it means I don&#39;t need to know anything about how the CDN&#39;s
network is laid out or what IP addresses they have for their
machines. I just set up the CNAME and then the CDN can
have the name resolve to whatever IP address(es) they want.
This allows them, for instance, to provide different answers
based on load or where clients are geographically.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/www-prefix/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
You can also use a CNAME to point to a service
like &lt;a href=&quot;https://www.citrix.com/products/citrix-intelligent-traffic-management/&quot;&gt;Cedexis (now Citrix)&lt;/a&gt;
which will steer traffic to different CDNs depending on network
conditions.
Unfortunately, while you can use a CNAME for &lt;code&gt;www.example.com&lt;/code&gt;,
you can&#39;t use it for &lt;code&gt;example.com&lt;/code&gt;. The reason is that a CNAME
is an all or nothing proposition: it means &amp;quot;look over here for
every record&amp;quot;. However, you also need to
have NS records (as well as probably MX records) for &lt;code&gt;example.com&lt;/code&gt; itself.
If you CNAME &lt;code&gt;example.com&lt;/code&gt;, you have just said &amp;quot;look over here for the
name server for &lt;code&gt;example.com&lt;/code&gt;&amp;quot;, and now you&#39;ve created a circular
dependency: how do people look up the name server (the NS record)
that they need in order to look up the CNAME?&lt;/p&gt;
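&lt;p&gt;To make the restriction concrete, here is a rough sketch of a zone file
fragment in the usual master-file syntax. The name server and hostmaster names
(&lt;code&gt;ns1.example.com&lt;/code&gt;, &lt;code&gt;hostmaster.example.com&lt;/code&gt;), the TTLs, and
the SOA timer values are made up for illustration. The underlying rule is that a CNAME
is not allowed to share a name with any other record type, and the apex always
carries SOA and NS records:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; Fine: www.example.com has no other records, so it can be a CNAME.
www.example.com.  3600 IN CNAME www.example.com.cdn.example.

; The apex has to carry SOA and NS records (and usually MX).
example.com.      3600 IN SOA   ns1.example.com. hostmaster.example.com. 1 7200 3600 1209600 3600
example.com.      3600 IN NS    ns1.example.com.
example.com.      3600 IN MX    10 mail.mailserver.example.

; Not fine: a CNAME next to the records above. A conforming
; server is expected to reject this line.
example.com.      3600 IN CNAME example.com.cdn.example.
&lt;/code&gt;&lt;/pre&gt;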
&lt;p&gt;The result of all this is that if you want to host your Web site on a
CDN and you don&#39;t want it to have a &lt;code&gt;www&lt;/code&gt; (or some other) prefix, you
have two main choices:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Host your own DNS and populate your records with the CDN&#39;s IP address
(this is what I do).&lt;/li&gt;
&lt;li&gt;Have the CDN host your DNS, so that they can then resolve the
actual IP address however they please.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Neither of these is ideal. If you host your own DNS, you have more
control but it&#39;s brittle because the CDN has to maintain a stable
IP for your domain. If they decide to move things then your site
breaks. It also means they can&#39;t do DNS-based load distribution.&lt;/p&gt;
&lt;p&gt;It&#39;s generally a better idea in this case to have the CDN host your
DNS, as then they can control how any given name resolves. Of course,
if they don&#39;t also host your email, you&#39;ll need to populate the domain
with MX records for your email server, but most anyone who hosts DNS
will allow this. Even so, this is only a partial solution because as
far as I can tell you still can&#39;t use a traffic management service to
steer between CDNs. As I understand it, if you want to do that, you
need to have some prefix (like &lt;code&gt;www.&lt;/code&gt;) in front of your domain.&lt;/p&gt;
&lt;p&gt;One way to try to split the difference here is to serve a page
on &lt;code&gt;example.com&lt;/code&gt; but then have most of your content on
&lt;code&gt;cdn.example.com&lt;/code&gt;, which can be load balanced invisibly. You can
also redirect users from &lt;code&gt;example.com&lt;/code&gt; to &lt;code&gt;www.example.com&lt;/code&gt;,
which isn&#39;t as invisible but lets you load balance even more
because (1) the redirect is a short message and (2) you can
tell the browser to remember the redirection, thus saving
the trip to &lt;code&gt;example.com&lt;/code&gt; in the future.&lt;/p&gt;
&lt;p&gt;One more thing: because the HTTPS record is needed for &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni&quot;&gt;Encrypted
Client
Hello&lt;/a&gt;
we should expect to see browsers support it for that reason, and so
there should eventually be a fair amount of HTTPS record support,
though it won&#39;t be universal.
Sites will then be able to use an HTTPS record to steer modern
browsers (those that support the HTTPS record) to something that can be load balanced.
Of course, older browsers will just go to whatever non-load-balanced
site &lt;code&gt;example.com&lt;/code&gt; is served off of,
but that will be an increasingly small fraction, so you&#39;ll
still get a fair amount of value.&lt;/p&gt;
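&lt;p&gt;As a rough illustration, an HTTPS record in what the draft calls AliasMode
(priority 0) might look like the fragment below. The TTLs and the address
&lt;code&gt;192.0.2.10&lt;/code&gt; are placeholders, and the target simply reuses the
hypothetical CDN name from the CNAME example earlier in this post:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; AliasMode (priority 0): send browsers that understand HTTPS
; records to a name the CDN can resolve however it likes.
example.com.  3600 IN HTTPS 0 www.example.com.cdn.example.

; Fallback for clients that ignore HTTPS records.
example.com.  3600 IN A     192.0.2.10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike a CNAME, this record can sit alongside the NS and MX records at the apex.&lt;/p&gt;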
&lt;h3 id=&quot;final-thoughts&quot;&gt;Final Thoughts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/www-prefix/#final-thoughts&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The lesson here is the same as for most features on the Internet: if
you want people to deploy something, then it has to be incrementally
deployable and provide value with low levels of deployment. If your
solution doesn&#39;t have this, then people will find some solution that
does. And that, kids, is why we have &lt;code&gt;www.example.com&lt;/code&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Of course, having access to your mail server is often enough
to get a certificate for your Web server, but we just won&#39;t
talk about that. &lt;a href=&quot;https://educatedguesswork.org/posts/www-prefix/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though it&#39;s also reasonably common to use
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Anycast&amp;amp;oldid=1049464901&quot;&gt;anycast&lt;/a&gt;
for this purpose, in which case there will just be one
IP address and BGP will be used for this kind of
traffic management. &lt;a href=&quot;https://educatedguesswork.org/posts/www-prefix/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part III: Basic Principles and the Origin Concept</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-origin/"/>
		<updated>2022-03-21T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-origin/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;Note: This is one of those posts that is going to be best read on
the Web, especially if you read your email using Gmail or the like,
as it will tend to mangle some of the HTML features.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is Part III of my series on the Web security model (see parts
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;I&lt;/a&gt; and
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2&quot;&gt;II&lt;/a&gt; for background on how the Web
works). In this part, I cover the primary unit of Web security,
the &lt;em&gt;origin&lt;/em&gt; and some of its implications.&lt;/p&gt;
&lt;h3 id=&quot;the-web-security-guarantee&quot;&gt;The Web Security Guarantee &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-web-security-guarantee&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unlike applications or e-books, the experience of using the Web is not
confined to content provided by one vendor. Instead, even if you start
on one site, many of your activities on that site will take you to
other sites. Consider, for instance, the experience of searching for
something using Google. Once you execute the search, Google then gives
you a set of links, many of which take you to another site.  Google&#39;s
relationship to those sites is arms-length at best: it doesn&#39;t control
them and doesn&#39;t bear any responsibility for their content beyond some
vague assertion that this might be something that was responsive to
your search. The situation is the same for other big content platforms like
Facebook and Twitter: just because you see some link there doesn&#39;t
mean that the site endorses it.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-web-vs.-internet-threat-models&quot;&gt;The Web vs. Internet Threat Models &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-web-vs.-internet-threat-models&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3552&quot;&gt;RFC 3552 (self-citation alert)&lt;/a&gt;
defines a threat model in which the attacker has complete control of
the network, which means that they can read or modify any packet.
In this case, it is trivial for them to look at any unencrypted
traffic or impersonate any site the client is making an unencrypted
connection to. Because this kind of network attack is so powerful,
it renders most questions about the Web security
model more or less superfluous: if the attacker can intercept your connection
to the site, it doesn&#39;t much matter whether there is some way that
some other site can mount a weaker attack.&lt;/p&gt;
&lt;p&gt;However, although powerful network attackers are reasonably
common—just open your browser using Airport WiFi—there
are also many weaker attackers. It used to be common to talk about
the Web threat model in which we assume that the attacker has
their own site that they can get you to talk to but is unable
to interfere with your connections to legitimate sites. Due to the
complexity of the Web, there are still a number of attacks
in this setting. Moreover, now that HTTPS use
has become so common and most traffic is encrypted
(and browsers have banned &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#mixed-content&quot;&gt;mixed content&lt;/a&gt;)
the Internet
and Web threat models have basically merged.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In order for the Web to work successfully, people have to feel
comfortable visiting arbitrary Web pages, even those controlled by the
attacker. It&#39;s the browser&#39;s job to mediate that interaction so that
it&#39;s safe. Back in 2011, my coauthors and I &lt;a href=&quot;https://ptolemy.berkeley.edu/projects/truststc/pubs/840/websocket.pdf&quot;&gt;described this
as&lt;/a&gt;
the &amp;quot;core security guarantee&amp;quot; of the Web: &lt;strong&gt;users can safely visit
arbitrary web sites and execute scripts provided by those sites&lt;/strong&gt;.&lt;/p&gt;
&lt;div&gt;
&lt;p&gt;Just to reinforce this point, in this threat
model &lt;strong&gt;the Web site is the attacker&lt;/strong&gt;. You can come in contact
with a malicious site in several ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;An active attacker on your network can pretend to be a Web site
you are trying to go to. This is less common now with
the rapid increase of encrypted connections in the form
of HTTPS, but it&#39;s still reasonably common for people to
visit a small number of unencrypted sites.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can be lured in some way to a malicious site, for instance
by an ad campaign, phishing, or just visiting the wrong link.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this series, we are not primarily concerned with network
attacks. First, this is supposed to be prevented
at a lower layer, specifically, via HTTPS (modulo phishing).
Second, if you have an insecure connection to your bank,
then the attacker can tamper with your requests to do whatever
they want. Instead, we&#39;re primarily interested
in cases where the attacker gets you to visit their site
and uses that as a foothold to attack your computer or
your interaction with the bank.&lt;/p&gt;
&lt;p&gt;This leads to the following
set of requirements:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A malicious site won&#39;t be able to compromise your browser or your computer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A malicious site won&#39;t be able to see or interfere with your interaction with
other sites. For instance, if you have Gmail in one tab
you don&#39;t want an attacker in another tab to be able to read your
emails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This series of posts is mostly about the second category of attacks.
Making networked programs secure against arbitrary
input is a serious problem, but one that&#39;s not unique to Web
browsers, so we can take it up at a different time.&lt;/p&gt;
&lt;h3 id=&quot;motivation%3A-cookies-and-ambient-authority&quot;&gt;Motivation: Cookies and Ambient Authority &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#motivation%3A-cookies-and-ambient-authority&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One of the problems with writing these posts serially rather than
all at once is that sometimes you find there is something you wish
you had explained earlier that now you can&#39;t go back and do. This is
one of those times. In &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/&quot;&gt;Part II&lt;/a&gt;,
I explained how to use cookies to implement a shopping cart, but
another of the main uses of cookies is to &lt;em&gt;persist&lt;/em&gt;
authentication. This is something you experience every time you
use a Web site that uses authentication: the first time you
go to the site, it detects you aren&#39;t logged in and gives
you a login prompt. On subsequent visits, though, it just remembers
who you are.&lt;/p&gt;
&lt;p&gt;This works in more or less the way you would expect, shown in the
figure below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/authentication-cookies.png&quot; alt=&quot;Authentication with Cookies&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Initially, when the user goes to the site, they have no cookie.
The site notices this and sends them a login page with the
usual username and password prompt. The user enters their
password (presumably in a Web &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#catalog&quot;&gt;form&lt;/a&gt;)
and the browser sends it to the server. The server checks the
password. Assuming the password is correct, the server generates a new cookie,
stores it in the local authentication database along with
the user identifier, and then returns a success page to the
user along with the cookie. The next time the user visits the
site, their browser sends along the cookie. The site can then
look the cookie up in the database and if successful it knows
who the user is and can present an appropriate page. In reality,
this doesn&#39;t happen just on subsequent visits, but during the
same visit. Whenever the user clicks on another link, or even loads
an image off the site, the cookie is used to authenticate them;
the password is just used to authenticate the user long enough to
set the cookie.&lt;/p&gt;
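&lt;p&gt;The server side of this flow can be sketched in a few lines of JavaScript.
This is only an illustration under some loud assumptions: it runs on Node.js and uses
its &lt;code&gt;crypto.randomBytes&lt;/code&gt; function, the two in-memory maps and the
function names &lt;code&gt;login&lt;/code&gt; and &lt;code&gt;authenticate&lt;/code&gt; are hypothetical
stand-ins for a real database and framework, and the password is compared in the
clear, which a real site should not do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const crypto = require('crypto');

// Hypothetical stand-ins for the site's password and session databases.
const passwords = new Map([['alice', 'correct horse']]);
const sessions = new Map(); // maps cookie value to user identifier

// Check the password and, on success, mint a fresh random cookie.
function login(user, password) {
  if (passwords.get(user) !== password) {
    return null; // wrong password: no cookie is issued
  }
  const cookie = crypto.randomBytes(32).toString('hex');
  sessions.set(cookie, user);
  return cookie;
}

// On every later request, only the cookie is consulted.
function authenticate(cookie) {
  return sessions.has(cookie) ? sessions.get(cookie) : null;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that &lt;code&gt;authenticate&lt;/code&gt; never looks at the password: whoever
presents the cookie is treated as the user.&lt;/p&gt;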
&lt;p&gt;It&#39;s important to realize that from this point on, the cookie is
the only thing authenticating the user to the site. In effect,
the cookie is a new password that&#39;s created by the site and
just handled by the browser rather than remembered by the user.
Anyone who has access to the cookie is effectively the user
(the technical term here is a &lt;em&gt;bearer&lt;/em&gt; token, which
means that anyone who has a copy of the token can impersonate
the user). This means that the cookie has to be (1) unguessable
and (2) kept secret (this is where encryption comes in, as
we&#39;ll see later).&lt;/p&gt;
&lt;p&gt;Now here&#39;s where things start to get complicated. If you remember
the discussion of online advertising in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/&quot;&gt;Post III&lt;/a&gt;,
cookies get sent &lt;em&gt;whenever&lt;/em&gt; a resource is loaded, regardless of
the site where the resource is being loaded from. For instance,
suppose that you have a picture on a photo site which is available
only to certain people who are logged into the site. If the
URL isn&#39;t secret, a site can embed an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag pointing
to the picture and it will be shown on the site. In general,
this applies to any request made by the browser, no matter how
it is triggered. This property
is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Ambient_authority&amp;amp;oldid=1060661276&quot;&gt;ambient authority&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As I&#39;ve just described it, this sounds really bad: any site
can just load access-controlled material off of any other site,
which would obviously violate the second half of the guarantee
above. And if that were the whole story it would indeed be bad.
What makes this all work is a set of rules called the &lt;strong&gt;same-origin policy&lt;/strong&gt; that
dictate that while a site can &lt;em&gt;load&lt;/em&gt;
the content from another site and show it to the user, it can&#39;t &lt;em&gt;read&lt;/em&gt; the content.
This is a powerful tool, but in practice a very tricky one
to use correctly, as we&#39;ll be exploring in some detail.&lt;/p&gt;
&lt;h3 id=&quot;the-same-origin-policy&quot;&gt;The Same-Origin Policy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#the-same-origin-policy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The same-origin policy (SOP) is the collective name for a large-ish
set of rules about how browsers behave in cross-origin situations.
These rules have gradually evolved over time.
In an important 2006 &lt;a href=&quot;https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.215.6662&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;paper&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; on
Web privacy, Jackson, Bortz, Boneh, and Mitchell describe it as follows
(under the name of &amp;quot;same-origin principle&amp;quot;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Only the site that stores some information in the browser may later read or modify that information.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;First, however, we must define what we mean by a &amp;quot;site&amp;quot;. As described
in the previous two posts, any given Web page is often composed of
resources from multiple servers, with each resource being retrieved
via a URL. Obviously, we don&#39;t want all of these resources to
be isolated from each other because we want them to work together
to provide a unified experience. So, we need some concept of &amp;quot;the same site&amp;quot; that is different from just the
URL. This concept is given by the &lt;em&gt;origin&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Recall the structure of the URL from
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;post I&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/URL-structure.drawio.svg&quot; alt=&quot;URL Structure&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;risks-of-including-paths-in-the-origin&quot;&gt;Risks of Including Paths in the Origin &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#risks-of-including-paths-in-the-origin&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;One interesting detail is that the path component is not part of the
origin, so &lt;code&gt;https://example.com/abc&lt;/code&gt; and
&lt;code&gt;https://example.com/def&lt;/code&gt; are in the same origin.
There&#39;s an obvious reason for this, which is that Web
sites frequently consist of multiple paths and you
want them to share cookies and state. However, it used
to be fairly common to have several people share a given
server, for instance by having Alice have her home page at
&lt;code&gt;https://example.com/~alice/&lt;/code&gt; and Bob have his
site at &lt;code&gt;https://example.com/~bob/&lt;/code&gt;. Unfortunately,
this has some problematic security properties. For
instance, it&#39;s possible to scope cookies to a given
path prefix, but if Alice sets a cookie, Bob can
read it by injecting script into the page. For more on
this class of problems, see the classic &lt;a href=&quot;http://seclab.stanford.edu/websec/origins/fgo.pdf&quot;&gt;paper&lt;/a&gt;
&amp;quot;Beware of Finer-Grained Origins&amp;quot;
by Adam Barth and Collin Jackson.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The origin of a piece of content retrieved by a URL is defined by
the following three values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;scheme&lt;/em&gt;:  e.g., &lt;code&gt;http:&lt;/code&gt; or &lt;code&gt;https:&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;host&lt;/em&gt;: the domain name of the server&lt;/li&gt;
&lt;li&gt;&lt;em&gt;port&lt;/em&gt;: the TCP or UDP port number that the server is listening on&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In order for two origins to be the same, all three values
must be the same.&lt;/p&gt;
&lt;p&gt;We&#39;ve covered scheme and host before, but what&#39;s a port? Internet
hosts are addressable by IP address, but what if you want to run
multiple services on a given machine, such as mail and Web? This is
handled by having a second layer of addressing: the &lt;strong&gt;port&lt;/strong&gt;, which is
just a 16-bit number carried in the transport protocol.  You can
have a large number of different services on a server, each addressed
by a separate port (the technical term here is that you are
&lt;em&gt;multiplexing&lt;/em&gt; multiple services on the same IP and the
port is used to &lt;em&gt;demultiplex&lt;/em&gt; them). Traditionally, each protocol has a fixed port
number (HTTP is 80, HTTPS is 443, e-mail transmission (SMTP) is
25). However, nothing stops you from running services on other ports;
you just need some way to tell the other side what port to talk to.
In URLs, this is done by appending a colon and the port number.&lt;/p&gt;
&lt;p&gt;Here are some examples of URLs and their associated origins:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;URL&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Scheme&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Host&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;http://example.com&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;http&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;example.com&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;80&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;http://example.com:8080&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;http&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;example.com&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;8080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;https://example.com&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;https&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;example.com&lt;/code&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;code&gt;443&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Notice that in the first and last examples, the port isn&#39;t provided: HTTP
has a default port value of 80 and HTTPS has a default port value of
443. In the second example, the port (8080) is explicitly provided.
As a practical matter, nearly all Web traffic runs on the default
port, though it&#39;s common to use other ports for development purposes.&lt;/p&gt;
&lt;p&gt;It&#39;s important to note that the path is &lt;em&gt;not&lt;/em&gt; part of the origin.
So, for instance, these URLs have the same origin (see &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy&quot;&gt;MDN&lt;/a&gt; for some more examples, as well as examples of some edge cases):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://example.com/index.html&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://example.com/~ekr/homepage.html&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://example.com/js/scripts.js&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As I mentioned above, this allows them to work together to provide
a unified experience (though see below for some special considerations
for JavaScript).&lt;/p&gt;
&lt;p&gt;In general, if two resources have the same origin, then they can
share information. However, if A and B are from different origins,
then their interactions are going to be fairly limited.&lt;/p&gt;
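&lt;p&gt;If you want to experiment with this, the comparison can be sketched using the
standard &lt;code&gt;URL&lt;/code&gt; class that browsers and Node.js provide. The helper names
&lt;code&gt;originTuple&lt;/code&gt; and &lt;code&gt;sameOrigin&lt;/code&gt; are my own, and this sketch
only knows the default ports for &lt;code&gt;http:&lt;/code&gt; and &lt;code&gt;https:&lt;/code&gt;; it is
not how any browser actually implements the check:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Default ports for the two schemes this sketch handles.
const defaultPorts = { 'http:': '80', 'https:': '443' };

// Return the [scheme, host, port] tuple for a URL string.
function originTuple(urlString) {
  const url = new URL(urlString);
  // url.port is the empty string when the URL uses the scheme default.
  const port = url.port === '' ? defaultPorts[url.protocol] : url.port;
  return [url.protocol, url.hostname, port];
}

// Two URLs are same-origin only if all three values match.
function sameOrigin(a, b) {
  return JSON.stringify(originTuple(a)) === JSON.stringify(originTuple(b));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these helpers, &lt;code&gt;https://example.com/index.html&lt;/code&gt; and
&lt;code&gt;https://example.com/js/scripts.js&lt;/code&gt; compare as the same origin, while
&lt;code&gt;http://example.com&lt;/code&gt; and &lt;code&gt;http://example.com:8080&lt;/code&gt; do not.&lt;/p&gt;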
&lt;h3 id=&quot;reading%2Fwriting-other-resources&quot;&gt;Reading/Writing Other Resources &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#reading%2Fwriting-other-resources&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;First let&#39;s look at the example I used above: a page from origin A
loading an image from origin B. The SOP requires that A be able to see
the content if and only if A has the same origin as B. If A and
B are from different origins then A can only learn whether the image was
loaded but can&#39;t see the actual content.
The way you read the content of an HTML &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag is
by drawing it on a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/canvas&quot;&gt;Canvas&lt;/a&gt;
element and then reading the data back with &lt;code&gt;getImageData()&lt;/code&gt;. The following
JavaScript snippet does that and then writes the resulting value
below the image:&lt;/p&gt;
&lt;pre class=&quot;language-javascript&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;onloaded&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; canvas &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createElement&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;canvas&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getContext&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;2d&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    canvas&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;drawImage&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;el&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token 
keyword&quot;&gt;let&lt;/span&gt; pixelvalue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; imgdata &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; canvas&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getImageData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;      pixelvalue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; imgdata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      pixelvalue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;forbidden&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    el&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;parentElement&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;appendChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createTextNode&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;URL=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; el&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;src &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot; pixel=&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; pixelvalue&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;script&gt;
function onloaded(el) {
    let canvas = document.createElement(&quot;canvas&quot;).getContext(&quot;2d&quot;);
    canvas.drawImage(el, 0, 0);
    let pixelvalue = null;
    try {
      let imgdata = canvas.getImageData(0, 0, 1, 1);
      pixelvalue = imgdata.data.slice(0, 4)
    } catch {
      pixelvalue = &quot;forbidden&quot;;
    }
    el.parentElement.appendChild(document.createTextNode(&quot;URL=&quot; + el.src + &quot; pixel=&quot; + pixelvalue));
}
&lt;/script&gt;
&lt;p&gt;By setting the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/GlobalEventHandlers/onload&quot;&gt;onload&lt;/a&gt;
property on the image element, we can arrange that this function runs whenever the image
is loaded. Below you can see the results with two images, the first loaded from
this site, and the second loaded cross-site.&lt;/p&gt;
&lt;div&gt;
&lt;img style=&quot;width: 100px&quot; src=&quot;https://educatedguesswork.org/img/ekr.jpg&quot; onload=&quot;onloaded(this)&quot; /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;div&gt;
&lt;img style=&quot;width: 100px&quot; src=&quot;https://www.rtfm.com/ekr-ud.jpg&quot; onload=&quot;onloaded(this)&quot; /&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;p&gt;As you can see, in both cases you can tell when the image was loaded (because the
function gets called) and get some basic
information like the URL (and the width and height). However, when we try
to actually access the image data, the call to &lt;code&gt;getImageData()&lt;/code&gt; only works
with the same-site image, producing the pixel value
&lt;code&gt;[211, 196, 173, 255]&lt;/code&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
but fails with the cross-site image, producing
the result &amp;quot;forbidden&amp;quot;. This is
the same-origin policy at work. The same thing applies to other elements
that you load cross-origin like this, for instance audio files or videos.
It &lt;em&gt;also&lt;/em&gt; applies if you load another Web page in an IFRAME or in another
tab. If the page is same-origin, then you can access the DOM of that
page, but if it&#39;s cross-origin you cannot. In addition, same-origin
IFRAMEs or pages can access the original page.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
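&lt;p&gt;As a rough illustration of the comparison involved, here is a minimal sketch
that treats two URLs as same-origin when their scheme, host, and port match. The
helper name &lt;code&gt;sameOrigin&lt;/code&gt; is made up for this example, and the sketch
ignores opaque origins and any relaxation a server might opt into; it is not what
a browser literally runs.&lt;/p&gt;

```javascript
// Two URLs are same-origin when scheme, host, and port all match.
// URL.origin serializes exactly that triple (default ports are omitted).
// sameOrigin is a hypothetical helper, not a browser API.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const page = "https://educatedguesswork.org/posts/web-security-model-origin/";
console.log(sameOrigin(page, "https://educatedguesswork.org/img/ekr.jpg")); // true
console.log(sameOrigin(page, "https://www.rtfm.com/ekr-ud.jpg"));           // false
console.log(sameOrigin(page, "http://educatedguesswork.org/img/ekr.jpg"));  // false (scheme differs)
```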
&lt;p&gt;Note, however, that the containing site &lt;em&gt;can&lt;/em&gt; write to a cross-site
element, or rather, it can replace it with another element. This
makes sense, because even though the site can&#39;t read the element it
ultimately controls the DOM that the element appears in, so it
can just replace it with something else, as in the following
code snippet, which just swaps the image element below between two
images whenever you click:&lt;/p&gt;
&lt;pre class=&quot;language-javascript&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; onclickimageindex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; images &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string&quot;&gt;&quot;/img/ekr.jpg&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string&quot;&gt;&quot;https://www.rtfm.com/ekr-ud.jpg&quot;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;imageonclick&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    onclickimageindex&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    el&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;src &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; images&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;onclickimageindex&lt;span class=&quot;token operator&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br 
/&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;    &lt;/code&gt;&lt;/pre&gt;
&lt;script&gt;
var onclickimageindex = 0;
const images = [
    &quot;/img/ekr.jpg&quot;,
    &quot;https://www.rtfm.com/ekr-ud.jpg&quot;
];
function imageonclick(el) {
    onclickimageindex++;
    el.src = images[onclickimageindex%2];
}    
&lt;/script&gt;
&lt;img style=&quot;width: 100px&quot; id=&quot;switcher&quot; src=&quot;https://educatedguesswork.org/img/ekr.jpg&quot; onClick=&quot;imageonclick(this)&quot; /&gt;
&lt;h3 id=&quot;what-about-javascript%3F&quot;&gt;What About JavaScript? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#what-about-javascript%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;But if cross-origin resources can&#39;t access the DOM, then how is it
that you can load JavaScript libraries off of other sites, which, as I
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1#cross-site-content&quot;&gt;mentioned&lt;/a&gt;,
people do all the time? The answer is that when you load JavaScript
into a site with a &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag, that JavaScript runs in the
origin it was loaded &lt;em&gt;by&lt;/em&gt;, not the origin it was loaded &lt;em&gt;from&lt;/em&gt;.  For
instance, if a page loaded from &lt;code&gt;https://educatedguesswork.org&lt;/code&gt;
pulls in a script from &lt;code&gt;https://example.com&lt;/code&gt;, that script has the
same privileges as if it were loaded from
&lt;code&gt;https://educatedguesswork.org/&lt;/code&gt; and can do anything one of those
scripts can do.&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that an attacker who can run script in a
site&#39;s security context effectively controls that site from the user&#39;s
perspective. Because scripts can manipulate the DOM, they can make the user
see anything they want. They can access locally stored state
and can often access cookies (via the &lt;code&gt;document.cookie&lt;/code&gt; variable).
They can&#39;t directly access the user&#39;s password, but they can prompt
the user to retype it and the user will likely do so; a password
manager cannot protect you here because it determines what password
to show based on the site&#39;s origin. Being able to run script on a
site is very nearly as good as intercepting all communications between
the client and the site.&lt;/p&gt;
&lt;h4 id=&quot;mixed-content&quot;&gt;Mixed Content &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#mixed-content&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Because imported JavaScript is so powerful, it&#39;s critical to ensure
that the right script is loaded: an attack on imported JavaScript
is nearly the same as an attack on your site.
Suppose that ExampleCo serves &lt;code&gt;example.com&lt;/code&gt; over HTTPS, but
that site imports JavaScript from &lt;code&gt;http://libraries.example&lt;/code&gt;.
This situation is called &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/Security/Mixed_content&quot;&gt;mixed content&lt;/a&gt; (because you are mixing secure and insecure content).
In this case, even though a network attacker cannot directly attack
&lt;code&gt;example.com&lt;/code&gt;, they can attack the JavaScript from &lt;code&gt;http://libraries.example&lt;/code&gt;
and through that JavaScript control how the browser renders &lt;code&gt;example.com&lt;/code&gt;.
In other words, this is barely better than having the original
site served insecurely.&lt;/p&gt;
&lt;p&gt;Mixed content used to happen quite frequently: if you wanted to
upgrade your insecure site to HTTPS, you might find that some of your
dependencies were insecure; the easiest thing to do was just accept
the situation.  Eventually, as HTTPS became more common, browsers started blocking active
mixed content (like JavaScript), loading the original page but just
generating a network error when it tried to load the insecure content.
This obviously broke some sites which still depended on mixed content,
but also protected users from attack on those sites (and in some
cases, the site would still work correctly).&lt;/p&gt;
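&lt;p&gt;To make the condition concrete, the following sketch flags the case where a
page served over HTTPS pulls in a resource over plain HTTP. The function name is
hypothetical, and real browsers apply a more detailed definition (for example,
they distinguish active from passive content), so treat this only as an
approximation of the idea.&lt;/p&gt;

```javascript
// Returns true when a secure (https:) page would load a resource over
// plain http:, which is the mixed-content case described above.
// isMixedContent is a hypothetical helper for illustration only.
function isMixedContent(pageUrl, resourceUrl) {
  const page = new URL(pageUrl);
  // Resolve relative resource URLs against the page, as a browser would.
  const resource = new URL(resourceUrl, page);
  if (page.protocol !== "https:") return false;
  return resource.protocol === "http:";
}

console.log(isMixedContent("https://example.com/", "http://libraries.example/lib.js"));  // true
console.log(isMixedContent("https://example.com/", "https://libraries.example/lib.js")); // false
```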
&lt;h4 id=&quot;compromised-dependencies-and-subresource-integrity&quot;&gt;Compromised Dependencies and Subresource Integrity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#compromised-dependencies-and-subresource-integrity&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Another form of attack on cross-origin JavaScript—or really
any included JavaScript—is attack on or by the site hosting
the script. Suppose that your site depends on a JavaScript
library like &lt;a href=&quot;https://jquery.com/&quot;&gt;jQuery&lt;/a&gt; but loads it
off the jQuery &lt;a href=&quot;https://jquery.com/download/#jquery-39-s-cdn-provided-by-stackpath&quot;&gt;CDN&lt;/a&gt;
rather than hosting it locally. If the jQuery CDN—or the jQuery
distribution itself—is compromised, then the attacker can
serve malicious JavaScript and subvert the user&#39;s experience of
the site. This works even if the connection to the CDN is encrypted,
because the problem is a compromised endpoint, not a network
attacker.&lt;/p&gt;
&lt;p&gt;The W3C has standardized a technology called
&lt;a href=&quot;https://www.w3.org/TR/SRI/&quot;&gt;Subresource integrity (SRI)&lt;/a&gt; which is intended
to prevent this type of attack. The idea behind SRI is that
the &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag loading a piece of JavaScript includes
a cryptographic hash of the expected result. When the browser
loads the resource, it checks the hash and generates an error if
it doesn&#39;t match. For instance, here is a lightly modified
example from the SRI spec:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;script&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://example.com/example-framework.js&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token attr-name&quot;&gt;integrity&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token script&quot;&gt;&lt;span class=&quot;token language-javascript&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;script&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In theory, SRI solves the problem of compromised subresources,
but in practice deployment has been &lt;a href=&quot;https://chromestatus.com/metrics/feature/popularity#SRIElementWithMatchingIntegrityAttribute&quot;&gt;fairly slow&lt;/a&gt;.
One likely reason for this is that coordination is difficult: the
site author must somehow learn the hash of the JavaScript library
they are loading, and it&#39;s just one more thing to go wrong.
At present most sites (this site included) which depend on external JavaScript, which
is a huge fraction of the Web because of advertising and tools
like Google Analytics, are just dependent on the security
of the external servers which host those scripts.&lt;/p&gt;
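&lt;p&gt;For what it&#39;s worth, computing the hash itself is not the hard part. Here is
a minimal Node.js sketch of how a site author might derive the value for the
&lt;code&gt;integrity&lt;/code&gt; attribute; the function name and the sample library
contents are placeholders, and in practice the input would be the exact bytes the
CDN serves.&lt;/p&gt;

```javascript
const crypto = require("node:crypto");

// Compute the value for a script tag's integrity attribute:
// the hash algorithm name, a dash, and the base64-encoded digest.
// sriHash is a hypothetical helper name.
function sriHash(content, algorithm = "sha384") {
  const digest = crypto.createHash(algorithm).update(content).digest("base64");
  return algorithm + "-" + digest;
}

// Hypothetical library contents; in practice you would read the exact
// bytes of the file that the CDN serves.
const librarySource = "console.log('hello from the library');";
console.log(sriHash(librarySource));
```

&lt;p&gt;The coordination problem is that this value has to be recomputed and
redeployed every time the library changes by even a single byte.&lt;/p&gt;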
&lt;h3 id=&quot;cross-origin-requests-(and-cross-site-request-forgery)&quot;&gt;Cross-Origin Requests (and Cross-Site Request Forgery) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#cross-origin-requests-(and-cross-site-request-forgery)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, the SOP allows site A to make requests to site B but
not read the responses. Unfortunately, this still allows for attacks.
The basic problem here is the combination of cross-site requests under
control of the attacker with ambient authority provided by cookies.
Suppose that there is a shopping Website such as the one we described
in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2&quot;&gt;part II&lt;/a&gt;. If the attacker knows
that you have logged into the site and can get you to visit their
site, they can force you to make purchases on the shopping site,
as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/csrf.png&quot; alt=&quot;CSRF Example&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The way this works is that when you visit the attacker&#39;s site, they
serve you an HTML page with an element that causes the browser
to make a request to the shopping site&#39;s server to buy something;
that request is the same message that the browser would have
sent if you were on the shopping site&#39;s page and comes along
with the user&#39;s cookie (ambient authority, remember?). This all
looks fine and the site just goes ahead and executes the purchase.
This is called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cross-site_request_forgery&amp;amp;oldid=1078022726&quot;&gt;Cross-Site Request Forgery (CSRF)&lt;/a&gt; attack.&lt;/p&gt;
&lt;p&gt;It&#39;s worth mentioning a few fine points. First, why am I using
an HTML form here? The reason is that many (most?) sites use
the HTTP &lt;code&gt;POST&lt;/code&gt; method for requests that are supposed to
have side effects, such as buying something.
&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Most of the HTML elements that result in a cross-origin
load use the &lt;code&gt;GET&lt;/code&gt; method, but forms allow you to use
&lt;code&gt;POST&lt;/code&gt;. You can also use JavaScript methods to make this
kind of cross-origin request, but the situation is somewhat
more complicated, so I&#39;m going to get to it later when
I talk about &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS&quot;&gt;Cross-Origin Resource Sharing (CORS)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Second, it&#39;s possible to make this operation automatic and
invisible to the client: even though form submission usually
results in navigation events, you can put the form in a hidden
IFRAME so the user doesn&#39;t notice the event. Similarly, you can
use JavaScript to trigger the form submission so that it happens
automatically on loading the page.&lt;/p&gt;
&lt;p&gt;Obviously, CSRF is a serious attack, and we&#39;d all be in trouble
if it were regularly possible to mount CSRF attacks on (say) Amazon
or (worse) Wells Fargo. The most basic CSRF defense is to use
what&#39;s called a &lt;em&gt;CSRF token&lt;/em&gt;. The idea is that when you access
the legitimate site, it adds a random token to every HTML
element corresponding to a request which would generate side
effects. For instance, if it gives you a link to add something
to your shopping cart, that link might have a random token
at the end. Then, when your browser dereferences the link to
add the item, it sends along the token; the site checks it and
only takes the action if the token is correct. Because the
CSRF request the attacker induces doesn&#39;t have the token, it
will be rejected.&lt;/p&gt;
&lt;p&gt;It&#39;s worth taking a moment to think about how this defense works:
effectively, it&#39;s a check on ambient authority. Ordinarily, requests are authenticated
just by having the cookie, but because of CSRF that&#39;s not good
enough; the token restores the concept of the provenance of the
request. In order for it to work properly, the token has to be (1) unknown
to the attacker and (2) tied to the user (presumably via the cookie).
If it&#39;s not tied to the user, the attacker will just go to the
site themselves, retrieve the token, and give it to the user&#39;s
browser on their page.&lt;/p&gt;
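&lt;p&gt;The two requirements above can be sketched on the server side as follows.
This is one simple way to do it, assuming an in-memory table keyed by the session
cookie value; the names &lt;code&gt;issueCsrfToken&lt;/code&gt; and
&lt;code&gt;checkCsrfToken&lt;/code&gt; are invented for the example, and real frameworks
have their own mechanisms.&lt;/p&gt;

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory store mapping a session cookie value to its CSRF token.
const csrfTokens = new Map();

// Issue a fresh random token for this session; the site would embed it in
// every link or form that triggers a side effect.
function issueCsrfToken(sessionId) {
  const token = crypto.randomBytes(32).toString("hex");
  csrfTokens.set(sessionId, token);
  return token;
}

// Accept the request only if the submitted token matches the one issued for
// the session identified by the cookie.
function checkCsrfToken(sessionId, submitted) {
  const expected = csrfTokens.get(sessionId);
  if (expected === undefined) return false;
  if (typeof submitted !== "string") return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  if (a.length !== b.length) return false;
  // Constant-time comparison so the check does not leak how many bytes matched.
  return crypto.timingSafeEqual(a, b);
}
```

&lt;p&gt;Because the token is looked up by session, a token the attacker fetched under
their own session fails the check when it arrives with the victim&#39;s cookie.&lt;/p&gt;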
&lt;p&gt;One very important property of CSRF tokens is that they work
with every browser because they don&#39;t depend on any new browser
feature. Over the years a number of such features have been
introduced to make CSRF harder, but any new feature takes time
to propagate throughout the entire user population. This is a general
problem with Web security. When a new
attack like CSRF is discovered, sites need to be able to protect
themselves immediately and so defenses which don&#39;t require client
side changes are strongly preferred and can&#39;t be relaxed until
effectively the entire user population has upgraded to the
new client-side defenses.&lt;/p&gt;
&lt;p&gt;There is some good news on this front, however. As I noted above, CSRF is a consequence of the fact that
cookies are sent &lt;em&gt;both&lt;/em&gt; in the situation where the resource
is on the same site and where the resource is on a different
site. Arguably this is a misfeature in HTTP, and so one fix is to
simply have cookies only apply to same site resources.
This is the idea behind &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite#lax&quot;&gt;SameSite cookies&lt;/a&gt;.
When you set a cookie, you can add the &lt;code&gt;SameSite&lt;/code&gt; attribute to the cookie
to say whether it can or cannot be used for cross-site resources.
Recently, browsers have started to default cookies to &lt;code&gt;SameSite=Lax&lt;/code&gt;,
which is intended to prevent cookies being used in contexts which would
enable CSRF. Once those browsers become ubiquitous,
sites should finally be able to deprecate CSRF tokens.&lt;/p&gt;
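&lt;p&gt;Sites don&#39;t have to rely on the browser default; they can set the attribute
explicitly. A minimal sketch of building such a &lt;code&gt;Set-Cookie&lt;/code&gt; value is
below. The helper and the cookie name and value are placeholders.&lt;/p&gt;

```javascript
// Build a Set-Cookie header value with an explicit SameSite policy.
// sessionCookie is a hypothetical helper; the name and value used below
// are placeholders.
function sessionCookie(name, value, sameSite = "Lax") {
  return [
    name + "=" + encodeURIComponent(value),
    "Path=/",
    "Secure",
    "HttpOnly",
    "SameSite=" + sameSite,
  ].join("; ");
}

console.log(sessionCookie("session", "abc123"));
// session=abc123; Path=/; Secure; HttpOnly; SameSite=Lax
```

&lt;p&gt;Note that &lt;code&gt;Lax&lt;/code&gt; still sends the cookie on top-level
navigations that use &lt;code&gt;GET&lt;/code&gt;, which is one more reason not to give
&lt;code&gt;GET&lt;/code&gt; requests side effects.&lt;/p&gt;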
&lt;h2 id=&quot;next-up%3A-cross-origin-resource-sharing&quot;&gt;Next Up: Cross-Origin Resource Sharing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#next-up%3A-cross-origin-resource-sharing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The same-origin policy is a fairly blunt—albeit
complicated—instrument. There are times when you would like to
do cross-origin requests that also carry authentication and
actually be able to see the data. In the next post, I&#39;ll be talking
about a mechanism designed to allow that: Cross-Origin Resource Sharing (CORS).&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This paper is actually quite entertaining reading, as
it describes many tracking techniques we see in use today, such
as &lt;a href=&quot;https://blog.mozilla.org/security/2020/08/04/firefox-79-includes-protections-against-redirect-tracking/&quot;&gt;bounce tracking&lt;/a&gt;.
In addition, Section 1 starts with &amp;quot;The web is a never-ending source of security and privacy problems. It is an inherently untrustworthy place, and yet users not only expect to be able to browse it free from harm, they expect it to be fast, good-looking, and interactive — driving content producers to demand feature after feature, and often requiring that new long-term state be stored inside the browser client&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The four values are R, G, B, and alpha. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: in order for this to work, you need the two
pages to have a handle to each other. This happens if page
A was opened by page B with &lt;code&gt;window.open()&lt;/code&gt; or if
page B is an IFRAME on page A. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The HTTP &lt;a href=&quot;https://httpwg.org/specs/rfc7231.html&quot;&gt;spec&lt;/a&gt;
strongly discourages using GET in contexts that
have this kind of user-visible side effect:
&amp;quot;Request methods are considered &amp;quot;safe&amp;quot; if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource. Likewise, reasonable use of a safe method is not expected to cause any harm, loss of property, or unusual burden on the origin server.&amp;quot;
 &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-origin/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/div&gt;</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model (Outtake): Cookies and Behavioral Advertising</title>
		<link href="https://educatedguesswork.org/posts/web-security-intro-advertising/"/>
		<updated>2022-03-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-intro-advertising/</id>
		<content type="html">&lt;p&gt;This post was originally part of &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/&quot;&gt;Post
II&lt;/a&gt; of
my &lt;a href=&quot;https://educatedguesswork.org/tags/web%20security/&quot;&gt;series&lt;/a&gt; on the
Web Security Model but kind of broke up the flow of that post, so
it got pulled out. But a blog means never having to
&lt;a href=&quot;https://www.masterclass.com/articles/what-does-it-mean-to-kill-your-darlings&quot;&gt;kill your darlings&lt;/a&gt;, so here it is.
In Post II I wrote about how Web applications use cookies for
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#shopping-carts&quot;&gt;statekeeping&lt;/a&gt; on a single site, but it
turns out to be trivial to extend that functionality to provide
targeting for behavioral advertising. There&#39;s nothing new technically
here, it&#39;s just a new combination of several existing elements we&#39;ve
already seen.&lt;/p&gt;
&lt;h2 id=&quot;ad-networks&quot;&gt;Ad Networks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#ad-networks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most advertising on the Web is done by &lt;em&gt;ad networks&lt;/em&gt;. It&#39;s
of course technically possible to just sell ads on your
own site, but for obvious reasons this doesn&#39;t really work
unless you&#39;re a big prestige site like Google, Facebook, or
the New York Times. Instead, the typical thing to do is
for the publisher to work with some third party ad provider
who places ads on a lot of different sites.&lt;/p&gt;
&lt;p&gt;The technical details of the system are unbelievably
complicated. It&#39;s traditional at this point to show
the baffling diagram below, called the &amp;quot;LUMAscape&amp;quot;, which maps
out the various entities in the ad ecosystem. However,
at the level we need to be concerned with, matters are fairly simple.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lumapartners.com/wp-content/uploads/2022/02/2y10amcw7KONhLSbiYqGDX.BO_.HOfept8FgLiCqpPDH2BXQ8zyQ2hV0DK.png&quot; alt=&quot;Lumascape&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In order to show advertising from a given ad network, the publisher
embeds an element on their site with content of the element being loaded off of the ad
network&#39;s server.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
When the user visits the publisher&#39;s
site the browser automatically loads the content from the ad network,
which invisibly decides what ad to show. Recall that there&#39;s no rule
that the content at a given URL has to remain constant, so the
server can dynamically select the specific ad based on any information it has.&lt;/p&gt;
&lt;p&gt;There are a variety of options for the element type.  The simplest
thing to do is just to use an image or an IFRAME. A fancier
alternative is to first load some JavaScript off the ad network site;
that JavaScript can then insert an image or IFRAME into the DOM of the
page. Whatever the method, the browser ends up loading some content
from the ad network. Note that I&#39;m radically oversimplifying here; describing
the ad sales process is out of scope for this post.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;determining-context&quot;&gt;Determining Context &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#determining-context&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;There are a variety of potential ways for the ad network to know the context
of the page. First, browsers add a header called &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer&quot;&gt;Referer&lt;/a&gt; which indicates the original site (yes,
it&#39;s spelled &amp;quot;Referer&amp;quot;. It&#39;s a typo that we&#39;re now
stuck with). Increasingly,
however, browsers are sending less useful Referer headers
(for privacy reasons). Another major option is to carry this
data in the URL. In the simplest version, the publisher can
be given a per-publisher URL. If the ad was inserted
by ad network JavaScript, then that can insert the page into
the URL. In any case, the ad network can generally tell what
page the ad was on.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The question then becomes what ad the network should show.
You could obviously show the same ad everywhere, but that&#39;s not
going to do a very good job of showing interesting ads.
The next most interesting thing is to show what&#39;s called
a &amp;quot;contextual&amp;quot; ad, which is to say an ad that is relevant
to the content of the page on which it is being shown.
For instance, if you were on Runner&#39;s World you might
get an ad for running shoes.&lt;/p&gt;
&lt;p&gt;However, a lot (most?) of Web advertising isn&#39;t contextual but rather
&amp;quot;behavioral&amp;quot;. What this means is that it&#39;s not just based on the page
the user is currently on but based on their previous behavior.
That behavior is measured using cookies.&lt;/p&gt;
&lt;br /&gt;
&lt;h3 id=&quot;behavioral-tracking-with-cookies&quot;&gt;Behavioral Tracking with Cookies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#behavioral-tracking-with-cookies&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If the advertising network has contracts
with multiple publishers this allows them to observe the user&#39;s
behavior across those publishers. The first time that
the user goes to a page served by a given ad network,
that ad network sets a cookie. From then on, they get to see every site that the user goes
to and can link them all up using the cookie. Based on that
information, they can build up a profile of the user&#39;s behavior
and use that to decide which ads to show (recall that the
server can serve any image it wants, regardless of the URL).
The diagram below shows an example of this process.&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/tracking-cookies.png&quot; alt=&quot;Tracking via cookies&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The user first
visits &lt;code&gt;sneakers.example&lt;/code&gt;, which embeds an image from
the advertiser&#39;s site. The advertiser only knows that the
user is on &lt;code&gt;sneakers.example&lt;/code&gt; but nothing about the user
so it serves a contextual ad for sneakers. However, when
it returns the ad it sends a cookie. Later, the user
visits &lt;code&gt;recycling.example&lt;/code&gt;, which also embeds an image
from the same advertiser. This time, when the user
visits the advertiser, it sends the cookie, so the
advertiser knows that (1) the user was on &lt;code&gt;sneakers.example&lt;/code&gt;
before and (2) they are on &lt;code&gt;recycling.example&lt;/code&gt; now,
so it shows the user an ad suitable for both interests:
&lt;strong&gt;recycled sneakers&lt;/strong&gt;.&lt;/p&gt;
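&lt;p&gt;The server side of this exchange can be sketched in a few lines. The code
below is a toy model, not how any real ad network works: the
&lt;code&gt;profiles&lt;/code&gt; table, the &lt;code&gt;handleAdRequest&lt;/code&gt; function,
and the shape of the response are all assumptions made for illustration.&lt;/p&gt;

```javascript
const crypto = require("node:crypto");

// Hypothetical ad server state: cookie value -> list of sites seen.
const profiles = new Map();

// Handle one ad request. `cookie` is the value the browser sent (or
// undefined on a first visit); `site` is the page embedding the ad.
function handleAdRequest(cookie, site) {
  let id = cookie;
  if (id === undefined || !profiles.has(id)) {
    id = crypto.randomUUID(); // new visitor: mint an identifier
    profiles.set(id, []);
  }
  const history = profiles.get(id);
  history.push(site);
  // The response sets the cookie and picks an ad from the whole history.
  return { setCookie: id, adKeywords: [...history] };
}

const first = handleAdRequest(undefined, "sneakers.example");
const second = handleAdRequest(first.setCookie, "recycling.example");
console.log(second.adKeywords); // [ 'sneakers.example', 'recycling.example' ]
```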
&lt;p&gt;You can also use this same basic technique for what&#39;s called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Behavioral_retargeting&amp;amp;oldid=1020847103&quot;&gt;retargeting&lt;/a&gt;.
Suppose you go to a site and look at some product. If the ad network
has a presence on the site (this can be an invisible element)
then they can record this event and use it to target ads
specifically at people interested in that product.&lt;/p&gt;
&lt;h3 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The use of cookies for behavioral advertising
is basically an unintended consequence of the design of
cookies, specifically, allowing them to be used in what&#39;s
often called a &amp;quot;third party&amp;quot; context, in which the site you are sending
the cookie to is different from the site you are on.
On the one hand, this is an example of the power and extensibility
of a few basic primitives: you can build a global ad network
based on not much more than the ability to load third party
content onto a site and attach cookies to those requests.
On the other hand, the result is
a system built on ubiquitous surveillance.&lt;/p&gt;
&lt;p&gt;At the time cookies were first introduced, people &lt;em&gt;did&lt;/em&gt; understand
that there were privacy implications. However, a lot of the attention
focused on first party tracking (i.e., of your behavior on a single
site). The original cookie
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc2109&quot;&gt;RFC&lt;/a&gt; has
a fairly extensive discussion of privacy, but the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc2109#section-8.3&quot;&gt;section&lt;/a&gt;
that most clearly addresses the third party context is kind
of confusing and seems almost to be discussing what is now
called &lt;a href=&quot;https://freedom-to-tinker.com/2014/08/07/the-hidden-perils-of-cookie-syncing/&quot;&gt;cookie syncing&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A user agent should make every attempt to prevent the sharing of
session information between hosts that are in different domains.
Embedded or inlined objects may cause particularly severe privacy
problems if they can be used to share cookies between disparate
hosts.  For example, a malicious server could embed cookie
information for host &lt;a href=&quot;http://a.com/&quot;&gt;a.com&lt;/a&gt; in a URI for a CGI on host &lt;a href=&quot;http://b.com/&quot;&gt;b.com&lt;/a&gt;.  User
agent implementors are strongly encouraged to prevent this sort of
exchange whenever possible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My sense is that people were sort of aware of the problem
but just didn&#39;t anticipate the scale of tracking that would
eventually result.
It&#39;s also worth noting that early browsers would often prompt
users before accepting cookies, thus making this kind of tracking
more difficult. Eventually, of course, every site wanted to
set a zillion cookies and the permission prompts got too annoying
so they were removed, only to be replaced years later by the
arguably even more annoying &lt;a href=&quot;https://gdpr.eu/cookies/&quot;&gt;GDPR cookie consent dialogs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a theme we&#39;ll be seeing throughout this series: a lot
of the early Web features were designed to solve specific problems
and without much understanding of the broader implications.
It took years for the security and privacy community to catch
up and develop a more comprehensive understanding of the
security of the Web platform, and, as with advertising,
we&#39;re still dealing with the implications of those original choices.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Technically, this third party is called a
&lt;em&gt;supply-side platform (SSP)&lt;/em&gt;.  There are also &lt;em&gt;demand-side platforms (DSPs)&lt;/em&gt;
which serve the advertisers, plus a bunch of other stuff. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-intro-advertising/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part II: Web Applications</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-intro2/"/>
		<updated>2022-03-08T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-intro2/</id>
		<content type="html">&lt;style&gt;
.img-wrap {
  display: inline-block;
}
.img-wrap img {
  width: 80%;
}&lt;/style&gt;
&lt;p&gt;&lt;em&gt;Note: This is one of those posts that is going to be best read on
the Web, especially if you read your email using Gmail or the like,
as email clients tend to mangle some of the HTML features.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is Part II of my series on the Web security model. In
&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1&quot;&gt;Part I&lt;/a&gt;, I talked about the basic
structure of the Web and how Web publishing works.  However, quite
early in the lifetime of the Web people started to want to do more
than just publish information. In particular, they wanted to &lt;strong&gt;sell
stuff&lt;/strong&gt;.  Of course, you could just publish your catalog on the Web
and then have people email you their order, but this is obviously
pretty clunky; what you want is a Web storefront (yeah, I know this is
obvious now, but we&#39;re talking 1994!).&lt;/p&gt;
&lt;p&gt;It&#39;s possible to build even fancier applications like Facebook or Slack with
not much more than the primitives I introduced in the previous
post; it&#39;s mostly a matter of combining them in the right way.
That&#39;s the topic of this post.&lt;/p&gt;
&lt;h2 id=&quot;how-to-build-a-web-store&quot;&gt;How to build a Web store &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#how-to-build-a-web-store&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said, much of the initial work around Web applications was in
building shopping sites. Your basic shopping site was pretty simple,
with just a few functions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Showing the catalog of items.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Adding selected items to the shopping cart.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Checking out, buying the items in the cart.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s go through these one at a time.&lt;/p&gt;
&lt;h3 id=&quot;catalog&quot;&gt;Catalog &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#catalog&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you have a relatively small number of items, then you can build
a catalog entirely with technologies we saw in the last post. There
are two main options here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If you have a very small number of items you can just make
a static Web page that shows them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you have a somewhat larger number of items—especially
if they go in or out of stock, or you have different prices
in different regions—then you can dynamically generate
the Web page.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first option is straightforward. The way that the second
option works is that you have some database that is basically
a list of every item (the jargon here is &lt;em&gt;stock keeping unit&lt;/em&gt; (SKU)),
its description, maybe a picture or two, and the price or prices.
Then when the user&#39;s browser requests a given catalog page,
some code on your server goes through the database and
renders it into an HTML page and serves it back to the browser.&lt;/p&gt;
&lt;p&gt;It&#39;s important to realize that these two methods are
interchangeable from the perspective of the browser; the
server can switch between static and dynamically
generated pages at will. It can also &lt;em&gt;cache&lt;/em&gt; the dynamically
generated pages—that is, temporarily store the output
of what was generated—and serve that back to clients,
thus saving run time and computing resources.&lt;/p&gt;
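&lt;p&gt;As a rough sketch of the caching idea, here is one way a server might wrap its page generator. The &lt;code&gt;renderCatalog&lt;/code&gt; function is a hypothetical stand-in for whatever code turns the database into a page, and the names and the 60-second lifetime are illustrative assumptions rather than anything a particular server does:&lt;/p&gt;

```javascript
// Hypothetical renderer: stands in for the code that walks the
// database and produces the page for a given region.
let renderCount = 0;
function renderCatalog(region) {
  renderCount += 1;
  return `catalog page for ${region}`;
}

// Wrap a renderer so that its output is reused until it expires.
// `now` is injectable so the expiry logic can be exercised in tests.
function makeCachedRenderer(render, maxAgeMs, now = Date.now) {
  const cache = new Map(); // key -> { page, expires }
  return function cachedRender(key) {
    const hit = cache.get(key);
    if (hit !== undefined) {
      if (hit.expires > now()) {
        return hit.page; // still fresh: skip the expensive render
      }
    }
    const page = render(key);
    cache.set(key, { page, expires: now() + maxAgeMs });
    return page;
  };
}

const cachedCatalog = makeCachedRenderer(renderCatalog, 60000);
cachedCatalog("CA"); // rendered
cachedCatalog("CA"); // served from the cache
cachedCatalog("US"); // different key, so rendered again
```

&lt;p&gt;The browser sees the same bytes either way, which is exactly why the server is free to add or remove a layer like this.&lt;/p&gt;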
&lt;p&gt;I know I keep making this point, but it really can&#39;t
be overemphasized—as long as
the data sent to the client is valid HTML, the browser doesn&#39;t
care how it was generated. The point of having standardized
network protocols is so that you can detach the implementation
on each side from the messages they send to each other.
This creates important implementation flexibility and allows
new functionality to be added on either end without consulting
the other. Part of what makes the Web so powerful is the
combination of these standardized protocols with the
ability to move implementation logic onto the client
via JavaScript, as we&#39;ll see below.&lt;/p&gt;
&lt;p&gt;This is great if you are a small site, but if your store
is the size of Amazon (or even the &lt;a href=&quot;https://www.lcbo.com/webapp/wcs/stores/servlet/en/lcbo&quot;&gt;LCBO&lt;/a&gt;),
you obviously need people to be able to search. Fortunately,
HTML has a feature that makes this straightforward, the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form&quot;&gt;&lt;code&gt;&amp;lt;form&amp;gt;&lt;/code&gt; element&lt;/a&gt;.
At a high level, a form element is a container for one or
more &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input&quot;&gt;input controls&lt;/a&gt; (text fields, buttons, pull-down
menus, etc.). The form element also has an &amp;quot;action&amp;quot; which
causes the client to send the values of these elements
to the server.&lt;/p&gt;
&lt;p&gt;For instance, here is the form element that represents
the subscription box at the bottom of this page:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;form&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;email-form&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://educatedguesswork-subscribe.herokuapp.com/subscribe&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;post&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;input&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;subscribe-email&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token 
attr-name&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;email&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;Your e-mail address...&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;email&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;email&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;input&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;subscribe-button&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token 
attr-name&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;submit&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;Subscribe&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;/&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;form&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ignore the &lt;code&gt;class&lt;/code&gt; attributes; they are just labels that are used to attach
CSS styles to the form. The key thing to look at here is the &lt;code&gt;action&lt;/code&gt; attribute on
the first line. What this says is that when you &amp;quot;submit&amp;quot; the form the browser
will navigate to &lt;code&gt;https://educatedguesswork-subscribe.herokuapp.com/subscribe&lt;/code&gt;.
The first input field &lt;code&gt;type=email&lt;/code&gt; creates a text field that you can put
your email address into. You submit by clicking on the &amp;quot;Subscribe&amp;quot; button which is generated
by the second &lt;code&gt;input&lt;/code&gt; field, of type &lt;code&gt;submit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This produces the following result, which you can actually use to
subscribe to my newsletter. Take a minute to do it now.&lt;/p&gt;
&lt;div style=&quot;margin-bottom: 10px;&quot;&gt;
&lt;form class=&quot;email-form&quot; action=&quot;https://educatedguesswork-subscribe.herokuapp.com/subscribe&quot; method=&quot;post&quot;&gt;
  &lt;input class=&quot;subscribe-email&quot; type=&quot;email&quot; placeholder=&quot;Your e-mail address...&quot; id=&quot;email&quot; name=&quot;email&quot; /&gt;
  &lt;input class=&quot;subscribe-button&quot; type=&quot;submit&quot; value=&quot;Subscribe&quot; /&gt;
&lt;/form&gt;
&lt;/div&gt;
&lt;p&gt;All done? Great.&lt;/p&gt;
&lt;p&gt;When you fill in the form and click submit, the client sends the server an
HTTP request that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;POST /subscribe HTTP/1.1
Host: educatedguesswork-subscribe.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:99.0) Gecko/20100101 Firefox/99.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
[other headers deleted]

email=ekr%40rtfm.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To orient yourself, the first line is called the &amp;quot;request line&amp;quot;, the next lines are
called &amp;quot;headers&amp;quot;, and the stuff after the blank line is called the &amp;quot;body&amp;quot;.
The &lt;code&gt;Host&lt;/code&gt; header and the second field of the first line (&lt;code&gt;/subscribe&lt;/code&gt;)
together match the URL in the &lt;code&gt;action&lt;/code&gt; attribute of the form element
defined above. The body of the submission contains the value of the form,
in this case the &lt;code&gt;email&lt;/code&gt; field and the value of &lt;code&gt;ekr@rtfm.com&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
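&lt;p&gt;To see where the &lt;code&gt;%40&lt;/code&gt; in the body comes from, here is a minimal sketch using the standard &lt;code&gt;URLSearchParams&lt;/code&gt; class, assuming the form uses the default URL-encoded format:&lt;/p&gt;

```javascript
// Encode form fields the way a browser does for a POST body
// (application/x-www-form-urlencoded).
const body = new URLSearchParams({ email: "ekr@rtfm.com" }).toString();
console.log(body); // email=ekr%40rtfm.com

// The server reverses the process to recover the field values.
const fields = new URLSearchParams(body);
console.log(fields.get("email")); // ekr@rtfm.com
```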
&lt;p&gt;Even though this comes from a form submission, it&#39;s conceptually like a link
click, and the result is that the browser is navigating to a new page.
Therefore, the server is expected to respond with a new HTML page.
As noted above, it can generate this page however it wants, but the
idea is that it will do some processing on the form submission input,
in this case subscribing you to the list. The response is just an
HTML page indicating (hopefully) success.&lt;/p&gt;
&lt;p&gt;It should be obvious at this point how to use an HTML form to build
a search interface: you use almost exactly the same HTML as above, except
with different text labels and probably input type &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/search&quot;&gt;&lt;code&gt;search&lt;/code&gt;&lt;/a&gt;
rather than &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/email&quot;&gt;&lt;code&gt;email&lt;/code&gt;&lt;/a&gt;.
The user would type the product search term in the box and click submit;
the server would respond with the products that match the search
term. That&#39;s all there is to it.&lt;/p&gt;
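&lt;p&gt;The server side of such a search might look roughly like the sketch below. The catalog contents, the field name &lt;code&gt;q&lt;/code&gt;, and the substring matching rule are all assumptions made for illustration; a real store would query its database and then render the matches into an HTML page.&lt;/p&gt;

```javascript
// Hypothetical catalog; a real store would query its database.
const catalog = [
  { sku: "1234", name: "Red wine glass" },
  { sku: "5678", name: "White wine glass" },
  { sku: "9012", name: "Corkscrew" },
];

// Pull the search term out of the submitted form data and return
// the products whose names contain it (case-insensitive).
function handleSearch(formBody) {
  const raw = new URLSearchParams(formBody).get("q") || "";
  const term = raw.trim().toLowerCase();
  if (term === "") {
    return []; // nothing to search for
  }
  return catalog.filter((item) => item.name.toLowerCase().includes(term));
}

const matches = handleSearch("q=wine+glass");
console.log(matches.map((item) => item.sku)); // [ '1234', '5678' ]
```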
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;statelessness&quot;&gt;Statelessness &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#statelessness&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the early days of the Web, there was a lot of emphasis
on how HTTP was &lt;em&gt;stateless&lt;/em&gt;, which is to say that each
request by the client was independent of every other request
and that the protocol had no way of linking them up.
This property extended down to the network layer:
each request was carried over a new &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1074414854&quot;&gt;TCP&lt;/a&gt;
connection, with the connection being closed after
the server sent the response (in fact, closure of
the connection was often used to indicate the end of
the response).&lt;/p&gt;
&lt;p&gt;Statelessness turns out to be a fairly inconvenient property for
several reasons. The first is the one we are seeing here,
which is that lots of things the server wants to do require
creating continuity between client requests and so it
was necessary to retrofit a state-keeping mechanism.&lt;/p&gt;
&lt;p&gt;The second reason is performance: because of the way that
network protocols are designed, there is a significant amount
of startup overhead each time a connection is created
(see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=TCP_congestion_control&amp;amp;id=1073805339&amp;amp;wpFormIdentifier=titleform#Slow_start&quot;&gt;slow start&lt;/a&gt;), so having a new connection for each request
adds significant delays. Much of the history of the development
of HTTP is concerned with removing the legacy of this
initial decision, first by adding multiple requests on
the same connection and then by adding multiplexing of
multiple simultaneous requests.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;shopping-carts&quot;&gt;Shopping Carts &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#shopping-carts&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Our next job is to let the user select some products and add them to
their shopping cart. Unfortunately, this presents us with a problem,
which is remembering which items the user has selected.
The problem is that the HTTP requests to the server don&#39;t contain any
kind of user identifier, so when your browser sends a request
asking to add an item to your shopping cart, how does the server
know whether to add it to your cart or to my cart?&lt;/p&gt;
&lt;p&gt;The solution to this problem that eventually emerged is what&#39;s called
a &amp;quot;cookie&amp;quot;. The idea behind a cookie is simple: the server sends the
client a cookie in the header of one HTTP response and the client stores
it. The client then sends the cookie to the server in subsequent requests.
The cookie is just an opaque string to the client and the server can construct
it any way it pleases, but there are two main options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An opaque identifier for this user or session. This identifier is then
used as an index into some database that stores the user&#39;s state.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An actual representation of the user&#39;s state (e.g., a list of items
in their cart).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the cookie is opaque, the server is, of course, free to use
either of these techniques or a combination of the two.&lt;/p&gt;
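&lt;p&gt;On the wire, the server hands out a cookie in a &lt;code&gt;Set-Cookie&lt;/code&gt; response header and the browser returns it in a &lt;code&gt;Cookie&lt;/code&gt; request header. The sketch below shows the basic shape of that exchange; the cookie name &lt;code&gt;session&lt;/code&gt; is an arbitrary choice, and real cookies usually carry extra attributes that are omitted here.&lt;/p&gt;

```javascript
// Build the response header a server uses to hand out a cookie.
// The cookie name "session" is an arbitrary choice for this sketch.
function makeSetCookieHeader(name, value) {
  return `Set-Cookie: ${name}=${value}`;
}

// Parse the value of the Cookie request header the browser sends
// back, e.g. "session=XYZ; theme=dark", into a name -> value map.
function parseCookieHeader(headerValue) {
  const cookies = new Map();
  for (const pair of headerValue.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip malformed pieces
    cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return cookies;
}

console.log(makeSetCookieHeader("session", "XYZ")); // Set-Cookie: session=XYZ
console.log(parseCookieHeader("session=XYZ; theme=dark").get("session")); // XYZ
```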
&lt;p&gt;The diagram below shows an example of how cookies can be used to build a shopping
cart:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/shopping-cart.png&quot; alt=&quot;Shopping cart example&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In this case, the server has chosen to use a back-end database, so the
cookie is just an opaque identifier (&lt;code&gt;XYZ&lt;/code&gt;). Initially, the client contacts the
server and requests the catalog. The client and the server have never talked
before so the client doesn&#39;t have a cookie. The server creates a new
cookie with value &lt;code&gt;XYZ&lt;/code&gt; and stores an empty shopping cart &lt;code&gt;[]&lt;/code&gt;
in the database associated with that cookie.
It then returns the catalog to the
client along with the cookie.&lt;/p&gt;
&lt;p&gt;The user browses through the catalog and selects item &lt;code&gt;1234&lt;/code&gt;. When
they click to add it to the shopping cart, the browser sends a request
to the server with the item id and the cookie. The server then uses
the cookie to retrieve the shopping cart. Seeing it&#39;s empty, it adds
the item to the cart and stores that in the database. Finally, it
returns a confirmation to the user. The user browses the catalog some more and decides to buy item &lt;code&gt;5678&lt;/code&gt;.
This transaction proceeds the same way, except that this time
the server adds it to the already non-empty shopping cart, ending up
with two items.&lt;/p&gt;
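&lt;p&gt;The flow just described can be sketched in a few lines of server-side JavaScript. This is a simplified illustration, not how any particular store works: an in-memory &lt;code&gt;Map&lt;/code&gt; stands in for the database, and the &lt;code&gt;handleRequest&lt;/code&gt; function and its request shape are hypothetical.&lt;/p&gt;

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for the back-end database: cookie -> cart.
const carts = new Map();

// Handle one request. `request.cookie` is undefined on first contact;
// `request.addItem` is the optional item id to add to the cart.
function handleRequest(request) {
  let cookie = request.cookie;
  if (cookie === undefined || !carts.has(cookie)) {
    cookie = randomUUID(); // unguessable identifier for a new session
    carts.set(cookie, []); // start with an empty cart
  }
  const cart = carts.get(cookie);
  if (request.addItem !== undefined) {
    cart.push(request.addItem);
  }
  // The cookie goes back to the client with every response.
  return { cookie, cart: [...cart] };
}

const first = handleRequest({}); // catalog request, no cookie yet
handleRequest({ cookie: first.cookie, addItem: "1234" });
const last = handleRequest({ cookie: first.cookie, addItem: "5678" });
console.log(last.cart); // [ '1234', '5678' ]
```

&lt;p&gt;Note that the identifier is generated randomly rather than counted up: anyone who can guess another user&#39;s cookie value can act on that user&#39;s cart.&lt;/p&gt;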
&lt;h3 id=&quot;checkout&quot;&gt;Checkout &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#checkout&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At this point, we have all the tools we need to do checkout. When
the user presses the checkout button, the server uses the cookie
to collect all the items in the shopping cart and compute the final
price. It then provides a Web form which lets the user enter
their name, address, payment information, etc. The user submits
that form (with the cookie, of course), and the server processes
the transaction. It then can clear the shopping cart (so that the
user can start shopping again) and send back the confirmation
page.&lt;/p&gt;
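&lt;p&gt;Here is a minimal sketch of the server&#39;s side of checkout, assuming the cart is looked up by cookie as in the previous section. The price table, the &lt;code&gt;XYZ&lt;/code&gt; cart contents, and the function names are made up for illustration, and actual payment processing is left out entirely.&lt;/p&gt;

```javascript
// Hypothetical price table, in cents to avoid floating-point rounding.
const prices = new Map([
  ["1234", 1999],
  ["5678", 500],
]);

// Stand-in for the database from the shopping cart step.
const cartsByCookie = new Map([["XYZ", ["1234", "5678"]]]);

// Total up the cart for the checkout page.
function cartTotal(cookie) {
  const cart = cartsByCookie.get(cookie) || [];
  return cart.reduce((sum, itemId) => {
    const price = prices.get(itemId);
    if (price === undefined) {
      throw new Error(`unknown item ${itemId}`);
    }
    return sum + price;
  }, 0);
}

// After payment succeeds, empty the cart so the user can shop again.
function completeCheckout(cookie) {
  const total = cartTotal(cookie);
  cartsByCookie.set(cookie, []);
  return { charged: total };
}

console.log(cartTotal("XYZ")); // 2499
console.log(completeCheckout("XYZ")); // { charged: 2499 }
console.log(cartTotal("XYZ")); // 0
```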
&lt;h2 id=&quot;client-side-applications&quot;&gt;Client-Side Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#client-side-applications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In principle you can build just about any application you want with
the techniques described &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#how-to-build-a-web-store&quot;&gt;above&lt;/a&gt;.
In practice, though, loading a new page whenever you want to
change anything is painfully slow.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
It&#39;s certainly too slow to give a smooth app-like experience.
Moreover, it&#39;s ugly because the page flashes as it rerenders.
The resulting system isn&#39;t
really viable for anything significantly interactive like
Google Maps, Slack, etc.&lt;/p&gt;
&lt;p&gt;Fortunately, we already have the solution: JavaScript. Recall
that in &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1#the-dom&quot;&gt;Part I&lt;/a&gt;
I said that JavaScript could change the DOM and that this would
cause the page to change as well. The key thing is that unlike
a page reload, small changes to the DOM mostly don&#39;t cause the
entire page to rerender (only the elements that need to be updated).&lt;/p&gt;
&lt;p&gt;Here&#39;s a simple example of what I&#39;m talking about. The box below
is a list of entries. If you enter a new entry in the box at
the bottom and hit return, it will be added to the list
without the page reloading.&lt;/p&gt;
&lt;div style=&quot;border-style: solid; display: inline-block; margin-bottom: 10px;&quot;&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;th&gt;Shopping List&lt;/th&gt;
  &lt;/thead&gt;
  &lt;tbody id=&quot;entries-list&quot;&gt;
    &lt;tr&gt;&lt;td&gt;Apples&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Bananas&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;form id=&quot;list-addition-form&quot;&gt;
  &lt;input id=&quot;list-addition-entry&quot; type=&quot;text&quot; placeholder=&quot;Add a new list entry&quot; /&gt;
&lt;/form&gt;
&lt;/div&gt;
&lt;p&gt;The way this works is just that I have a tiny piece of JavaScript that
watches for you to hit return in the entry box and adds the value of
the box into the list:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; tbodyEl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getElementById&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;entries-list&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; textboxEl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getElementById&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;list-addition-entry&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; formEl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getElementById&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;list-addition-form&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;formEl&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;addEventListener&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;submit&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    event&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;preventDefault&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; row &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; tbodyEl&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;insertRow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; cell &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; row&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;insertCell&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    cell&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token 
function&quot;&gt;appendChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;document&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createTextNode&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;textboxEl&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    textboxEl&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;script&gt;
const tbodyEl = document.getElementById(&quot;entries-list&quot;);
const textboxEl = document.getElementById(&quot;list-addition-entry&quot;);
const formEl = document.getElementById(&quot;list-addition-form&quot;);

formEl.addEventListener(&quot;submit&quot;, function(event) {
    event.preventDefault();
    const row = tbodyEl.insertRow(-1);
    const cell = row.insertCell(0);
    cell.appendChild(document.createTextNode(textboxEl.value));
    textboxEl.value = &quot;&quot;;
});
&lt;/script&gt;
&lt;p&gt;We don&#39;t need to go through this in detail, but at a high level, the
first three lines select the relevant elements (the table, the textbox, and the form),
and the rest of the code is a JavaScript function that retrieves the
value from the textbox and adds it to the list. Attaching it
to the &lt;code&gt;&amp;quot;submit&amp;quot;&lt;/code&gt; event ensures it will run whenever the form is submitted,
which is when you press return.
Obviously this is a trivial example, but trivial examples are the stepping
stones to real programs. Suppose we wanted to make something like Slack.
The most basic version really only needs two small changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;When you type into the window, it needs to send a message to the
other people in the chat.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When someone sends you a message, it needs to receive it and
add it to the list of messages.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These are both done with the same basic technique: a Web Service API.&lt;/p&gt;
&lt;h3 id=&quot;web-service-apis&quot;&gt;Web Service APIs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#web-service-apis&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So far, all the examples of requests made to Web servers are for content
which will then be consumed by the browser (e.g., HTML or JavaScript).
A Web service API is different: it serves data that is intended to be consumed
by JavaScript running in the browser.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
For instance, in our chat application, the server would have (minimally)
two functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Send&lt;/strong&gt; a message to a channel.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Receive&lt;/strong&gt; any new messages on a given channel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each function requires defining a few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The URL (path) for the API function.
It&#39;s conventional to refer to the URL, and by extension the function, as an &amp;quot;API endpoint&amp;quot;.&lt;/li&gt;
&lt;li&gt;A definition for the data that the client sends to the server
(both format and semantics)&lt;/li&gt;
&lt;li&gt;A definition for the data that the server sends to the client&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, here&#39;s the API that Slack uses to &lt;a href=&quot;https://api.slack.com/methods/chat.postMessage&quot;&gt;post a message&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;the-client-side&quot;&gt;The Client Side &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#the-client-side&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;On the client side, the JavaScript uses the
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API&quot;&gt;&lt;code&gt;fetch&lt;/code&gt;&lt;/a&gt; API or the
older &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest&quot;&gt;&lt;code&gt;XMLHttpRequest (XHR)&lt;/code&gt;&lt;/a&gt;
API to talk to the server. These Web APIs let it make arbitrary (within some limits I&#39;ll cover later)
HTTP requests to the server, which means that they can use the endpoints provided by the server.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
To continue our chat example above, whenever the user types a message
into the compose window and hits enter, the JavaScript function that
gets activated would use &lt;code&gt;fetch&lt;/code&gt; to tell the server
that a new message had been added to the chat. This might look
something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;POST /send-message HTTP/1.1
Host: chat-server.example.com

message=Hello World!
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obviously, this could be fancier and include a channel identifier,
or, if it were a direct message, the recipient identifier, but you
get the idea. Depending on the way the application was written, that
same function might add the message to the local window or the server
might handle this with the same code it uses for incoming messages
(see below).&lt;/p&gt;
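&lt;p&gt;As a rough sketch of the client side of this exchange, the function below
issues the &lt;code&gt;POST&lt;/code&gt; shown above using &lt;code&gt;fetch&lt;/code&gt;. The name
&lt;code&gt;sendMessage&lt;/code&gt;, the form-encoded body, and the injectable
&lt;code&gt;fetchImpl&lt;/code&gt; parameter are assumptions made for illustration, not
how any particular chat service does it.&lt;/p&gt;

```javascript
// Hypothetical helper: POST one chat message to the /send-message endpoint.
// fetchImpl is injectable so the sketch can be exercised without a network.
async function sendMessage(text, fetchImpl = globalThis.fetch) {
    // URLSearchParams form-encodes the value (spaces become "+", etc.).
    const body = new URLSearchParams({ message: text });
    const response = await fetchImpl("/send-message", {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: body.toString(),
    });
    if (!response.ok) {
        throw new Error("send failed with status " + response.status);
    }
    return response;
}
```

&lt;p&gt;In the list example from earlier, the submit handler could call
&lt;code&gt;sendMessage(textboxEl.value)&lt;/code&gt; before clearing the textbox.&lt;/p&gt;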
&lt;p&gt;This brings us to incoming messages. The simplest way for this to
work is for the server to have an endpoint that allows the client to
ask for new messages. For instance, it might look something like
this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /get-message?lastmessage=105 HTTP/1.1
Host: chat-server.example.com

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The semantics of this request would be something like &amp;quot;Send me a copy
of every message with a sequence number greater than 105&amp;quot;. That way,
the client can just ask for new messages without the server having
to remember which ones the client already knows. And a new client
can get all the messages by sending &lt;code&gt;lastmessage=0&lt;/code&gt; (or maybe &lt;code&gt;-1&lt;/code&gt;,
if you started counting from &lt;code&gt;0&lt;/code&gt;). The server would then respond
with a list of new messages, which would be empty if there were
no new messages. Once those messages are received, the client
side JavaScript can just add them to the message window.&lt;/p&gt;
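&lt;p&gt;A minimal sketch of that request and the bookkeeping around it might look
like the following. The response format (a JSON array of objects with
&lt;code&gt;seq&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; fields) and both function names are
assumptions for illustration; the text above only fixes the
&lt;code&gt;lastmessage&lt;/code&gt; parameter.&lt;/p&gt;

```javascript
// Hypothetical helper: ask the server for every message newer than lastSeen.
// Assumes the server replies with a JSON array of objects shaped like
// { seq: 106, text: "hi" }; that shape is invented for this sketch.
async function fetchNewMessages(lastSeen, fetchImpl = globalThis.fetch) {
    const query = new URLSearchParams({ lastmessage: String(lastSeen) });
    const response = await fetchImpl("/get-message?" + query.toString());
    if (!response.ok) {
        throw new Error("poll failed with status " + response.status);
    }
    return response.json();
}

// Fold a batch into local state and return the new highest sequence number.
function applyMessages(state, batch) {
    for (const msg of batch) {
        state.messages.push(msg.text);
        state.lastSeen = Math.max(state.lastSeen, msg.seq);
    }
    return state.lastSeen;
}
```

&lt;p&gt;Because the client carries &lt;code&gt;lastSeen&lt;/code&gt; itself, the server stays
stateless with respect to what each client has already received.&lt;/p&gt;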
&lt;p&gt;This style of application was originally known as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Ajax_(programming)&amp;amp;oldid=1066817022&quot;&gt;Asynchronous JavaScript and XML (AJAX)&lt;/a&gt;. Asynchronous because
you could be using the Web application while it talked to the server.
JavaScript for obvious reasons. XML because at the time most servers
used &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=XML&amp;amp;oldid=1074677563&quot;&gt;XML&lt;/a&gt; to send
messages around (XML is just a structured data format). In recent years, however,
fashions have changed and increasingly people structure
their data in &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=JSON&amp;amp;oldid=1073541068&quot;&gt;JavaScript Object Notation (JSON)&lt;/a&gt;
instead.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
&amp;quot;AJAJ&amp;quot; just doesn&#39;t have the same ring to it, though.
Whatever the name, this is now the dominant style of Web application,
for sites as diverse as Google Maps, Facebook, Slack, and Kayak.
You still see old-style Web applications, but if you want to do
something fancy—which people often do—then it&#39;s likely to have some sort of AJAX-y component.&lt;/p&gt;
&lt;p&gt;Just to keep emphasizing this point: the only new piece of technology
here is the existence of the client-side HTTP APIs. Everything else
is just done server-side by adding new server-side endpoints and writing new JavaScript
which the server sends to the client.&lt;/p&gt;
&lt;h3 id=&quot;notifications&quot;&gt;Notifications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#notifications&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;With that said, there is one kind of inconvenient property of this system:
We&#39;ve just shown how the client can find out what messages are available,
but how does the client know when to ask? The obvious approach is to
just &lt;em&gt;poll&lt;/em&gt; the server constantly, but then you&#39;re adding a lot of load
to the server as well as a lot of network traffic. You can also poll
less frequently, like every 10 seconds or so;  but while this might be fine for e-mail, it&#39;s really not
fast enough for instant messaging, because it means that on
average each message will be delayed by 5 seconds.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;paving-the-cowpaths&quot;&gt;Paving the Cowpaths &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#paving-the-cowpaths&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The story of long polling and WebSockets is a common pattern on
the Web. The Web is now powerful enough that you can usually
get the job done, though perhaps in a hacky and inefficient
way. But people have product requirements so they do it anyway.
Once some technique gets common enough, then it becomes
attractive to build a better version into the platform
(&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Desire_path&amp;amp;oldid=1070325803&quot;&gt;&amp;quot;paving the cowpaths&amp;quot;&lt;/a&gt;)
but application developers don&#39;t need to wait for that to
happen. Moreover, there is usually a long period where
only some browsers support the new technology, so application
developers will check to see if it&#39;s available on a given
browser and if so use it, and otherwise fall back to the old
hack.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The fundamental problem is that HTTP requests are initiated by the client
and there&#39;s no way for the server to talk back without the client
saying something first. And then someone clever realized that instead
of having the server respond immediately when there were no new
messages, it could instead wait to respond until there &lt;em&gt;were&lt;/em&gt; new
messages. This is called a &amp;quot;long poll&amp;quot; and lets the client get the
information right away, without constantly polling the server.&lt;/p&gt;
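&lt;p&gt;On the client, a long poll is just an ordinary request loop with no sleep
between iterations. The sketch below assumes a hypothetical
&lt;code&gt;pollOnce(lastSeen)&lt;/code&gt; function that resolves with an array of
&lt;code&gt;{ seq, text }&lt;/code&gt; objects whenever the server finally answers; the
one-second retry pause is likewise an arbitrary choice.&lt;/p&gt;

```javascript
// Sketch of a client-side long-poll loop. pollOnce(lastSeen) is a
// hypothetical function that resolves with an array of { seq, text }
// objects once the server responds (possibly an empty array).
// keepGoing() lets the caller stop the loop.
async function longPollLoop(pollOnce, onMessage, keepGoing) {
    let lastSeen = 0;
    while (keepGoing()) {
        let batch;
        try {
            batch = await pollOnce(lastSeen);
        } catch (err) {
            // Network error or timeout: pause briefly, then reissue.
            await new Promise((resolve) => setTimeout(resolve, 1000));
            continue;
        }
        for (const msg of batch) {
            onMessage(msg);
            lastSeen = Math.max(lastSeen, msg.seq);
        }
        // No sleep on success: the next request goes out immediately and
        // simply waits at the server until there is something new.
    }
    return lastSeen;
}
```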
&lt;p&gt;Long polling works, but it&#39;s not ideal. Due to various timeouts at
different parts of the system, you can&#39;t have an HTTP request
outstanding indefinitely, so as a practical matter the request
times out after some tens of seconds and then you have to reissue
it. Also, it&#39;s just kind of a hack. Back in 2011 the IETF standardized
a protocol called &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6455&quot;&gt;WebSocket&lt;/a&gt;
that provided a bidirectional channel on top of HTTP to replace long
polling.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
This is a new—well not so new now—API, but fundamentally
it&#39;s an optimization over long polling and if WebSockets isn&#39;t available
you can always fall back to long polling.&lt;/p&gt;
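&lt;p&gt;The fallback decision is typically made by feature detection. Here is a
minimal sketch; the function name and the returned labels are made up for
illustration, and &lt;code&gt;env&lt;/code&gt; simply stands in for the browser&#39;s global
object.&lt;/p&gt;

```javascript
// Pick a transport by feature detection. The env parameter stands in for
// the browser's global object so the choice can be tested in isolation.
function chooseTransport(env = globalThis) {
    return typeof env.WebSocket === "function" ? "websocket" : "long-poll";
}
```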
&lt;h2 id=&quot;post-standardization&quot;&gt;Post-Standardization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#post-standardization&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Up to now I&#39;ve been focusing on how Web applications are built, but
now I want to zoom out and talk about the bigger picture.&lt;/p&gt;
&lt;p&gt;Traditionally, client-server applications relied on standardized
protocols. This means that there is some document which describes
what messages the client can send the server and how the server
will behave in response and vice versa. For instance, if you
are reading mail on your iPhone, you are probably using a standardized
protocol (likely &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_Message_Access_Protocol&amp;amp;oldid=1071482084&quot;&gt;IMAP&lt;/a&gt;)
to talk to the server. This is why the iOS mail client can talk
to any mail server; you just need to give it the address
of the server and your username and password. All of the
protocol machinery is built into the mail client, which knows
how to send email, download it, etc. It can show any UI
it wants but it needs to comply with the protocols.&lt;/p&gt;
&lt;p&gt;The Web is also built on standardized protocols, of course:
HTTP and TLS for interacting with the server, HTML and CSS for
formatting the page, JavaScript and Web service APIs for application
logic. These are all standardized, which is why—at
least most of the time—any Web site will work on any
browser. But these standards only define the application &lt;em&gt;infrastructure&lt;/em&gt;:
the actual Web application is a combination of logic on the
server (however that&#39;s implemented) and logic on the client
written in JavaScript. This has huge implications because
it means that the application author provides both the client
and the server and therefore doesn&#39;t need to coordinate
with anybody but themselves. That&#39;s why the world was able to
switch from applications that used XML for data transfer
to JSON for data transfer without changing the Web browser
at all.&lt;/p&gt;
&lt;p&gt;When the first real interactive Web applications using AJAX came
out, this was a truly revolutionary property.
After years of painstaking coordination defining every detail
of application protocol behavior, suddenly it was possible
to quickly build a complete client/server application without
talking to anyone. It had of course always
been possible to define your own protocol and write a client and
server that spoke it, but getting people to download your client
was a huge obstacle; by contrast anybody could use your Web app
just by navigating to the right place. Moreover, the Web browser
included all kinds of powerful facilities—this is even more
true now—that you would have had to build (or at least download)
yourself.&lt;/p&gt;
&lt;p&gt;Of course, now it&#39;s 2022, 15 years after the introduction of the iPhone.
We have mobile app stores and the problem of software distribution—and
in particular updating—has gotten much easier, so on
mobile you can invent some proprietary protocol and roll out an
app and as long as people download it, you&#39;re good to go. If you
want to change the protocol, no problem, just update to a new
version. The Web is like this, but even more so because users don&#39;t
need to install or update software: they just get whatever the new
thing is when they load your site. This lets vendors build a completely
vertically integrated system that leverages the power of the Web
platform but without having to standardize—or, often, even
document—anything.&lt;/p&gt;
&lt;p&gt;Obviously, this has real benefits in terms
of engineering velocity, but it&#39;s also contributed to a situation
in which the user experience of a site and its functionality
are completely entangled, so it&#39;s hard to use (say) Facebook
without the Facebook UI. If you don&#39;t like something about
that UI, you&#39;re basically out of luck.
And even if you did reverse engineer
the server-side APIs that Facebook used and write your own client,
there&#39;s no guarantee Facebook won&#39;t change those APIs tomorrow.
By contrast, if you want to use a different
mail client with mail that is hosted by Gmail, it&#39;s just a download away.&lt;/p&gt;
&lt;p&gt;This isn&#39;t to say that there isn&#39;t still plenty of work going into
creating standardized technologies for the Web. However, that work
is primarily concentrated on creating new plumbing (e.g., TLS 1.3 or
QUIC) or new Web platform features (e.g., WebRTC or WebAssembly).
This all makes the Web a better platform for running applications,
but the applications themselves live on top of that substrate
and are largely opaque and non-interoperable.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-origins%2C-and-the-same-origin-policy&quot;&gt;Next Up: Origins, and the Same Origin Policy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#next-up%3A-origins%2C-and-the-same-origin-policy&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point, we&#39;ve covered most of what you need to know about how
the Web works in order to understand its security model (and I&#39;ll
be introducing the rest as we go). In the next post, I&#39;ll be covering
the basic unit of Web security: the &lt;strong&gt;origin&lt;/strong&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this actually says &lt;code&gt;ekr%40rtfm.com&lt;/code&gt;. This is what&#39;s
called &lt;em&gt;escaping&lt;/em&gt; of the @-sign. It&#39;s not really necessary here
but is done for consistency with cases where the address would
appear in the URL, where the @-sign is forbidden. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Not quite as slow as you might think because a lot of
the images and the like on the page can be cached, but still
slow. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Obviously standalone apps can and do use these APIs, but
the topic of these posts is the Web. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know that calling both of these APIs is confusing. I resisted calling
the HTTP APIs offered by servers &amp;quot;APIs&amp;quot; and then finally gave up. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;JSON &lt;em&gt;is&lt;/em&gt; modestly easier to work with, but like styles of jeans, data formats tend to cycle in and out of fashion. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For the nerds here, we also have the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Push_API&quot;&gt;Web Push API&lt;/a&gt;
which consolidates channels to multiple servers. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro2/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Understanding The Web Security Model, Part I: Web Publishing</title>
		<link href="https://educatedguesswork.org/posts/web-security-model-intro1/"/>
		<updated>2022-03-04T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/web-security-model-intro1/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;Note: This is one of those posts that is going to be best read on
the Web, especially if you read your email using GMail or the like,
as it will tend to mangle some of the HTML features.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Like many pieces of technology, the Web is one of those things that
people are perfectly happy to use but have absolutely no idea how it
works.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
It&#39;s natural to think of the Web as a publishing system, and
at some level it is: the Web lets people publish documents
for anyone to read. But what the Web really is is a distributed
computing platform that lets Web sites run code on your computer.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Originally, of course, that code just rendered documents, but
now it&#39;s used for everything from documents (like the one you&#39;re
reading now) to text-based applications like Slack or even
videoconferencing apps like Google Meet.
Unsurprisingly, then, the Web has a unique security model,
which is the topic of this series of (some unknown number of)
posts.&lt;/p&gt;
&lt;p&gt;I meant to start right in on security
but then I realized I first needed to provide enough background
of how the Web works to have the security stuff make sense.
This post is the first half of that background material,
covering the structure of Web sites and pages. There will
be a second post that covers
Web &amp;quot;applications&amp;quot;.
This isn&#39;t a textbook or a specification, so I don&#39;t intend
to provide a complete picture; the idea here is to cover the
essential elements for understanding the security model.&lt;/p&gt;
&lt;h2 id=&quot;the-url&quot;&gt;The URL &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#the-url&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Everything on the Web starts with the &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=URL&amp;amp;oldid=1073943644&quot;&gt;Uniform Resource Locator
(URL)&lt;/a&gt;&lt;/em&gt;,
which, as Wikipedia puts it, is commonly called the &amp;quot;web
address&amp;quot;. Minimally, it&#39;s the thing that shows up in the address bar of your
browser when you go to a Web page, but actually everything on the Web has a URL,
not just web pages. For instance, most Web pages are made up of a
mix of text and images and each of those images has its own URL.
In fact, you can (usually) independently load each individual
subcomponent of the page by right-clicking on it, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/right-click.png&quot; alt=&quot;Right-click&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What a URL really is is just the address of some thing (the
technical term here is &lt;em&gt;resource&lt;/em&gt;) on the
Web. Given the URL for a thing, your browser can go to the
indicated location (i.e., the Web server), load the resource,
and do something with it. What that something is depends on the
resource type and the context in which it&#39;s loaded, as we&#39;ll
see below. For instance, if the resource is an HTML document
or a PNG image, then the browser will try to display it.
If it&#39;s a zip file, the browser might try to save it to your
disk.&lt;/p&gt;
&lt;p&gt;A URL (at least for the Web) has three major parts, shown in the diagram below.
[Attention nitpickers: I&#39;ll get to query and fragment &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#query-and-fragment&quot;&gt;shortly&lt;/a&gt;.]&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/URL-structure.drawio.svg&quot; alt=&quot;URL Structure&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;scheme&quot;&gt;Scheme &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#scheme&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first part of the URL is what&#39;s called the &lt;em&gt;scheme&lt;/em&gt;, which indicates
the protocol that the client (the browser) should use to
access the resource. The Web itself has two important schemes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http&lt;/code&gt;, which means to use the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hypertext_Transfer_Protocol&amp;amp;oldid=1073936192&quot;&gt;Hypertext Transfer Protocol (HTTP)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https&lt;/code&gt;, which means to use HTTP with the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transport_Layer_Security&amp;amp;oldid=1074228735&quot;&gt;Transport Layer Security (TLS)&lt;/a&gt; secure transport protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;schemes-and-protocols&quot;&gt;Schemes and Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#schemes-and-protocols&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In practice, the scheme doesn&#39;t refer to a single protocol but actually
to a family of protocols which have roughly the same externally visible
properties and
can be mutually negotiated. For instance, there are three main versions
of HTTP (HTTP/1.1, HTTP/2, and HTTP/3), all of which are fairly
different on the wire. Similarly, there are several different versions
of TLS. Finally, HTTP/3 doesn&#39;t run over TLS but
actually runs over the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=QUIC&amp;amp;oldid=1072475246&quot;&gt;QUIC&lt;/a&gt;
transport protocol which uses the TLS handshake for security. All
of these different protocols can be addressed with the same set of URLs, with
the browser and the server automatically selecting the right protocol.
This is actually an important requirement for seamlessly deploying new protocols:
for instance if HTTP/2 had required a new scheme it would have taken
much longer for it to be deployed, if ever, because everyone would have
had to change their pages.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are a huge number of &lt;a href=&quot;https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml&quot;&gt;registered schemes&lt;/a&gt;,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
but as a practical matter very few matter for the Web. When the Web
was young, there were a number of different information transfer protocols
and browsers used to support a number of other transports besides HTTP, such as the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=File_Transfer_Protocol&amp;amp;oldid=1071979454&quot;&gt;File Transfer Protocol (FTP)&lt;/a&gt;, the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Network_News_Transfer_Protocol&amp;amp;oldid=1071621299&quot;&gt;Network News Transfer Protocol (NNTP)&lt;/a&gt;, and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Gopher_(protocol)&amp;amp;oldid=1070835729&quot;&gt;Gopher&lt;/a&gt;.
However, as the information systems those protocols were associated with were subsumed
by the Web, HTTP became the dominant protocol and those protocols were
allowed to rot, and now HTTP(S) in its various versions is basically
the only game in town for transferring Web pages.&lt;/p&gt;
&lt;p&gt;There are a few other URL schemes that matter on the Web for specialized
purposes, such as the &lt;code&gt;mailto&lt;/code&gt; scheme for indicating an email
address or the &lt;code&gt;turn&lt;/code&gt; scheme for indicating relays to be used
with the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Traversal_Using_Relays_around_NAT&amp;amp;oldid=1001583358&quot;&gt;TURN&lt;/a&gt;
protocol in WebRTC. These serve an important purpose, but aren&#39;t
really used as part of the main structure of the Web. These schemes
will often have a different structure than Web URLs, for
instance &lt;code&gt;mailto&lt;/code&gt; URLs look like &lt;code&gt;mailto:ekr@example.com&lt;/code&gt;,
but we don&#39;t need to worry about that for now.&lt;/p&gt;
&lt;h3 id=&quot;host&quot;&gt;Host &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#host&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The second piece of an HTTP/HTTPS URL is the &lt;em&gt;host&lt;/em&gt;, which is just
the name of the server hosting content. As discussed in &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/&quot;&gt;excruciating detail&lt;/a&gt;
in my &lt;a href=&quot;https://educatedguesswork.org/tags/dns/&quot;&gt;series on DNS&lt;/a&gt;, this
host name is resolved to an IP address via the DNS and the
browser then connects to that IP address. If the browser is
dereferencing an HTTPS URL, it will also expect that the
server present a certificate which has the hostname in it,
thus—at least in theory—demonstrating that the
browser is talking to the expected server.&lt;/p&gt;
&lt;h3 id=&quot;path&quot;&gt;Path &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#path&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The final piece of the URLs shown above is the &amp;quot;path&amp;quot; component, which
indicates the actual resource on the Web site which you are
accessing. The structure of this component is extremely server
specific. In theory, the server could just name
all of its resources &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, etc. but
in practice, the path tends to somewhat mirror the
server&#39;s directory structure, with the &lt;code&gt;/&lt;/code&gt; separator
indicating directories on the server, etc., and this is what
common servers encourage.&lt;/p&gt;
&lt;p&gt;Even for sites that are more like applications and
that don&#39;t really have directories of files, it&#39;s
conventional for paths to have a hierarchical structure
that mirrors the underlying information hierarchy. For
example GitHub URLs look like:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://github.com/[username]/[repository-name]/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;with the list of issues at&lt;/p&gt;
&lt;p&gt;&lt;code&gt;/[username]/[repository-name]/issues/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;and individual
issues at&lt;/p&gt;
&lt;p&gt;&lt;code&gt;/[username]/[repository-name]/issues/[issue-number]&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;query-and-fragment&quot;&gt;Query and Fragment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#query-and-fragment&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are two other pieces of the URL that I didn&#39;t show above
but that are important to be aware of:&lt;/p&gt;
&lt;p&gt;&amp;quot;Query arguments&amp;quot; are a list of keyword-value pairs,
e.g.,&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://example.com/foo.html?foo=bar&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;These are automatically appended by the Web browser when the
user interacts with specific kinds of elements, such as &amp;quot;web forms&amp;quot;.
These will make an appearance later.&lt;/p&gt;
&lt;p&gt;&amp;quot;Fragments&amp;quot; allow the browser to refer to individual
portions of the page. For instance, the URL:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#query-and-fragment&quot;&gt;&lt;code&gt;https://educatedguesswork.org/posts/web-security-model-intro1/#query-and-fragment&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;goes to the section you are reading now. The key thing to know about the fragment
is that because it&#39;s used for intra-page navigation, it doesn&#39;t
get sent to the server, but is processed solely by the client.
Moreover, if you click on a fragment link on the same page
(you can try it with the link above), the browser will just
scroll to that point, but doesn&#39;t need to connect to the server
to reload the page.&lt;/p&gt;
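&lt;p&gt;To see all five pieces side by side, you can hand a URL to the standard
&lt;code&gt;URL&lt;/code&gt; class available in browsers and Node. This is only a sketch:
the helper name &lt;code&gt;describeUrl&lt;/code&gt; is made up, and the sample URL
combines the query example above with a fragment purely for
illustration.&lt;/p&gt;

```javascript
// Break a URL into the parts described above using the standard URL class.
function describeUrl(urlString) {
    const url = new URL(urlString);
    return {
        scheme: url.protocol.replace(":", ""),
        host: url.hostname,
        path: url.pathname,
        // Query arguments as keyword-value pairs.
        query: Object.fromEntries(url.searchParams),
        // The fragment stays on the client; it is not sent to the server.
        fragment: url.hash.replace("#", ""),
    };
}

const parts = describeUrl("https://example.com/foo.html?foo=bar#query-and-fragment");
console.log(parts);
```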
&lt;h2 id=&quot;the-web-architecture&quot;&gt;The Web Architecture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#the-web-architecture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The diagram below shows the overall structure of a drastically
oversimplified Web application, on both the client and the server.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/overall-web.svg&quot; alt=&quot;Overall Web Diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Even this simplified version is pretty complicated, so I&#39;ll
walk through it slowly.&lt;/p&gt;
&lt;p&gt;As you would expect from the above discussion, the process
starts with the URL, whether the user enters it directly,
clicks a bookmark, or clicks on a link. The browser then goes
to the server and requests that URL. In nearly every case
what&#39;s going to come back is a
&lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=HTML&amp;amp;oldid=1065260726&quot;&gt;HyperText Markup Language (HTML)&lt;/a&gt;&lt;/em&gt;
page.&lt;/p&gt;
&lt;h3 id=&quot;html&quot;&gt;HTML &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#html&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We don&#39;t need to go into HTML in too much detail, but at
a high level, HTML is &lt;em&gt;structured&lt;/em&gt; text. What this means
is that HTML is a text file that contains extra information
(&amp;quot;markup&amp;quot;) that tells the browser how to interpret it. As a simple
example, consider the following HTML fragment:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;h4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;This is a header&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;h4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is some text with a hyperlink. &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;a&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;href&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://educatedguesswork.org/&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;hyperlink&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just to orient yourself, HTML markup mostly consists of paired &amp;quot;start&amp;quot; and
&amp;quot;end&amp;quot; markers (&amp;quot;tags&amp;quot;) that indicate that the stuff in between them
is associated with the tag. If you have a tag &lt;code&gt;xx&lt;/code&gt; then the
start tag will be &lt;code&gt;&amp;lt;xx&amp;gt;&lt;/code&gt; and the end tag will be &lt;code&gt;&amp;lt;/xx&amp;gt;&lt;/code&gt;
and the stuff in between will be called the &amp;quot;xx element&amp;quot;.
Tags can also have attributes that get attached to the start,
like:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;xx&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;attr1&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;abc&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which means &amp;quot;tag &lt;code&gt;xx&lt;/code&gt;
has attribute &lt;code&gt;attr1&lt;/code&gt; with the value &lt;code&gt;abc&lt;/code&gt;&amp;quot;.&lt;/p&gt;
&lt;p&gt;In this example, then, the &lt;code&gt;h4&lt;/code&gt; markers indicate that the text
inside them is a header (at header level 4) rather than body. The&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;a&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;href&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://educatedguesswork.org&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;block indicates that the text inside it is a hyperlink,
which just means that it&#39;s a section of text that contains the
text &amp;quot;hyperlink&amp;quot; and when you click on it it navigates the
browser to the page indicated by &lt;code&gt;https://educatedguesswork.org&lt;/code&gt;. This will get rendered something like this:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;h4&gt;This is a header&lt;/h4&gt;
&lt;p&gt;This is some text with a hyperlink. &lt;a href=&quot;https://educatedguesswork.org/&quot;&gt;hyperlink&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&#39;s important to recognize that this markup is (mostly) semantic.
Instead of telling the browser that the margins should be
size whatever, you&#39;re supposed to just provide the page structure
and the text of the page and leave the browser to figure out
how to render it (though of course you should expect
to have reasonable margins, emphasized headers, etc.).
HTML does have some
basic &lt;a href=&quot;https://stackoverflow.com/questions/21949198/styling-html-text-without-css&quot;&gt;formatting stuff&lt;/a&gt;
like bold and italics, but it&#39;s quite limited and insufficient
for making the document look the way you really want;
with just HTML you&#39;re mostly
at the mercy of the browser&#39;s styling
decisions, with results that tend to be somewhat
less than satisfactory.&lt;/p&gt;
&lt;p&gt;HTML has a whole pile of other types of markup for things
like lists, tables, buttons, etc. We mostly don&#39;t need to
worry about these right now. What &lt;em&gt;is&lt;/em&gt; important, however,
is that HTML can also include tags that pull in other
resources from the site. For instance, you can have an
&lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag which loads an image off the site and renders
it at that place in the document, as in the following fragment,
which pulls in the diagram shown above. The &lt;code&gt;src&lt;/code&gt;
attribute is the place to load the image from.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;img&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;/img/overall-web.svg&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Already this is pretty useful: you can use HTML to publish fairly rich
documents. In fact, this was pretty much all that was in the original
Web. However, it quickly became clear that people wanted to have more
control over sites. In particular, they wanted more control over
how things looked and they wanted to be able to add
arbitrary dynamic content that ran on the client.
In the Web, these needs are addressed by allowing the HTML
document to use two other kinds of resources that serve these
functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=CSS&amp;amp;oldid=1062097219&quot;&gt;Cascading Style Sheets (CSS)&lt;/a&gt;&lt;/em&gt;, which
allows you to tell the browser how to render your content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=JavaScript&amp;amp;oldid=1064457161&quot;&gt;JavaScript (JS)&lt;/a&gt;&lt;/em&gt;, a general purpose programming language which, among other things, allows you to manipulate the HTML and CSS of the page.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s possible to embed the CSS and JS in the page directly,
but what&#39;s more common is actually to have HTML tags
which reference CSS and JS files on the server. So, what
happens in practice is that the HTML loads and then as the
browser parses it, it finds the tags for CSS, JS, as well
as images and the like and loads them all from the server
to assemble the correct page.&lt;/p&gt;
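&lt;p&gt;As a rough sketch of what that looks like, here is a minimal page
that references one stylesheet, one script, and one image. The
&lt;code&gt;/css/site.css&lt;/code&gt; and &lt;code&gt;/js/site.js&lt;/code&gt; paths are made up
for illustration; the image is the one from the earlier fragment.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;Example page&amp;lt;/title&amp;gt;
    &amp;lt;!-- Hypothetical stylesheet, fetched from the server as a separate resource --&amp;gt;
    &amp;lt;link rel=&quot;stylesheet&quot; href=&quot;/css/site.css&quot;&amp;gt;
    &amp;lt;!-- Hypothetical script, also fetched separately and then executed --&amp;gt;
    &amp;lt;script src=&quot;/js/site.js&quot;&amp;gt;&amp;lt;/script&amp;gt;
  &amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;h4&amp;gt;This is a header&amp;lt;/h4&amp;gt;
    &amp;lt;!-- Image loaded from the same site --&amp;gt;
    &amp;lt;img src=&quot;/img/overall-web.svg&quot;&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Loading this one page would result in four requests to the server: one for
the HTML itself and one each for the stylesheet, the script, and the image.&lt;/p&gt;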
&lt;h3 id=&quot;css&quot;&gt;CSS &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#css&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I mentioned above, originally the Web mostly
had semantic markup, so you could say &amp;quot;this is a header&amp;quot;
and some very limited styling (&amp;quot;&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/font&quot;&gt;use this font&lt;/a&gt;&amp;quot;)
but not &amp;quot;render this column with a 20 pixel margin&amp;quot;. CSS allows you to apply styles to the
content of the page. As noted above CSS can be embedded in the HTML
(that&#39;s how the newsletter version of this site works)
but is commonly loaded off of separate resources, with the
HTML just pointing to the CSS. I don&#39;t intend to write too much
about CSS; while there are security and privacy issues around
CSS, most of Web security is concerned with other things.&lt;/p&gt;
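&lt;p&gt;For concreteness, here is a small sketch of CSS embedded directly in the
HTML with a &lt;code&gt;&amp;lt;style&amp;gt;&lt;/code&gt; element. The &lt;code&gt;column&lt;/code&gt; class
name and the specific styles are just assumptions for the example.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;style&amp;gt;
  /* Applies to every h4 element on the page */
  h4 { font-family: sans-serif; }
  /* Applies only to elements with class=&quot;column&quot; (a hypothetical class name) */
  .column { margin: 20px; }
&amp;lt;/style&amp;gt;

&amp;lt;h4&amp;gt;This is a header&amp;lt;/h4&amp;gt;
&amp;lt;div class=&quot;column&quot;&amp;gt;This text is rendered with a 20 pixel margin.&amp;lt;/div&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same rules could instead live in a separate file that the page pulls in with a
&lt;code&gt;&amp;lt;link rel=&amp;quot;stylesheet&amp;quot;&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;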
&lt;h3 id=&quot;javascript&quot;&gt;JavaScript &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#javascript&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;HTML and CSS are pretty powerful all on their own if what you want
is a static Web site that publishes information. They also have
some limited interactive capability: for instance you can have
a web form where people can fill in information, click on radio
buttons, etc., and even send that data to the server which can
then act on it. But at the end of the day they&#39;re limited and lots
of applications require a general purpose programming language.
This is where JavaScript comes into the picture.&lt;/p&gt;
&lt;p&gt;JavaScript itself is just a regular programming
language at roughly the same level of abstraction as other &amp;quot;scripting&amp;quot;
languages like Python or Ruby. You can use JavaScript for anything you would use those
languages for, though you might not want to. What makes JavaScript
special to the Web is two things: (1) browsers know how to execute
it natively, which means if you send them JavaScript they will
run it; if you send them Python, they&#39;ll just display it to the
user or try to save it on disk&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
and (2) the browser has special JavaScript
APIs that let the JavaScript code interact with the user and the
Web page.&lt;/p&gt;
&lt;h3 id=&quot;the-dom&quot;&gt;The DOM &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#the-dom&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;HTML, CSS, and JavaScript work together to produce the experience
you see on the Web via what&#39;s called
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Document_Object_Model&amp;amp;oldid=1064348335&quot;&gt;Document Object Model (DOM)&lt;/a&gt;.
The way this works is that the browser parses the HTML provided by
the server into an abstract data structure that reflects the
structure of the underlying HTML.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
The DOM is then used to generate what you see on the screen.
Both CSS and JavaScript work by addressing the DOM. For instance, CSS
works by providing style information for certain elements of the DOM
(e.g., this paragraph) or certain types of elements (&amp;quot;all headers&amp;quot;)
(simplifying, remember!).&lt;/p&gt;
&lt;p&gt;JavaScript is much more powerful. First, it can manipulate the DOM
itself, by adding, removing, or changing elements. When changes
are made to the DOM, the browser will rerender the page, which means
that JavaScript can change what appears on the screen. This can
also have other side effects: for instance if JavaScript adds
a new &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag, that will cause the image to be loaded off
the server and displayed as part of the page. One unobvious
consequence of this ability is that
because JavaScript is loaded into the page with HTML &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt;
tags, this means that one piece of JavaScript can load new pieces
of JavaScript by inserting new &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tags; it can do the
same for CSS as well of course. These turn out to be powerful
but also dangerous capabilities.&lt;/p&gt;
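&lt;p&gt;A minimal sketch of both behaviors, using standard DOM calls, might look
like the following. The &lt;code&gt;container&lt;/code&gt; id and the &lt;code&gt;/js/more.js&lt;/code&gt;
path are hypothetical.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;div id=&quot;container&quot;&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;script&amp;gt;
  // Add a new img element to the DOM; the browser fetches the image
  // and rerenders the page with it.
  const img = document.createElement(&quot;img&quot;);
  img.src = &quot;/img/overall-web.svg&quot;;
  document.getElementById(&quot;container&quot;).appendChild(img);

  // Add a new script element; the browser fetches that script and runs it.
  // &quot;/js/more.js&quot; is a made-up path for illustration.
  const script = document.createElement(&quot;script&quot;);
  script.src = &quot;/js/more.js&quot;;
  document.head.appendChild(script);
&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/pre&gt;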
&lt;p&gt;In addition to the APIs for manipulating the DOM, the browser has lots of other
APIs that let JavaScript interact with the network or the user. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Perform network requests to the server using &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API&quot;&gt;&lt;code&gt;fetch()&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read from the camera and microphone using &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia&quot;&gt;&lt;code&gt;getUserMedia()&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Form peer-to-peer connections with other browsers using &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection&quot;&gt;&lt;code&gt;RTCPeerConnection&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One of the major ways in which the Web gets extended is by adding new
APIs; obviously JavaScript can do any computation that any
other language can do, but if you want to affect the outside
world, then you generally need some API to do it.&lt;/p&gt;
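&lt;p&gt;As an illustration of the first of these APIs, here is a sketch that uses
&lt;code&gt;fetch()&lt;/code&gt; to retrieve a value from the server and write it into the
page. The &lt;code&gt;/api/views&lt;/code&gt; endpoint is an assumption, and I&#39;m assuming it
returns plain text.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;p&amp;gt;Views: &amp;lt;span id=&quot;views&quot;&amp;gt;?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;script&amp;gt;
  // Ask the server for the current count (hypothetical endpoint).
  fetch(&quot;/api/views&quot;)
    .then((response) =&amp;gt; response.text())
    .then((text) =&amp;gt; {
      // Update the DOM with whatever the server sent back.
      document.getElementById(&quot;views&quot;).textContent = text;
    })
    .catch((err) =&amp;gt; console.error(&quot;request failed&quot;, err));
&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/pre&gt;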
&lt;h2 id=&quot;the-server&quot;&gt;The Server &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#the-server&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the Web server.
The most basic Web server just serves static files to the client: the
client sends a URL and the server sends back the corresponding
file. In the early days of the Web, the structure of the URLs as shown
in the path component would mirror the structure of the server&#39;s
filesystem.  For instance, you might have a server which stored files
in &lt;code&gt;/home/server/&lt;/code&gt;, in which case the URL
&lt;code&gt;https://example.com/abc/def.html&lt;/code&gt; would correspond to
&lt;code&gt;/home/server/abc/def.html&lt;/code&gt;. And those files themselves
would be Web pages or the other assets on them (like images).
But of course, over time, the world has gotten more complicated.
Serving files this way is still possible, but
it&#39;s also possible for things to be a lot fancier.
In particular, instead of just serving static files the server
can perform computations and return the results to the client.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;h4 id=&quot;the-structure-of-web-servers&quot;&gt;The Structure of Web Servers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#the-structure-of-web-servers&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As I said, the original Web servers just served whatever
was on the file system to the client. But people quickly
realized that they wanted to be able to have the server
provide dynamic content. The original way to do this was
with something called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Common_Gateway_Interface&amp;amp;oldid=1072218950&quot;&gt;Common Gateway Interface (CGI)&lt;/a&gt;. The way CGI worked
was that you would have a special directory, by convention
called &lt;code&gt;/cgi-bin&lt;/code&gt; and instead of &lt;em&gt;serving&lt;/em&gt; the
files in that directory, the web server would run
them and send the output to the client. This wasn&#39;t that
efficient, but it got the job done. You&#39;ll still see it
in some places on the Web.&lt;/p&gt;
&lt;p&gt;More recently, it&#39;s become common to invert this structure
and have Web servers which handle essentially every request
programmatically. For instance, the popular
&lt;a href=&quot;https://expressjs.com/&quot;&gt;Express&lt;/a&gt; framework for &lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node.js&lt;/a&gt;
lets you register individual
functions to handle portions of the URL namespace.
These functions can just generate content directly or can use files as a template to generate the content based
on the file and some information the server has.
These servers can of course handle static files, but this is done by having
a special code module which then reads those static files
off the disk and then serves them.&lt;/p&gt;
&lt;p&gt;A common pattern is to serve the dynamic files off one server and static
files off another server, with each being specialized for its job.
This is an especially attractive pattern if the static files are
big and can be served off a fast &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Content_delivery_network&amp;amp;oldid=1074711717&quot;&gt;content delivery network (CDN)&lt;/a&gt; which is optimized for
that purpose. Of course, CDNs have now started to grow some
capabilities to handle dynamic content in what&#39;s called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Edge_computing&amp;amp;oldid=1073476548&quot;&gt;edge computing&lt;/a&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Obviously, the server can do any kind of computation it wants
to return answers, but there are a few major common types.&lt;/p&gt;
&lt;h3 id=&quot;templates&quot;&gt;Templates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#templates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Suppose you want to send a more-or-less static page but you
want to customize it slightly. For instance, you might want
to put the user&#39;s username in the upper right hand corner
or add the number of times someone has viewed this page.
You could of course generate the whole page from scratch
on your server, but an easier way to do it is with a template.
Briefly, a template is a file containing HTML but with markers
that allow you to fill in variables. For instance, you might have:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;h1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;Page title&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;h1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This page has been viewed [[num-views]] times.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;[[num-views]]&lt;/code&gt; means &amp;quot;replace this string with the
value of the &lt;code&gt;num-views&lt;/code&gt; variable&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
The idea here is that the server has a template processor
which is configured with a set of variables, in this case
the number of views. The processor reads the template, finds the template variable
markers, and replaces them with the corresponding values.
There are a lot of different template languages, some more
fancy than others, including &lt;a href=&quot;https://github.com/handlebars-lang/handlebars.js&quot;&gt;handlebars&lt;/a&gt;,
&lt;a href=&quot;https://mozilla.github.io/nunjucks/&quot;&gt;nunjucks&lt;/a&gt;,
&lt;a href=&quot;https://github.com/janl/mustache.js/&quot;&gt;mustache&lt;/a&gt;, etc.&lt;/p&gt;
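&lt;p&gt;To make this concrete, suppose the processor is handed the template above
with &lt;code&gt;num-views&lt;/code&gt; set to 42 (a made-up value). The HTML that actually
gets sent to the client would then be:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;h1&amp;gt;Page title&amp;lt;/h1&amp;gt;

This page has been viewed 42 times.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The client never sees the marker, only the substituted value.&lt;/p&gt;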
&lt;h3 id=&quot;full-result-generation&quot;&gt;Full Result Generation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#full-result-generation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Suppose that instead most of your page is dynamic, like a news
site or a search engine result page. In that case, a template
doesn&#39;t really help you that much. Instead, you probably just
want to have your server assemble the whole page, piece
by piece (though probably from fragments of HTML stored
in the server software). This is basically the dual of
templates: templates are HTML (or markdown) with embedded
code. Page generation is code with embedded HTML.&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that the precise method that the
server uses to generate the page is largely invisible to the
client: it could be a static file, a template, fully
programmatic, or a mix of the above, with some pieces generated
one way and some another. The Web just defines the protocol
(i.e., the format of the page) and leaves the implementation
to generate that protocol however it wants. This is a very
important feature for allowing extensibility in the future.&lt;/p&gt;
&lt;h3 id=&quot;non-html-data-types&quot;&gt;Non-HTML Data Types &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#non-html-data-types&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Most of the text in this section sort of assumes that the server will
be returning HTML, but of course HTTP is an extensible protocol
and so you can transmit just about any content over HTTP.
And because the server can do arbitrary computations, this
means that it can return the results of those computations to the
client. We&#39;ll see how that&#39;s useful in the next post.&lt;/p&gt;
&lt;h2 id=&quot;cross-site-content&quot;&gt;Cross-Site Content &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#cross-site-content&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you were paying close attention before, you noticed that when you
load an image on a Web site, you provide a URL where the browser can
find the image. The same thing is true for other kinds of content,
whether it&#39;s audio, video, CSS, or JavaScript. That makes sense, after
all, because all that stuff was authored separately and you don&#39;t want
to have all that stuff crammed into one giant file on your server, right?
But who says that stuff has to be on &lt;em&gt;your&lt;/em&gt; server? The content is
being addressed by a URL and that URL can point &lt;strong&gt;anywhere&lt;/strong&gt;,
including some totally different Web server.&lt;/p&gt;
&lt;p&gt;Take for instance, this image of the Dogefox logo:&lt;/p&gt;
&lt;img src=&quot;https://i.redd.it/ldcju3p3w3x11.jpg&quot; alt=&quot;DogeFox&quot; width=&quot;400&quot; /&gt;
&lt;p&gt;Here&#39;s the HTML which loaded that:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;img&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://i.redd.it/ldcju3p3w3x11.jpg&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;alt&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;DogeFox&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;400&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, the &lt;code&gt;src&lt;/code&gt; attribute, indicating where the image
comes from, doesn&#39;t point to this site at all. It&#39;s pointing to a resource
on &lt;a href=&quot;https://www.reddit.com/&quot;&gt;Reddit&lt;/a&gt;—but I was able to just
load it into my site and unless you use the browser &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Tools&quot;&gt;developer tools&lt;/a&gt;
to look deeply, you wouldn&#39;t even notice. Importantly, the way
that this works is that the browser connects directly to the site
indicated in the URL; it doesn&#39;t go through the original server
at all (thought experiment: what happens if the server decides
to change the image?).&lt;/p&gt;
&lt;p&gt;You can do this kind of cross-site loading with pretty much anything,
including video, JavaScript and CSS.
This, for instance, is how you embed
YouTube videos in your site (you don&#39;t want to absorb the bandwidth
costs, right?).
The JavaScript thing is actually incredibly
common because people often want to make use of JavaScript libraries
but save bandwidth by not serving them off their own server (because, as
above, the library gets served directly by the other site). Of course, now your Web site
is incorporating an arbitrary program from someone else&#39;s server, so what could
possibly go wrong?&lt;/p&gt;
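&lt;p&gt;In HTML terms, pulling in a library this way is a one-liner. The host
and file name below are placeholders, not a real library location.&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;!-- Hypothetical third-party host: the browser fetches this file directly
     from cdn.example and runs it as part of your page --&amp;gt;
&amp;lt;script src=&quot;https://cdn.example/library.js&quot;&amp;gt;&amp;lt;/script&amp;gt;&lt;/code&gt;&lt;/pre&gt;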
&lt;p&gt;This trick isn&#39;t limited to individual files either: you can actually load
a whole Web page this way, like so:&lt;/p&gt;
&lt;pre class=&quot;language-html&quot;&gt;&lt;code class=&quot;language-html&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;iframe&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;https://educatedguesswork.org/posts/&lt;span class=&quot;token punctuation&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;800&lt;/span&gt; &lt;span class=&quot;token attr-name&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation attr-equals&quot;&gt;=&lt;/span&gt;400&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;iframe&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This fragment pulls the archive page of this site into a frame on the
page, with scroll bars and everything:&lt;/p&gt;
&lt;iframe alt=&quot;[Framed version of the EG archive page should go here.]&quot; src=&quot;https://educatedguesswork.org/posts/&quot; width=&quot;800&quot; height=&quot;400&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;This kind of mashup of cross-site content is one of the basic functions
of the Web and the source of all kinds of powerful functions, good
and bad, ranging from reusing open source content, to embedded maps and YouTube videos,
to Facebook like buttons and online ads (with their associated tracking).
It&#39;s an incredibly powerful feature and also one whose full implications
weren&#39;t really understood at the time it was introduced, leading to some
exciting moments down the road.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-web-applications&quot;&gt;Next Up: Web Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#next-up%3A-web-applications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point, we have the makings of a very fancy Internet-scale
publishing system, complete with cool styling, mashups, and even
a local programming language for producing cool effects.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
But as I said at the top, the Web isn&#39;t just a publishing system,
and some of the most important parts of the Web (Facebook, Gmail,
Google Meet, Slack) act much more like applications than they do like
online publishing. But even though they have a lot more going on than, say, this site, they use basically the same
primitives I&#39;ve introduced here, just in a number of new and interesting
ways (and with a number of exciting new security problems!).  In
the next (hopefully shorter) part of this series, I&#39;ll talk about how
those work.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, I&#39;m quoting &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Amy_and_Amiability&amp;amp;oldid=1065207530&quot;&gt;Blackadder&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The Web actually isn&#39;t the first or only such platform;
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=PostScript&amp;amp;id=1061830440&amp;amp;wpFormIdentifier=titleform&quot;&gt;PostScript&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=PDF&amp;amp;oldid=1066835350&quot;&gt;PDF&lt;/a&gt; documents are actually programs
that run on your printer or your computer. This provides a much more flexible
system than alternative designs like sending a static image to the printer. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The astute reader will note that the registry here talks about &lt;em&gt;URI&lt;/em&gt; rather than &lt;em&gt;URL&lt;/em&gt;
schemes, where the &lt;em&gt;I&lt;/em&gt; stands for &lt;em&gt;Identifier&lt;/em&gt;. URI is the generic term
with URLs being the subset of URIs which have enough information to dereference them
as opposed to just uniquely identifying something. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It is, of course, possible to run other languages on the
Web by first compiling them into JavaScript and then running
the JavaScript. For instance, &lt;a href=&quot;https://emscripten.org/&quot;&gt;Emscripten&lt;/a&gt;
is a tool that does this for C/C++ code. This works but is
a bit clunky. Eventually, there was
so much demand for this kind of thing that people designed a special
&amp;quot;low-level&amp;quot; language called &lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly&lt;/a&gt;
that browsers would run alongside JavaScript and that was
more appropriate as a compilation target for other languages. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technically, this is a set of nodes arranged in a tree structure.
So, for instance, you might have the root of the tree and then
paragraphs as children and within each paragraph, hyperlinks, etc. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the context of graphics, this cycle of specialized
optimizations followed by the optimized system becoming
more generalized and then the generalized system undergoing
further specialized optimizations
is sometimes called the &lt;a href=&quot;http://www.cap-lore.com/Hardware/Wheel.html&quot;&gt;wheel of reincarnation&lt;/a&gt; (this name is due to Ivan Sutherland). &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
More commonly the markers are curly braces, but if
I use curly braces here, the template processor which
renders this site will try to process it, so I&#39;m using
square brackets. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Basically,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Project_Xanadu&quot;&gt;Xanadu&lt;/a&gt;
but built out of duct tape and cardboard. &lt;a href=&quot;https://educatedguesswork.org/posts/web-security-model-intro1/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Games, constraints, and the humanly possible</title>
		<link href="https://educatedguesswork.org/posts/games-and-the-possible/"/>
		<updated>2022-02-26T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/games-and-the-possible/</id>
		<content type="html">&lt;p&gt;On Friday&#39;s &lt;a href=&quot;https://www.nytimes.com/2022/02/25/opinion/ezra-klein-podcast-c-thi-nguyen.html&quot;&gt;Ezra Klein show&lt;/a&gt;,
Ezra interviews philosopher C. Thi Nguyen on the topic of games. Nguyen provides
an interesting definition of a game (btw, thanks to the Times for providing
&lt;a href=&quot;https://www.nytimes.com/2022/02/25/podcasts/transcript-ezra-klein-interviews-c-thi-nguyen.html&quot;&gt;transcripts&lt;/a&gt;
so I didn&#39;t have to type all this in):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What’s interesting about games for him [Bernard Suits —EKR] is that you have this thing—
the finish line—but it doesn’t count unless you did it under
specified constraints. It doesn’t count unless you follow a
particular path, unless you did it for a marathon on your own feet
instead of a bicycle or a taxi. And the fact that the activity would
lose its value if you didn’t do it in the specified, inefficient,
constrained way, that, for Suits, points the way to what games really
are.&lt;/p&gt;
&lt;p&gt;And the way I think of them sometimes, after Suits, is that games are
constraint-constituted activities. Does that make sense? That what it
is to run a race is to do it inside a certain set of
constraints. Like what it is to climb a rock in rock climbing is to
do it with your hands and feet and not a jetpack, or a chain, or a
helicopter. So whatever is valuable about games has to be in the fact
that they’re constructed struggles.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&#39;s a lot here that&#39;s true. To take the example of the marathon,
it is rarely the case that running is the most efficient way
to get from point A to point B. In fact, it&#39;s not even the most efficient way allowed &lt;em&gt;in marathons&lt;/em&gt;.
Many major races have a wheelchair division and the wheelchair
athletes are much faster than the runners. For instance,
in the 2021 Chicago Marathon, the men&#39;s winner came through in
2:06:12 and the men&#39;s wheelchair winner came through in 1:29:07.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Moreover, plenty of marathons actually start and end in the same place
(and don&#39;t even get me started about 100 mile ultras run on a quarter
mile track).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s interesting that Nguyen uses the example of rock climbing,
as mountaineering and rock climbing are both sports that started
out much less arbitrary than they are now and gradually became
more arbitrary and rule bound. Mountain climbing is perhaps the
purest example here: the tallest mountains are essentially
inaccessible by any means other than actually climbing them
on foot. It&#39;s just barely possible to fly a helicopter to the
top of Everest, but as far as I know it&#39;s been done exactly &lt;a href=&quot;https://www.wearethemighty.com/mighty-culture/helicopter-to-summit-everest/&quot;&gt;once&lt;/a&gt;, so as a practical matter if you want to get to the top you
have to walk up.&lt;/p&gt;
&lt;p&gt;That doesn&#39;t mean that there aren&#39;t arbitrary rules, but the
interesting thing is how they have grown over time. Initially,
it was just a challenge to climb Everest at all and it took about
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mount_Everest&amp;amp;oldid=1072408813&quot;&gt;70 years of more-or-less serious attempts&lt;/a&gt;
before Tenzing Norgay and Edmund Hillary&#39;s first ascent in 1953.
At the time, this was an incredible achievement and people
took any advantage they could get including supplemental
oxygen, teams of porters, etc. After a while, though,
techniques developed and the mountain was better understood
and so people started to find ways to make it harder,
for instance by climbing without supplemental oxygen (Reinhold
Messner and Peter Habeler in &lt;a href=&quot;https://www.planetmountain.com/en/news/alpinism/reinhold-messner-and-peter-habeler-40-years-ago-everest-without-supplementary-oxygen.html&quot;&gt;1978&lt;/a&gt;), solo, without oxygen (Messner again in &lt;a href=&quot;https://www.adventure-journal.com/2020/08/40-years-ago-reinhold-messner-summited-everest-solo-without-bottled-oxygen/&quot;&gt;1980&lt;/a&gt;),
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Alpine_style&amp;amp;oldid=1024613451&quot;&gt;alpine style&lt;/a&gt;, etc.
Another complication is that there are different routes
up mountains, some harder than others, so it might be
a challenge to do a new route even if you&#39;ve gotten to the
top before.
At this point, just getting to the summit by any means
necessary is difficult but doable by ordinary people
even without large amounts of mountaineering experience
(see Krakauer&#39;s &lt;a href=&quot;https://www.amazon.com/Into-Thin-Air-Personal-Disaster/dp/0385494785&quot;&gt;Into Thin Air&lt;/a&gt;
for more on this).&lt;/p&gt;
&lt;p&gt;The story is similar with rock climbing: the first ascents
of a number of the big wall climbs like &lt;a href=&quot;https://www.adirondackexplorer.org/outtakes/royal-robbins-first-ascent-half-dome&quot;&gt;Half Dome&lt;/a&gt;
or &lt;a href=&quot;https://gripped.com/profiles/history-of-free-climbing-the-nose-5-14-on-el-capitan/&quot;&gt;El Capitan&lt;/a&gt;
were done &amp;quot;aided&amp;quot;, which means that you use your protection
(back in those days, this meant bolts and pitons) for
support. Here too, initially it was a challenge just to get
to the top,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
but after a while it became clear that if you
were willing to spend enough time and drill enough
bolts you could get up just about anything and so
people started thinking about free climbing (using
ropes for safety but not support) (El Capitan&#39;s
Salathe Wall by Skinner and Piana in 1988 and
The Nose by Lynn Hill in 1993), or free soloing
(no rope) (Alex Honnold up Freerider in &lt;a href=&quot;https://films.nationalgeographic.com/free-solo&quot;&gt;2017&lt;/a&gt;).
Here too, this is a story of technology (primarily sticky
rubber shoes and better mechanisms for attaching your protection
to rocks) and better technique.&lt;/p&gt;
&lt;p&gt;Under the definition being offered by
Nguyen—and as I understand it, Suits—when the first
people went up Everest it wasn&#39;t a game, but as soon as it became
relatively achievable by ordinary people and the challenge became
to handicap yourself by doing it without oxygen, then it became a game. This might
be right, but on the other hand it seems to me
that Tenzing Norgay and Edmund Hillary&#39;s first ascent in 1953 and Messner&#39;s
1980 solo ascent without oxygen are a lot more similar than they
are different in a way that Nguyen&#39;s definition tends to erase.
You could of course respond that the original first ascent was
a game—after all, isn&#39;t Everest arbitrary?—but then I
think you&#39;ve just redefined almost any challenge to be a game.&lt;/p&gt;
&lt;p&gt;I think that the common thread between all these challenges
is something Nguyen hints
at later, which is that games can be designed to be just difficult enough that
you can do them, but only barely:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But in games, because the game designer manipulates what you want to
do and the abilities and the obstacles, the game designer can create
harmonious action. They can create these possibilities where you’re—
what you need to do— the obstacles you face and your abilities just
match perfectly.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;And in games, for once in your life, you know exactly what you’re
doing and you know exactly that you can do it. And then you have
just the right amount of ability to do it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This feels a lot closer to me as a description of the essence of the
kind of challenge that mountain climbing or running a two hour marathon
presents, namely that they are at the very limit of human capability.
When people first tried to climb Everest or El Capitan (or the moon!),
nobody knew if it was possible, so the challenge was just to
do it at all. But once it was achieved,
the limit of capability shifted and people wanted something harder,
which could either mean trying something
harder like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=K2&amp;amp;oldid=1072015801&quot;&gt;K2&lt;/a&gt;
(or Mars!) or adding new constraints to make it harder, like climbing
without oxygen.&lt;/p&gt;
&lt;p&gt;What I&#39;m saying is that the core experience here is doing
something that is just barely possible for you. Of course at some
level, &amp;quot;something&amp;quot; is arbitrary and once you&#39;ve run a marathon
&amp;quot;just barely possible&amp;quot; can be &amp;quot;do it slightly faster&amp;quot; but humans like things that feel like
natural anchor points even if they are ultimately arbitrary, hence the
appeal of the 40 minute 10K or climbing 5.12 for the amateur or the
four minute mile or 2 hour marathon for the professional.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
I think
this is also behind the appeal of climbing without oxygen, in that
it feels like a clear dividing line.
From this angle, the nice thing about games is that the games designer gets
to set the conditions so that they are at the right level,
but those arbitrary tuning parameters are buried inside the rules of the game
so that finishing the game becomes a concrete anchor that people can focus on.&lt;/p&gt;
&lt;p&gt;Of course, this is all easier said than done, especially if you want
everyone to do the same task. Human capabilities vary widely and
a challenge that is just barely at the limit of someone&#39;s capabilities
(say running 100 miles) is easy for others.
This is something that Gary Cantrell, the creator of the &lt;a href=&quot;https://www.justwatch.com/us/movie/the-barkley-marathons-the-race-that-eats-its-young&quot;&gt;Barkley Marathons&lt;/a&gt;, talks about, namely that it&#39;s easy
to make a race that&#39;s so hard that nobody can do it, but what&#39;s
hard is making a race that &lt;em&gt;almost&lt;/em&gt; nobody can do. But of course
that&#39;s exactly what makes people want to attempt it.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Conversely, race walking is a sport where you have to
propel yourself on your own two legs, but you&#39;re not
allowed to run. This is &lt;a href=&quot;https://www.sbnation.com/2016/8/20/12566066/50km-race-walk-olympics-event-pain&quot;&gt;arguably harder&lt;/a&gt; than running because
you&#39;re walking above the speed where the most efficient
thing to do would be to run (around 5mph). &lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, I&#39;m happy to do long distance multi-day
backpacking trips but I don&#39;t like day hiking. If
I&#39;m going to end up the same place I started, I&#39;d
just as soon run. &lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I should mention at this point that unlike Everest,
you can hike to the top of Half Dome and El Capitan,
though the &lt;a href=&quot;https://www.nps.gov/yose/planyourvisit/halfdome.htm&quot;&gt;Half Dome hike&lt;/a&gt;
depends on a set of cables
put up by the park service. &lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note how each of these is tied to some set of basically arbitrary
units of time, distance, or difficulty. Of course, there
are challenges that aren&#39;t tied to some arbitrary number,
like bench pressing your own weight.
 &lt;a href=&quot;https://educatedguesswork.org/posts/games-and-the-possible/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Risks (or non-risks) of scanning QR codes</title>
		<link href="https://educatedguesswork.org/posts/qr-code-security/"/>
		<updated>2022-02-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/qr-code-security/</id>
		<content type="html">&lt;p&gt;I did not watch the Super Bowl but it seems Coinbase bought a Super Bowl
ad that consisted of a &lt;a href=&quot;https://youtu.be/09A_BzRcME8&quot;&gt;QR code floating around your screen&lt;/a&gt;.
Honestly, I find it kind of soothing—not that I own any cryptocurrency—but the Internet got upset:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scanning an unidentified QR code that bounces across your screen during the Super Bowl is like going around at the end of a party finishing all the half empty drinks. You can do it, but you&#39;ll regret it. And you&#39;ll get a lip fungus. But for your computer. It&#39;s a whole thing.&lt;/p&gt;
&lt;p&gt;— Evan Greer (@evan_greer) &lt;a href=&quot;https://twitter.com/evan_greer/status/1493014790976978945&quot;&gt;2022-02-13&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I am once again reminding you that scanning random QR codes is upsettingly close to plugging a random flash drive you found into your laptop.&lt;/p&gt;
&lt;p&gt;Do not do the thing.&lt;/p&gt;
&lt;p&gt;— Techni-Calli (@iwillleavenow) &lt;a href=&quot;https://twitter.com/iwillleavenow/status/1493101604374925312?s=21&quot;&gt;2022-02-13&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;5 years from now, news will come out that Coinbase’s QR code was the source of the biggest data breach in US history.&lt;/p&gt;
&lt;p&gt;— Aaron Parnas (@AaronParnas) &lt;a href=&quot;https://twitter.com/AaronParnas/status/1493018442118610945?ref_src=twsrc%5Etfw&quot;&gt;2022-02-13&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;See also this longer &lt;a href=&quot;https://www.computer.org/publications/tech-news/trends/qr-code-risks&quot;&gt;writeup&lt;/a&gt;
on the topic by Iam Waqas that predates the Super Bowl, this
&lt;a href=&quot;https://www.secureworld.io/industry-news/qr-code-controversy-super-bowl&quot;&gt;SecureWorld&lt;/a&gt;
post, etc.&lt;/p&gt;
&lt;p&gt;I wasn&#39;t planning on clicking on that QR code, but I&#39;m also rather less
worried about it than others. This post explains why, but first we need to have a clear sense
of what&#39;s going on.
As I &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus&quot;&gt;explained earlier&lt;/a&gt;, a &lt;a href=&quot;https://en.wikipedia.org/wiki/QR_code&quot;&gt;QR code&lt;/a&gt;
is just a way of encoding digital information. The QR reader on your
device then decodes the QR code into a string of bytes and tries
to figure out what to do with those bytes.
Interestingly, there &lt;a href=&quot;https://github.com/zxing/zxing/wiki/Barcode-Contents&quot;&gt;doesn&#39;t seem&lt;/a&gt; to be any
really standardized meta-information telling
you what the type of the data is, so typically your device
tries to infer it from the first bytes. For instance, if the
data starts with &lt;code&gt;http://&lt;/code&gt; or &lt;code&gt;https://&lt;/code&gt;
then it&#39;s presumably a Web address (the technical term here is a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=URL&amp;amp;oldid=1015459310&quot;&gt;URL&lt;/a&gt;).
But it really could be any data and hopefully your device
infers what it is correctly.&lt;/p&gt;
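&lt;p&gt;As a rough illustration of that inference step, here is a minimal Python sketch. The prefix table and the function name &lt;code&gt;classify_payload&lt;/code&gt; are assumptions made up for this example, not a description of how any particular reader is implemented; real readers recognize more payload types than the handful shown.&lt;/p&gt;

```python
# Hypothetical prefix table for illustration only; real QR readers
# recognize more payload types than these.
PREFIX_TYPES = [
    ("http://", "url"),
    ("https://", "url"),
    ("mailto:", "email"),
    ("tel:", "phone"),
    ("wifi:", "wifi-config"),
]


def classify_payload(payload):
    """Guess what a decoded QR payload is by looking at its first bytes."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so there is no textual prefix to inspect.
        return "binary"
    lowered = text.lower()
    for prefix, kind in PREFIX_TYPES:
        if lowered.startswith(prefix):
            return kind
    # No recognized prefix: treat it as plain text rather than guessing.
    return "text"


print(classify_payload(b"https://example.com/"))
```

&lt;p&gt;In this sketch, anything without a recognized prefix simply falls through to plain text, which is the &amp;quot;hopefully your device infers what it is correctly&amp;quot; part.&lt;/p&gt;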
&lt;p&gt;This situation presents a number of potential security risks
(see &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus&quot;&gt;here&lt;/a&gt; for discussion of the privacy risks).&lt;/p&gt;
&lt;h2 id=&quot;remote-compromise&quot;&gt;Remote Compromise &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#remote-compromise&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Probably the attack that most people have in mind when they think of
the potential dangers of QR codes is that scanning one will result in your
computer being compromised. From Iam Waqas in &lt;a href=&quot;https://www.computer.org/publications/tech-news/trends/qr-code-risks&quot;&gt;IEEE Computer&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Cybercriminals might embed malicious URLs in publicly present QR
codes so that anyone who scans them gets infected by malware. At
times merely visiting the website might trigger the downloading of
malware silently in the background. Apart from that, they might also
send phishing emails containing QR codes that again infect the
user’s device with malware when scanned.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is of course possible, but I don&#39;t think a QR code presents a
particularly high risk compared to the usual risks you take.
At a high level, the QR code could result in your computer being
compromised in three basic ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The QR code could take you to a Web site that attacks your
browser.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The QR code could take you to a Web site that prompts you
to install some malicious software.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The QR code reader on your computer/device could itself have
a vulnerability that enables compromise.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s put (2) aside here for a minute, because while it&#39;s a real attack, it
really belongs with &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#phishing&quot;&gt;phishing&lt;/a&gt;, which I cover below; this
leaves us with attacking the QR code reader and malicious Web sites.&lt;/p&gt;
&lt;h3 id=&quot;the-qr-code-reader&quot;&gt;The QR Code Reader &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#the-qr-code-reader&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It&#39;s certainly not out of the question that the QR code reader—whether the one built into the device or the one in your
browser—could have some kind of vulnerability, as bugs in image
processing code are reasonably common. For example, NSO&#39;s iMessage
exploit took advantage of a vulnerability in the iOS PDF reader (see
this excellent
&lt;a href=&quot;https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html&quot;&gt;writeup&lt;/a&gt;
by Ian Beer and Samuel Groß of Google Project Zero).
With that said, a vulnerability like this in the QR code reader
would be pretty serious, given that people scan untrusted QR codes all
the time and aren&#39;t going to stop.&lt;/p&gt;
&lt;p&gt;This isn&#39;t to say they don&#39;t exist: this
&lt;a href=&quot;https://topic.alibabacloud.com/a/qr-code-vulneratbility-attacks-on-android-platforms_3_75_32779897.html&quot;&gt;article&lt;/a&gt;
describes what seem like some legitimate memory vulnerabilities in the
Android QR code reader back in 2015.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
As far as I can tell, the last serious
vulnerability in a QR code reader on a major device
operating system was actually in the
&lt;a href=&quot;https://www.intego.com/mac-security-blog/ios-11s-camera-app-has-a-qr-code-vulnerability/&quot;&gt;URL parser in iOS 11&lt;/a&gt;.
This isn&#39;t good but shouldn&#39;t lead to device compromise.&lt;/p&gt;
&lt;p&gt;Note that these comments mostly apply to the QR code reader
that is built into your device or your browser. I generally
would not assume that a random QR code reader app is safe
to use to read arbitrary QR codes. And of course in at least
one case a QR code scanner contained malware
&lt;a href=&quot;https://blog.malwarebytes.com/android/2021/02/barcode-scanner-app-on-google-play-infects-10-million-users-with-one-update/&quot;&gt;itself&lt;/a&gt;.
However, it &lt;em&gt;would&lt;/em&gt; be a big deal if
the QR code reader built into your phone OS were insecure.&lt;/p&gt;
&lt;h3 id=&quot;compromise-of-the-browser&quot;&gt;Compromise of the Browser &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#compromise-of-the-browser&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This brings me to the second major avenue for remote compromise: the
browser. In this case what&#39;s happening is that the QR
code contains the address of some Web site and reading the QR code
navigates your browser to that site, and presumably that
site would then attack your computer. This situation isn&#39;t conceptually any
different from you just typing in the site address yourself:
the end result is you end up at a specific Web site that
was indicated by the QR code.&lt;/p&gt;
&lt;p&gt;One point that is often made in this situation is that it&#39;s hard
to know what Web site you will end up at because the QR code is
unreadable by humans. This is true, but, I think, largely misplaced, for
three reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It&#39;s common for QR code readers to show you the URL they
are going to, so it&#39;s not opaque. Indeed, the iOS exploit
I mentioned above was designed to circumvent that feature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You don&#39;t need a QR code to send someone an opaque URL:
You can just use a URL shortener like
&lt;a href=&quot;https://bitly.com/&quot;&gt;bit.ly&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Going to arbitrary URLs shouldn&#39;t be a problem anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first two of these reasons should be straightforward, but the
last needs some unpacking. The point here is that it&#39;s the browser&#39;s
job to protect you even from malicious sites (indeed, &lt;em&gt;especially&lt;/em&gt;
from malicious sites). In fact, in a
&lt;a href=&quot;https://ptolemy.berkeley.edu/projects/truststc/pubs/840/websocket.pdf&quot;&gt;paper&lt;/a&gt;
with Lin-Shung Huang, Eric Chen, Adam Barth, and Collin Jackson, we
described it as the &amp;quot;core security guarantee&amp;quot; of the Web:
&lt;strong&gt;users can safely visit arbitrary web sites and execute scripts provided by
those sites&lt;/strong&gt;. The browser does this by isolating the content provided
by the site so that it (hopefully) can&#39;t endanger your computer.
Of course, browsers do have vulnerabilities that can result
in remote compromise, but these are very serious defects
that are worth &lt;a href=&quot;https://www.zerodayinitiative.com/blog/2022/1/12/pwn2own-vancouver-2022-luanch#browser&quot;&gt;real money&lt;/a&gt;:
a remote compromise of a live web browser is worth $100K or more.
If you have such a vulnerability, there are probably better
things to do with it than hack random Super Bowl watchers,
especially given that that&#39;s hardly an anonymous or stealthy
way to deliver your payload.&lt;/p&gt;
&lt;p&gt;Even if we assume that you have a zero-day like this and you&#39;re
willing to waste it on an attack on basically random people, there
are easier ways to accomplish that. For instance, you could
serve up your attack via a Web advertising campaign; this would
even let you target your victims to some extent, especially if
you&#39;re willing to pay. Indeed, it&#39;s precisely because it&#39;s
so easy to get a large number of people to load content from your site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
that it&#39;s so important that browsers be safe when run against
arbitrary sites.&lt;/p&gt;
&lt;h2 id=&quot;phishing&quot;&gt;Phishing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#phishing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Probably the more serious risk here is
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Phishing&amp;amp;oldid=1072865920&quot;&gt;phishing&lt;/a&gt;. As
with any phishing attack, phishing via QR code relies on you thinking
that you are going to a site operated by someone legitimate when it&#39;s
actually operated by the attacker. How serious this attack turns out
to be depends on how much you trusted the person you thought you were
connecting to in the first place.&lt;/p&gt;
&lt;p&gt;In this case, for instance, you&#39;re theoretically connecting to
Coinbase and the attacker might try to prompt you for your
credit card and banking information or, if you&#39;re a Coinbase
customer, for your Coinbase credentials (&lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/&quot;&gt;use a password manager, people&lt;/a&gt;). Obviously, you need to be careful here, but again,
the situation isn&#39;t any different than if the attacker
had provided a short URL; in both cases you enter something
opaque and you end up at a site with a domain you may
or may not recognize. Or, for that matter, the attacker
might send you to a domain that looks plausible
but is not run by who you think it is. For example,
&lt;a href=&quot;http://coinba.se/&quot;&gt;http://coinba.se&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; does not take you where you might
expect.&lt;/p&gt;
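&lt;p&gt;To make the lookalike-domain point concrete, here is a small Python sketch of the kind of host comparison involved. The function name &lt;code&gt;host_matches&lt;/code&gt; is hypothetical and this is only an illustration under simple assumptions (plain ASCII hostnames, no redirects), not a description of what any browser or password manager actually does.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def host_matches(url, expected_domain):
    """Return True if the URL's host is expected_domain or a subdomain of it."""
    # urlsplit().hostname is None when the URL has no host part.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    expected = expected_domain.lower()
    # Require an exact match or a dot-separated suffix match, so that
    # "coinbase.com.evil.example" does not count as "coinbase.com".
    return host == expected or host.endswith("." + expected)


print(host_matches("http://coinba.se", "coinbase.com"))
```

&lt;p&gt;A check like this only helps if you already know which domain you expected, which is exactly what the parking meter victim below doesn&#39;t know.&lt;/p&gt;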
&lt;p&gt;One interesting recent example of QR-code based phishing attacks
is phishers putting fake QR codes on &lt;a href=&quot;https://www.msn.com/en-us/news/technology/scammers-are-putting-qr-code-stickers-on-parking-meters-to-trick-people-into-paying-them/ar-AASJGke&quot;&gt;parking meters&lt;/a&gt;.
The victim thinks they are paying to park but really they
are paying the scammer. This attack seems like it&#39;s slightly
facilitated by QR codes but mostly it&#39;s facilitated by
using your phone to pay a parking meter. It&#39;s not as if
the actual site you go to pay for parking necessarily has
a particularly credible looking name anyway, so it&#39;s not clear
how much better the situation would be if you had to type
in a URL rather than scan a QR code (though obviously it would be
less convenient).&lt;/p&gt;
&lt;p&gt;Browsers do try to protect users from this kind of attack using
blocklists like &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Safe Browsing&lt;/a&gt;.
This actually seems like a case where blocklist techniques are
likely to be fairly effective because the time scale of attack
is fairly long—the stickers take a long time
to deploy and people are fooled over a period of days—which
gives the blocklist provider time to detect the attack and
mitigate it. By contrast, ordinary phishing attacks (e.g., by
email) can use short-lived domains and so be hard to block
before they do damage.&lt;/p&gt;
&lt;h2 id=&quot;consider-the-source&quot;&gt;Consider the Source &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#consider-the-source&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The final reason I&#39;m not too worried about the Super Bowl ad per
se is that it&#39;s expensive and easily attributable. A 30 second
Super Bowl ad cost &lt;a href=&quot;https://www.cnn.com/2022/02/11/media/super-bowl-commercials-nbc/index.html&quot;&gt;as much as $7 million&lt;/a&gt;,
so you&#39;d have to be a pretty dedicated attacker to use that
airtime to deploy your malicious QR code. Moreover, it&#39;s hard
to buy that kind of thing anonymously, so when people inevitably
discover that the QR code is malicious, the attacker is likely
to be looking at some pretty serious law enforcement action.&lt;/p&gt;
&lt;p&gt;I&#39;ve seen it &lt;a href=&quot;https://www.secureworld.io/industry-news/qr-code-controversy-super-bowl&quot;&gt;suggested&lt;/a&gt;
that a more interesting threat vector is reposts on YouTube and the
like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;The real risk in this situation is if someone edits the commercial and adds a malicious QR code to it, especially on social media platforms.&lt;/p&gt;
&lt;p&gt;People will repost Super Bowl ads for weeks after the game itself, so an attacker could easily change the QR code. The ad could be reposted across social media apps and crypto forums to get people to visit a malicious webpage. That page could be a fake Coinbase login site. If this was a success, the victim could end up having their entire account drained. Attackers could also build that page to deliver a trojanized version of a crypto app.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This does seem like a potential risk, though hopefully most
of the major venues for finding the Coinbase ad will actually
get the right QR code. Here too, time is on your side and so
even if someone does post a fake YouTube video, hopefully
YouTube would be able to take it down fairly quickly.&lt;/p&gt;
&lt;p&gt;I&#39;m not saying that you should trust that a random
QR code that claims to be for your bank actually is legitimate
any more than you should trust a random email that claims
to be from your bank. However, this just doesn&#39;t seem like
a particularly efficient mechanism for attack delivery.
The parking meter case is interesting precisely because
(1) the user may have no real previous association with the
service provider and so it&#39;s hard for them to know if it&#39;s
legitimate and (2) the user already has an intent to pay—and
is probably in a hurry—so even a very small success rate is
likely to be worth the effort of going around sticking stickers
on parking meters. The situation for Super Bowl ads seems
pretty different.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qr-code-security/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;m open to being wrong here, but from what I&#39;ve seen so far, I&#39;m
just not that concerned about this particular threat. However, even if
you disagree with me, we have to deal with the fact that users probably
aren&#39;t going to stop scanning QR codes whatever we tell them; it&#39;s up to
operating system and browser vendors to make that as safe as we can
and/or to offer alternatives that are safer and equally convenient.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This writeup also describes some attacks where you insert
JS in the QR code and it gets executed by the client.
Those attacks seem to rely on the QR code data being
treated as a &lt;code&gt;file://&lt;/code&gt; URL and same origin
to other &lt;code&gt;file://&lt;/code&gt; URLs, which is something
that browsers are &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1500453&quot;&gt;moving away from&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that in the ads case they&#39;ll be loading that data
in an IFRAME, but this probably won&#39;t make a difference
to attack effectiveness. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is no
HTTPS version. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-security/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Overview of Interoperable Private Attribution</title>
		<link href="https://educatedguesswork.org/posts/ipa-overview/"/>
		<updated>2022-02-15T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ipa-overview/</id>
		<content type="html">&lt;style&gt;
.img-wrap {
  display: inline-block;
}
.img-wrap img {
  width: 80%;
}&lt;/style&gt;
&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: this post contains a bunch of LaTeX math notation rendered
in MathJax, but it doesn&#39;t show up right in the newsletter
version. You should mostly be able to follow along anyway
except for the &amp;quot;Technical Details&amp;quot; section and the Appendix (which
is part of why it&#39;s an appendix) so you may want to
instead read the version on the &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview&quot;&gt;site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Recently, Erik Taubeneck (Meta), Ben Savage (Meta), and Martin Thomson
(Mozilla) published a new technique for measuring the effectiveness
of online ads called
&lt;a href=&quot;https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit&quot;&gt;Interoperable Private Attribution
(IPA)&lt;/a&gt;.
This has received a fair amount of attention—including some not
so positive &lt;a href=&quot;https://news.ycombinator.com/item?id=30305770&quot;&gt;comments on Hacker
News&lt;/a&gt;. I&#39;ve written
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking&quot;&gt;before&lt;/a&gt; about how to use a variant of this
technology to measure vaccine doses, but I thought it would be useful
to walk through how IPA works in its intended setting.&lt;/p&gt;
&lt;h2 id=&quot;attribution-and-conversion-measurement&quot;&gt;Attribution and Conversion Measurement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#attribution-and-conversion-measurement&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For obvious reasons, advertisers and publishers want to know how effective their ads
are. The basic tool for this is what&#39;s called &amp;quot;attribution&amp;quot; or
&amp;quot;conversion measurement&amp;quot;. Suppose I see an ad for a product on a news
site and click on it, taking me to the merchant, where I subsequently
make a purchase. This is called a &lt;em&gt;conversion&lt;/em&gt;, and advertisers
want to know which ads convert—and how often—and
which ones do not.&lt;/p&gt;
&lt;p&gt;At the moment, conversion measurement is mostly done with cookies,
as shown in the figure below:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/conversion-cookies.png&quot; alt=&quot;Conversion with cookies&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&#39;s walk through this in pieces. First, the client visits the
publisher site. The publisher serves the client a Web page
with an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe&quot;&gt;IFRAME&lt;/a&gt;
from the advertiser&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
(reminder: an IFRAME is an HTML element that allows one Web page to
display inside another Web page, even from two different sites).
When the advertiser sends the page, it also sends a tracking
cookie to the client, in this case &lt;code&gt;1234&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The user views the ad (an &lt;em&gt;impression&lt;/em&gt;) and clicks through, which takes them
to the  merchant. In this case, they just make
an immediate purchase, but they might also shop around on the
site or even go away and come back later.
Eventually, the user makes a &lt;em&gt;purchase&lt;/em&gt; (&amp;quot;converts&amp;quot;). When the merchant
sends the confirmation page it includes a tracking pixel
(an invisible image) served off of the advertiser&#39;s site.
When the browser retrieves the pixel, it sends the advertiser&#39;s
cookie (&lt;code&gt;1234&lt;/code&gt;) back to the advertiser. The cookie allows the
advertiser to connect
the original click and the resulting purchase, thus measuring the
conversion.&lt;/p&gt;
&lt;p&gt;You&#39;ll note that what&#39;s technically being measured in this
example is the conversion from the impression to the
purchase. If you wanted to measure the click instead,
there are a number of ways to do this, such as having the ad click
redirect through the advertiser or having a Javascript
hook that informed the advertiser of the click.&lt;/p&gt;
&lt;p&gt;The problem with this technique is that it involves
the advertiser tracking you across the Internet: it sees
which Web site you are on every time it shows you an ad,
and for a big ad network this can be a pretty appreciable
fraction of your browsing history.
This is a serious privacy problem and browsers are gradually
deploying techniques to prevent this kind of tracking,
such as Firefox&#39;s &lt;a href=&quot;https://support.mozilla.org/en-US/kb/enhanced-tracking-protection-firefox-desktop&quot;&gt;Enhanced Tracking Protection&lt;/a&gt;
and Safari&#39;s &lt;a href=&quot;https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/&quot;&gt;Intelligent Tracking Prevention&lt;/a&gt;.
Those technologies are good for user privacy but
interfere with conversion measurement.
IPA is a mechanism designed to provide conversion
measurement without degrading user privacy.&lt;/p&gt;
&lt;h2 id=&quot;the-basic-idea&quot;&gt;The Basic Idea &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#the-basic-idea&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The main idea behind IPA is to replace cookie-based linkage with
linkage based on an anonymous identifier. Let&#39;s assume that each client $i$
has a single unique identifier $I_i$ (I&#39;ll discuss how this identifier is
assigned below). This identifier can&#39;t be read directly
off the client but instead has to be accessed via an API,
e.g., &lt;code&gt;getIPAEvent()&lt;/code&gt;, that produces an
encrypted version of the identifier $E(I_i)$.
The encryption is &lt;em&gt;randomized&lt;/em&gt; so that each time the identifier is encrypted, the ciphertext is different,
preventing linkage of the encrypted identifiers. To represent that,
we use the notation $E(R_j, I_i)$ where $R_j$ is the randomizing
value. Two encrypted values $E(R_j, I_i)$ and $E(R_{j&#39;}, I_{i&#39;})$ will with high
probability be different unless both the identifier and the randomizer
are the same.
However, by use of an appropriate service they can be decrypted and matched up.&lt;/p&gt;
&lt;p&gt;If we go back to the conversion scenario described above, but instead
use IPA, it would look like this:&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/conversion-ipa.png&quot; alt=&quot;Conversion measurement with IPA&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Everything is the same up to the point where the ad is displayed,
except that along with the ad the advertiser also sends some
Javascript code that calls &lt;code&gt;getIPAEvent()&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;. The browser responds
by providing an encrypted version of the identifier, with
random value $R_1$: $E(R_1, I_i)$. The advertiser just stores
this information on a list of the impressions for this
ad (note that as before we are measuring impressions).&lt;/p&gt;
&lt;p&gt;When the user actually buys the product, the merchant calls &lt;code&gt;getIPAEvent()&lt;/code&gt;
and gets a new encrypted version of the identifier, this time with
a different randomizer,
$R_2$:
$E(R_2, I_i)$. The merchant sends the encrypted value it receives
to the advertiser. However, even though the identifiers are
the same, because the randomizers are different, the encrypted
values are different, thus preventing either the advertiser or the merchant from linking
them. The only thing that the advertiser knows is that there
has been one impression (because it saw it directly) and one
purchase (because the merchant told it about it). It&#39;s important
to note that this is all information that the merchant and the ad
server knew already: the only secret information is the identifier
and that&#39;s encrypted. In order to decrypt it and match up these
events, you need to use the IPA decryption and blinding service.&lt;/p&gt;
&lt;p&gt;The basic idea behind the service is that the advertiser (or merchant)
has a set of encrypted identifiers that it sends to the service
and the service returns information about the number of matches.
So, for instance, you might send in 20 encrypted identifiers
and get back something like:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Unmatched impressions&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Unmatched purchases&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Impression/purchase pairs&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Two impressions/one purchase&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
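&lt;p&gt;As a quick consistency check on this made-up example, the four rows
should account for all 20 submitted identifiers, with each pair contributing
two reports and the two-impressions/one-purchase group contributing three:&lt;/p&gt;
&lt;p&gt;$$2 + 3 + (6 * 2) + (1 * 3) = 20$$&lt;/p&gt;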
&lt;p&gt;Note: it&#39;s important that the IPA service only operate on
batches of reports and produce aggregate reports about the batch;
otherwise the advertiser could just send in small numbers of
reports at a time. More on this &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#privacy-properties&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Internally, the service works by having a pair of servers
which cooperate to decrypt and blind the input values.
The advertiser (or merchant) sends its values to the first
server, which decrypts, blinds, and shuffles them, and then
passes them on to the second server, which does the same thing,
as shown in the diagram below (I&#39;ve used a different color
for each identifier to help make it easier to follow).&lt;/p&gt;
&lt;div class=&quot;img-wrap&quot;&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipa-service.png&quot; alt=&quot;IPA service shuffling&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In this example, the advertiser has two encrypted impressions
and two encrypted purchases (it knows which are which because
that information was available when the API was called, so it
can just label them). One of the impressions and one of the purchases
line up but it doesn&#39;t know that. It passes all of its data in a batch to the
first server of the IPA service (A) which partially decrypts
them, blinds them with its secret, and then passes them to
server B. Server B decrypts them the rest of the way and
applies its own blinding key. At this point server B has a list
of blinded identifiers labeled with whether they were
impressions or purchases. Because the blinding keys are
constant, each time identifier $I_1$ is blinded, the blinded
values are the same, and so it can match up the impression and
purchase for $I_1$ (both shown in blue). However, because the values
are blinded, it can&#39;t match them up to the input reports.
Given this information, the server can then produce a report
to the advertiser to the effect that there was one pair,
one unmatched impression and one unmatched purchase.&lt;/p&gt;
&lt;h2 id=&quot;multi-device&quot;&gt;Multi-Device &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#multi-device&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the main requirements for the design of IPA is that it
allow for linking activity across multiple devices. For instance,
I might see an ad on my mobile device but make the purchase on
my desktop machine. Obviously, advertisers and publishers want to be able to
measure the impact of their ads.
With the current cookie-based system it&#39;s possible
under some circumstances to associate those events. For instance,
if Facebook is displaying the ad and you&#39;re logged into Facebook,
then your Facebook account ID can be used to link them up.
A number of the proposed private conversion measurement
systems (e.g., Apple&#39;s &lt;a href=&quot;https://privacycg.github.io/private-click-measurement/&quot;&gt;Private Click Measurement&lt;/a&gt;)
do not allow for this use case, which is clearly a big part
of Meta&#39;s motivation for proposing IPA, as a lot of their
usage is on mobile.&lt;/p&gt;
&lt;p&gt;IPA handles this case in a straightforward fashion, via the
per-client identifier. Earlier I just assumed that each client $i$ had
an identifier $I_i$ but didn&#39;t say how it was assigned. If instead,
we arrange that each &lt;em&gt;user&lt;/em&gt; has the same identifier across all of their
devices, then IPA just naturally links up impressions on device
A and device B without any extra work.&lt;/p&gt;
&lt;p&gt;This of course reduces to the problem of how to get a per-user
identifier synchronized across devices. One obvious approach would
be to have the devices synchronize it, much as browsers can
sync history across devices. However, there are a number of
cases where this won&#39;t work, for instance if you use Chrome
on your Android device and Firefox on your desktop,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
or if the impression came from something other than a browser
like an app or a smart TV (I&#39;m no happier than you
are about ads on my smart TV, let alone having their
conversion measured).&lt;/p&gt;
&lt;p&gt;IPA addresses this issue in a clever but counterintuitive fashion:
it allows any &lt;em&gt;domain&lt;/em&gt; (e.g., &lt;code&gt;example.com&lt;/code&gt; or more likely
&lt;code&gt;facebook.com&lt;/code&gt;) to set a per-domain identifier (which IPA
calls a &amp;quot;match key&amp;quot;) that
can be used by any domain. The idea here
is that when you log into some system (e.g., Facebook), it
sets an identifier that is tied to your account and is therefore
the same across all your devices. The identifier
can be used by &lt;em&gt;any&lt;/em&gt; advertiser or merchant (via the &lt;code&gt;getIPAEvent()&lt;/code&gt;
API), no matter which domain they are on, thus preventing
Facebook from being the only party that can do attribution
via the Facebook account.&lt;/p&gt;
&lt;p&gt;Key to making this work is that the identifier is &lt;em&gt;write-only&lt;/em&gt;:
nobody—including the original domain—can access it,
except by using the API, which of course only produces an
unlinkable, encrypted value. This prevents the identifier from
being used directly for tracking, as would otherwise be the
case for a world-readable value. In fact, you can&#39;t even ask
whether the identifier was set, because then it would leak
one bit. Of course, the original domain knows the identifier for
a given user (because it generated it) and it can set a cookie
on the client to remember if it set the identifier, but if the
cookie is deleted, then it doesn&#39;t know either.&lt;/p&gt;
&lt;h3 id=&quot;ipa-technical-details&quot;&gt;IPA Technical Details &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#ipa-technical-details&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This section provides technical details on how the IPA service works. I&#39;ve attempted to make
them mostly accessible, so that they can be understood based on high school
math&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;,
but they can also be &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#limitations&quot;&gt;skipped&lt;/a&gt; if necessary.
If you don&#39;t care about the details—or
you already waded through this in my &lt;a href=&quot;https://educatedguesswork.org/vaccine-tracking&quot;&gt;post&lt;/a&gt; on linking up vaccine doses—you
can skip this section and still be fine.&lt;/p&gt;
&lt;p&gt;Note: in ordinary integer math, given $g^a$ and $g$ it&#39;s easy to compute
$a$ but we&#39;re going to be doing this in an elliptic curve
where that computation is hard. Everything else is pretty
much the same, but just remember that part.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The service is implemented by having a pair of servers, $A$ and $B$.
Each has a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Diffie%E2%80%93Hellman_key_exchange&amp;amp;oldid=1066364968&quot;&gt;Diffie-Hellman&lt;/a&gt;
key pair, which is to say a secret value $x$ and a public value
computed as $g^x$.  We&#39;ll call $A$&#39;s key pair $(a, g^a)$ and $B$&#39;s
pair $(b, g^b)$. Each server also has a secret blinding key $K_a$ and
$K_b$. These servers are operated by different entities who are
trusted not to collude; the intent is that as long as either server behaves correctly
you&#39;re OK. The service then publishes a combined public
key $g^{a+b}$ which can be computed by multiplying the public keys: $g^a * g^b$
(if you remember your high school math!).&lt;/p&gt;
&lt;p&gt;In order to submit an ID $I$, the sender first encrypts it.
It generates a random secret $x$ and
computes: $g^{x(a+b)} = {(g^{a+b})}^x$. Note that we&#39;re using the service&#39;s
combined public key and the sender&#39;s private value $x$, so the result is a secret
from attackers who know neither $x$ nor $a+b$. It then multiplies
$I$ by this value and sends the pair
of values (this is just classic &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=ElGamal_encryption&amp;amp;oldid=1058774653&quot;&gt;ElGamal Encryption&lt;/a&gt;, but to the key $g^{a+b}$):&lt;/p&gt;
&lt;p&gt;$$g^x, I * g^{x(a+b)}$$&lt;/p&gt;
&lt;p&gt;Importantly, this second term can be broken up into a part involving
only $a$ and a part involving only $b$. I.e.,&lt;/p&gt;
&lt;p&gt;$$I * g^{x(a+b)} = I * g^{xa} * g^{xb}$$&lt;/p&gt;
&lt;p&gt;Again, this is just high school math. These values then get sent to
$A$ (or $B$, it doesn&#39;t matter), who computes $g^{xa} = {(g^{x})}^a$
(recall it knows $a$). It then divides the second part by $g^{xa}$:&lt;/p&gt;
&lt;p&gt;$$I *g^{xb} = &#92;frac{I * &#92;cancel{g^{xa}} * g^{xb}}{&#92;cancel{g^{xa}}}$$&lt;/p&gt;
&lt;p&gt;This cancels out the $g^{xa}$ term, leaving you with just a term
that involves $b$, and thus the pair:&lt;/p&gt;
&lt;p&gt;$$g^x, I * g^{xb}$$&lt;/p&gt;
&lt;p&gt;$A$ then blinds this value, by exponentiating both values to $K_a$, giving:&lt;/p&gt;
&lt;p&gt;$$(g^x)^{K_a}, (I * g^{xb})^{K_a}$$&lt;/p&gt;
&lt;p&gt;We can flatten this out to give:&lt;/p&gt;
&lt;p&gt;$$g^{x * K_a}, I^{K_a} * g^{(xb)(K_a)}$$&lt;/p&gt;
&lt;p&gt;$A$ batches these values up with other inputs it has received, shuffles them, and sends
them to $B$. $B$ takes the first term and
computes $(g^{x * K_a})^b = g^{x * K_a * b} = g^{(xb)(K_a)}$. It then
divides the second term by this value, to get:&lt;/p&gt;
&lt;p&gt;$$I^{K_a} = &#92;frac{I^{K_a} * &#92;cancel{g^{(xb)(K_a)}}}{&#92;cancel{g^{(xb)(K_a)}}}$$&lt;/p&gt;
&lt;p&gt;Finally, $B$ blinds the value by taking it to the power $K_b$, thus
giving us:&lt;/p&gt;
&lt;p&gt;$$I^{(K_a)(K_b)} = (I^{K_a})^{K_b}$$&lt;/p&gt;
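&lt;p&gt;To make this concrete, here is the whole flow with toy numbers. Everything
in this example is an illustrative assumption rather than something from the
IPA proposal: it uses ordinary integers modulo $p = 23$ with $g = 5$ (which is
trivially breakable, unlike the elliptic curve setting), and picks $a = 3$,
$b = 4$, $K_a = 5$, $K_b = 3$, $x = 2$, and $I = 6$. Division here means
multiplying by the modular inverse. The public keys and the combined key are:&lt;/p&gt;
&lt;p&gt;$$g^a = 5^3 &#92;equiv 10, &#92;quad g^b = 5^4 &#92;equiv 4, &#92;quad g^{a+b} = 10 * 4 &#92;equiv 17 &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;The sender computes $g^x$ and the masked identifier:&lt;/p&gt;
&lt;p&gt;$$g^x = 5^2 &#92;equiv 2, &#92;quad I * g^{x(a+b)} = 6 * 17^2 &#92;equiv 6 * 13 &#92;equiv 9 &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;$A$ strips its share of the mask and then blinds both components with $K_a$:&lt;/p&gt;
&lt;p&gt;$$g^{xa} = 2^3 = 8, &#92;quad I * g^{xb} = 9 / 8 &#92;equiv 9 * 3 &#92;equiv 4, &#92;quad (2^5, 4^5) &#92;equiv (9, 12) &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;$B$ strips the rest of the mask and applies its own blinding key:&lt;/p&gt;
&lt;p&gt;$$g^{(xb)(K_a)} = 9^4 &#92;equiv 6, &#92;quad I^{K_a} = 12 / 6 = 2, &#92;quad I^{(K_a)(K_b)} = 2^3 = 8 &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;As a cross-check, computing the blinded value directly gives
$6^{15} &#92;equiv 8 &#92;pmod{23}$, which matches.&lt;/p&gt;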
&lt;p&gt;That was a lot of math, but the bottom line is that the actual
identifier $I$ (e.g., the &lt;strike&gt;SSN -- Updated 2022-02-16&lt;/strike&gt; account id) has been
converted into a new blinded value, with (hopefully) the following properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Neither $A$ nor $B$ ever saw $I$.&lt;/li&gt;
&lt;li&gt;$A$ sees the input encrypted version but doesn&#39;t learn the blinded
version.&lt;/li&gt;
&lt;li&gt;$B$ sees the blinded version but doesn&#39;t learn the encrypted
version.&lt;/li&gt;
&lt;li&gt;You need to know $K_a$ and $K_b$ to compute the blinded version
of $I$.&lt;/li&gt;
&lt;/ol&gt;
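&lt;p&gt;A toy example shows why the randomization doesn&#39;t get in the way of
matching. Using illustrative numbers that are assumptions for this sketch, not
values from the IPA proposal (integers modulo 23 with $g = 5$, server secrets
$a = 3$, $b = 4$, $K_a = 5$, $K_b = 3$, and identifier $I = 6$), two encryptions
of the same $I$ with randomizers $x = 2$ and $x = 3$ look unrelated:&lt;/p&gt;
&lt;p&gt;$$(g^2, I * g^{2(a+b)}) &#92;equiv (2, 9), &#92;quad (g^3, I * g^{3(a+b)}) &#92;equiv (10, 15) &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;After $A$ removes its part of the mask and raises both components to $K_a$,
the two pairs still look unrelated:&lt;/p&gt;
&lt;p&gt;$$(9, 12), &#92;quad (19, 6) &#92;pmod{23}$$&lt;/p&gt;
&lt;p&gt;Once $B$ divides the second component by the first component raised to $b$
and then raises the result to $K_b$, both collapse to the same blinded value:&lt;/p&gt;
&lt;p&gt;$$(12 / 9^4)^3 &#92;equiv (12 / 6)^3 &#92;equiv 8, &#92;quad (6 / 19^4)^3 &#92;equiv (6 / 3)^3 &#92;equiv 8 &#92;pmod{23}$$&lt;/p&gt;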
&lt;p&gt;&lt;em&gt;Disclaimer&lt;/em&gt;: The IPA documents were just published recently,
so I don&#39;t think they have seen enough analysis to prove they
are secure. Here I&#39;m just describing how it&#39;s supposed to work.&lt;/p&gt;
&lt;h2 id=&quot;privacy-properties&quot;&gt;Privacy Properties &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#privacy-properties&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The two basic privacy properties we are trying to achieve here are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Neither the advertiser nor the merchant is able to associate a specific input
report to a specific output report, &lt;em&gt;even with&lt;/em&gt; the help of one
of the servers (because you need both $K_a$ and $K_b$). This is
true even if they also know the identifiers, which are not
even required to be high entropy (e.g., they can be e-mail
addresses).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Neither the advertiser nor the merchant is able to determine
which users are represented in a given set of reports or
are associated with a given piece of additional data (see &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#additional-data&quot;&gt;below&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As far as I know, no attacks on property (1) are known
(though see the above caveat about insufficient analysis)
but we do know of an attack on property (2) (see
&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#appendix%3A-linear-relation-attacks&quot;&gt;the appendix&lt;/a&gt;).
The basic situation is that the advertiser can collude
with whoever issued the match keys and with one of the
servers to determine if a given user is incorporated
in a set of reports. However, if both servers are honest,
this attack will not work. This is not the desired privacy
target, which is that you only have to trust that at
least one server is honest, but it&#39;s where things currently stand.&lt;/p&gt;
&lt;p&gt;In any case, the second server learns more than the first server because it
knows which reports match up with which other reports. However, it
still doesn&#39;t know which ones match up to which input reports
because it doesn&#39;t know $K_a$. This is still a somewhat weird
asymmetry, and when we look at additional data in the next
section, we&#39;ll remove it.&lt;/p&gt;
&lt;p&gt;Importantly, the summaries that are provided to the advertiser
can still leak data. For instance, suppose that the advertiser
wants to know if impression A and purchase B are from the
same user: it can send them in together with a bunch of
fake reports which have random non-matching identifiers. If
the report that comes back lists any matches, then it knows
that A and B match. This is a generalized problem in any
aggregate reporting system which I covered in some detail
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#input-manipulation-attacks&quot;&gt;previously&lt;/a&gt;
and there are a variety of potential defenses, including
trying to ensure that data comes from &amp;quot;valid&amp;quot; clients and
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-randomness/&quot;&gt;adding noise to the output&lt;/a&gt;. The
IPA proposal &lt;a href=&quot;https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit#heading=h.2cb0mttqfkv2&quot;&gt;contemplates&lt;/a&gt; some kind of noise injection
along with budgeting for the number of queries
but doesn&#39;t really include a complete design.&lt;/p&gt;
&lt;p&gt;Although this system provides a fair degree of privacy if you
trust the servers, there will of course be people who don&#39;t
trust them, or just don&#39;t want to send their data on principle.
One question I&#39;ve seen asked is whether it will be possible
to configure your software not to participate.
However, from a privacy perspective, it&#39;s actually undesirable to have the API call
just fail because then you have sent some information to
the server that might be used to track you (as most people
will not disable the API). A better approach technically is
just to send an unusable report, e.g., the encryption of
a randomly selected ID. This should not be possible to distinguish
from a valid report without the cooperation of both servers
&lt;em&gt;and&lt;/em&gt; knowing what valid identifiers look like.
Obviously, whether there is such a configuration knob depends on the software
you are using.&lt;/p&gt;
&lt;h2 id=&quot;additional-data&quot;&gt;Additional Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#additional-data&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far the system we have described just lets us count matches, but
what if we want to record more than matches, for instance by
measuring the total amount of money spent by customers via a given
ad campaign? This turns out to be a somewhat tricky problem to
solve because we need to make sure that that information doesn&#39;t
turn into a mechanism for tracking reports through the system.&lt;/p&gt;
&lt;p&gt;For instance, in the diagram above, I had the advertiser label
each report as either an impression or a purchase; this is mostly
fine as long as we only have those two labels because if
there are a reasonable number of each you don&#39;t
know much about whether a given output and a given input
match up. However, if we let the advertiser attach arbitrary
labels, this would obviously be a problem because then they
could collude with one of the servers to track a given input
through the process (this is of course the same reason you
have to shuffle). As a naive example, suppose that the merchant
adds the customer&#39;s email address to the report: if that
pops out the other end, then you have a real problem.&lt;/p&gt;
&lt;p&gt;IPA doesn&#39;t contain a complete proposal for this, but does have some
handwaving. The general idea is that the &lt;em&gt;client&lt;/em&gt;, not the advertiser
or merchant, would attach &amp;quot;additional data&amp;quot; (the cute name for this is
a &amp;quot;sidecar&amp;quot;) to their report. This data would be supplied by the
site (i.e., the advertiser or merchant), which would say something like &amp;quot;make a report that says
that this purchase was for 100 dollars&amp;quot;. This additional data would
also be multiply encrypted so
that neither server could individually decrypt it, but that once
it had been shuffled, the second server would get it along with
the blinded identifier. Note that this additional data would not
be blinded because otherwise you wouldn&#39;t be able to add up the
results; it just appears unmodified in the output.&lt;/p&gt;
&lt;p&gt;But wait, you say, if we just let the advertiser provide arbitrary
data, then it can provide a user identifier of its own which
will then show up in the output and we&#39;re back where we started.
The proposed fix is that instead of just reporting the value directly,
the client instead reports it via some secret-sharing mechanism
like &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;Prio&lt;/a&gt;. Of course, this means that the
client actually has to submit &lt;em&gt;two&lt;/em&gt; reports, one that is
processed by server A then server B and one that is processed
by server B then server A, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/ipa-additional-data2.png&quot; alt=&quot;IPA with additional data&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As shown here, the client generates two reports, each of which
contains a Prio share for the value provided by the advertiser.
When the advertiser is ready, it sends one report share to Server A and
one report share to Server B. In this case, I&#39;ve shown reports from
two clients, each with one share. As described above, each server partly
decrypts its reports, shuffles, and then passes it to the other
server. The other server completes the decryption, correlates
the matching reports, and aggregates
(e.g., adds up) the additional data.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Finally, Server A sends
its aggregated additional data to Server B which combines
it with its aggregated additional data and sends the result
back to the advertiser (see my &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;post&lt;/a&gt; on
Prio for more details on how this part of the process works).&lt;/p&gt;
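&lt;p&gt;The secret-sharing step can be illustrated with simple additive shares,
which is the basic idea underlying Prio (the real system also involves validity
proofs, and the modulus and values here are made-up assumptions for
illustration). Suppose two purchases are worth 100 and 40 and shares are taken
modulo 1009. Each client splits its value into one random share and one
matching share, one per server:&lt;/p&gt;
&lt;p&gt;$$100 &#92;equiv 733 + 376, &#92;quad 40 &#92;equiv 250 + 799 &#92;pmod{1009}$$&lt;/p&gt;
&lt;p&gt;Neither share alone says anything about the value. Each server adds up the
shares it holds, and only the combination of the two sums reveals the total:&lt;/p&gt;
&lt;p&gt;$$(733 + 250) + (376 + 799) = 983 + 1175 = 2158 &#92;equiv 140 &#92;pmod{1009}$$&lt;/p&gt;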
&lt;p&gt;So far so good, except that I haven&#39;t specified how the additional
data is encrypted. This part turns out to be somewhat tricky
and the IPA authors don&#39;t have a published design for it at
the moment, so this piece is still a hard hat area.&lt;/p&gt;
&lt;h2 id=&quot;status-of-ipa&quot;&gt;Status of IPA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#status-of-ipa&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So what&#39;s the status of IPA? This has been the source of some
confusion, perhaps in part because Google has implemented some
of their &amp;quot;Privacy Sandbox&amp;quot; proposals in Chrome and has
&lt;a href=&quot;https://www.chromium.org/Home/chromium-privacy/privacy-sandbox/floc/&quot;&gt;already done&lt;/a&gt;
or &lt;a href=&quot;https://github.com/WICG/turtledove/blob/main/Proposed_First_FLEDGE_OT_Details.md&quot;&gt;proposed to do&lt;/a&gt; &lt;a href=&quot;https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/explainer.md&quot;&gt;&amp;quot;origin trials&amp;quot;&lt;/a&gt; (a kind of limited access test) for them. At present, however, IPA
is just a proposal. It has been submitted to the
W3C &lt;a href=&quot;https://patcg.github.io/&quot;&gt;Private Advertising Technology Community Group&lt;/a&gt;
for consideration but has yet to be adopted, let alone shipped by anyone.
In other words, it&#39;s a potentially interesting idea but not
something that is finished or ready to standardize.&lt;/p&gt;
&lt;h2 id=&quot;appendix%3A-linear-relation-attacks&quot;&gt;Appendix: Linear Relation Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ipa-overview/#appendix%3A-linear-relation-attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The IPA authors describe a few &lt;a href=&quot;https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit#heading=h.j0w90menb1l6&quot;&gt;known attacks&lt;/a&gt; on the system (though more analysis is needed).
The most interesting one is what they term &amp;quot;linear relation&amp;quot; attacks.
The basic idea behind this kind of attack is to use the blinding process
as an oracle to determine whether a given user was in the report
set.&lt;/p&gt;
&lt;p&gt;Recall that the result of the blinding process for identity $I_i$
is $I_i^{K_a K_b}$. So if you have two identities $I_1$ and $I_2$ their
blinded versions are of course $I_1^{K_a K_b}$ and $I_2^{K_a K_b}$.&lt;/p&gt;
&lt;p&gt;These have the interesting property that:&lt;/p&gt;
&lt;p&gt;$$(I_1^{K_a K_b})(I_2^{K_a K_b}) = (I_1 I_2)^{K_a K_b}$$&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Updated 2022-02-16: oops, fixed a subscript&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If the advertiser knows a user&#39;s identifier and it has the cooperation
of one of the servers, it can use this fact to determine
whether a given user was in a set of reports.
If the target user
has identifier $I_t$ it creates two fake reports $I_x$ and $I_y$
such that: $I_y = I_tI_x$. When these are blinded, the result is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$I_x^{K_a K_b}$&lt;/li&gt;
&lt;li&gt;$I_y^{K_a K_b} = (I_x I_t)^{K_a K_b} = (I_x^{K_a K_b})(I_t^{K_a K_b})$&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And if a report from the target was included, then the reports will
also include the blinded version of $I_t$, which is $I_t^{K_a K_b}$.&lt;/p&gt;
&lt;p&gt;The colluding server then looks to see whether there is a triplet of
blinded values $(B_1, B_2, B_3)$ such that $B_1 = B_2 * B_3$. If there
is, then they know that $B_1$ corresponds to $I_y$ and that one of
$B_2$ or $B_3$ corresponds to $I_t$.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
As I said above, this is a known attack and the authors
are working on ideas to address it. Note also that this attack depends
on knowing users&#39; identifiers, so it can&#39;t be done by any site,
but just by (or with the help of) the one issuing the identifiers.&lt;/p&gt;
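&lt;p&gt;To make the attack concrete, here is a minimal sketch using exponentiation
modulo a prime as a stand-in for the real group. The modulus, the blinding
exponents, and all of the identifier values are made up for illustration and
are not taken from the IPA proposal.&lt;/p&gt;

```python
# Toy demonstration of the linear relation attack. Exponentiation modulo a
# prime stands in for the real group; every constant here is hypothetical.
from itertools import combinations

P = 2**127 - 1               # a Mersenne prime, used as a toy modulus
K_A, K_B = 65537, 1000003    # the two servers' blinding exponents


def blind(identity: int) -> int:
    """Return identity^(K_a * K_b) mod P, i.e. the doubly blinded value."""
    return pow(pow(identity, K_A, P), K_B, P)


def find_triplets(blinded: list[int]) -> list[tuple[int, int, int]]:
    """Return every (B1, B2, B3) in the set with B1 == B2 * B3 mod P."""
    found = []
    for b1 in blinded:
        for b2, b3 in combinations(blinded, 2):
            if b1 != b2 and b1 != b3 and b1 == (b2 * b3) % P:
                found.append((b1, b2, b3))
    return found


I_T = 1234                   # target identifier, known to the attacker
I_X = 777                    # first fake identifier
I_Y = (I_T * I_X) % P        # second fake identifier, I_y = I_t * I_x
HONEST = [5, 42, 999]        # unrelated users

with_target = [blind(i) for i in HONEST + [I_T, I_X, I_Y]]
without_target = [blind(i) for i in HONEST + [I_X, I_Y]]

print(find_triplets(with_target))     # one triplet: the target is present
print(find_triplets(without_target))  # empty: the target is absent
```

&lt;p&gt;The colluding server never learns the blinding keys; it only multiplies
blinded values together and looks for a match.&lt;/p&gt;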
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Usually this is from an ad network of some kind, but I&#39;m
simplifying. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The actual
proposal uses different names for the impression and the purchase,
but that&#39;s not necessary for this simple example. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, it&#39;s bad that sync between browsers of different
manufacturers doesn&#39;t work, but that&#39;s a whole different story. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In particular, the facts that $(g^a)(g^b) = g^{a+b}$ and
$(g^a)^b = g^{ab}$. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, I know I&#39;m
using exponential notation. It&#39;s easier to follow for
people not used to EC notation. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve omitted the discussion of the Prio proofs for
simplicity. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note
that another way to execute this is to just create a new identity
that is the product of two existing identities; this lets you
learn if both are in a set of reports. &lt;a href=&quot;https://educatedguesswork.org/posts/ipa-overview/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Ensuring Privacy For Age Verification</title>
		<link href="https://educatedguesswork.org/posts/uk-age-verification/"/>
		<updated>2022-02-11T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/uk-age-verification/</id>
		<content type="html">&lt;p&gt;The BBC &lt;a href=&quot;https://www.bbc.com/news/technology-60293057&quot;&gt;reports&lt;/a&gt;
that the UK has revived its &lt;a href=&quot;https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/985033/Draft_Online_Safety_Bill_Bookmarked.pdf&quot;&gt;online safety
bill&lt;/a&gt;, which was shelved back in 2019. There
has been a lot of concern about the policies embodied in this bill
from organizations ranging from &lt;a href=&quot;https://www.internetsociety.org/blog/2022/01/uk-online-safety-bill-set-to-weaken-encryption-and-put-uk-internet-users-at-risk/&quot;&gt;ISOC&lt;/a&gt;
to &lt;a href=&quot;https://bigbrotherwatch.org.uk/2021/05/big-brother-watch-response-to-the-governments-online-safety-bill/&quot;&gt;Big Brother Watch&lt;/a&gt; but I want to
focus on what&#39;s essentially a technical point, which is that it
represents a threat to user privacy that we don&#39;t
really know how to fix.&lt;/p&gt;
&lt;p&gt;The bill appears to require adult (i.e., pornography) sites to
verify the age of their users. This has been widely interpreted as effectively requiring the use
of some kind of &lt;a href=&quot;https://avpassociation.com/&quot;&gt;age verification system&lt;/a&gt;.
Regardless of the wisdom of age verification requirements in general
(see, for instance, this &lt;a href=&quot;https://www.bbc.com/news/technology-60293057&quot;&gt;BBC article&lt;/a&gt;),
it&#39;s going to
be difficult to build a system which doesn&#39;t run the risk of
creating a database of everyone who goes to a porn site.
Given that what kind of porn people watch or whether they watch porn
at all is generally considered private information this seems
fairly undesirable.&lt;/p&gt;
&lt;h2 id=&quot;age-verification-providers&quot;&gt;Age Verification Providers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#age-verification-providers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic problem here is that determining whether someone is
over 18 requires learning a fair bit of information about
them, generally enough to determine their identity. The UK Age
Verification Providers Association lists a &lt;a href=&quot;https://avpassociation.com/find-an-av-provider/&quot;&gt;variety of different methods for determining age&lt;/a&gt;,
such as government identity documents, mobile phone records, credit reference agencies, credit cards, etc.,
most of which are directly tied to your real-world identity.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There are two major ways in which these age verification systems can work, neither of which is great:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The site itself is verifying your age, e.g., by collecting
the above information and using some third party service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The site somehow bounces/redirects/embeds some third
party age verification site.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In both cases, the age verification service learns your
identity and the site that you are going to (because
the site has an account with the service). In the first
case, the site probably &lt;em&gt;also&lt;/em&gt; learns your identity and
so can associate it with the exact pages you view
rather than just the site you visit.&lt;/p&gt;
&lt;p&gt;The general assumption by the UK government seems to be that
this privacy issue will be dealt with by policy controls, i.e.,
by restricting use and mandating security measures.
In April 2019,
the British Board of Film Classification designed an
&lt;a href=&quot;http://web.archive.org/web/20190724192228if_/https://www.ageverificationregulator.com/assets/bbfc-age-verification-certificate-standard-april-2019.pdf&quot;&gt;Age-verification Certificate Standard&lt;/a&gt; for age verification
providers (AVPs) which prescribes a bunch of data retention
policies as well as a set of procedures for attempting to ensure
that the provider&#39;s network is secure (penetration testing,
cryptographic key lifetimes, monitoring requirements, etc.).
This &lt;a href=&quot;https://twitter.com/AlecMuffett/status/1121733258327285760&quot;&gt;Twitter thread&lt;/a&gt;
by well-known security guy
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Alec_Muffett&amp;amp;oldid=1042219358&quot;&gt;Alec Muffett&lt;/a&gt;
does a good job of analyzing this standard and comes to
some pretty negative conclusions. I have a bigger concern,
though, which is the disclosure of your identity in the
first place: even if you trust that the AVP will follow its
own policies, they could still be hacked (see, for instance,
the &lt;a href=&quot;https://en.wikipedia.org/wiki/2017_Equifax_data_breach&quot;&gt;2017 Equifax breach&lt;/a&gt;),
or their records could be subpoenaed. The bottom line is that
you&#39;re placing a lot of trust in someone you have no real
relationship with.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
A better system would be one in which nobody ever got
both your identity and the fact that you were on a given
site.&lt;/p&gt;
&lt;h2 id=&quot;anonymous-age-verification&quot;&gt;Anonymous Age Verification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#anonymous-age-verification&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The good news is that we now have technical mechanisms that enable
this kind of anonymous verification of people&#39;s ages. The cryptographic
details are complicated (see &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#digression%3A-anonymous-credentials&quot;&gt;here&lt;/a&gt;
for a description of one such system), but the basic idea looks
like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You go to the age verification provider and prove your
age (most likely by proving your identity).&lt;/li&gt;
&lt;li&gt;The AVP issues you an unlinkable, anonymous credential.&lt;/li&gt;
&lt;li&gt;When you go to the porn site you provide the credential
as proof of age.&lt;/li&gt;
&lt;/ol&gt;
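&lt;p&gt;As a rough illustration of how step 2 can work, here is a textbook RSA blind
signature with tiny parameters. This is a swapped-in toy technique, not the
construction from the linked post or the actual Privacy Pass protocol, and the
key, token, and blinding factor are all hypothetical values.&lt;/p&gt;

```python
# Toy RSA blind signature: the AVP signs a token without ever seeing it.
# Textbook RSA with tiny primes and no padding; insecure, illustrative only.
import math

P_PRIME, Q_PRIME = 61, 53                       # hypothetical AVP key material
N = P_PRIME * Q_PRIME                           # public modulus
E = 17                                          # public exponent
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))   # private exponent


def blind(token: int, r: int) -> int:
    """User: hide the token behind the factor r before sending it to the AVP."""
    return (token * pow(r, E, N)) % N


def sign_blinded(blinded: int) -> int:
    """AVP: sign the blinded value once the user has proven their age."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """User: strip r, leaving a signature on the original token."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(token: int, sig: int) -> bool:
    """Site: check the AVP signature using only the public key."""
    return pow(sig, E, N) == token % N


token = 1234                  # credential value chosen by the user
r = 7                         # blinding factor; random in a real system
assert math.gcd(r, N) == 1    # r must be invertible mod N

seen_by_avp = blind(token, r)
credential = unblind(sign_blinded(seen_by_avp), r)
print(verify(token, credential))
```

&lt;p&gt;The value the AVP signs (&lt;code&gt;seen_by_avp&lt;/code&gt;) differs from the token the
site later checks, which is the intuition behind unlinkability.&lt;/p&gt;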
&lt;p&gt;This way the site knows you are of the appropriate age but doesn&#39;t
learn who you are. And because the credential is unlinkable&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; the
porn site and the AVP can&#39;t collude to discover which users are
which. This is all reasonably well
understood cryptographic technology (see, for instance, &lt;a href=&quot;https://ietf-wg-privacypass.github.io/base-drafts/draft-ietf-privacypass-architecture.html&quot;&gt;Privacy
Pass&lt;/a&gt;)
and while it might be a bit challenging to integrate it with the
Web, it&#39;s far from impossible. Unfortunately, I&#39;m not sure how much this helps.&lt;/p&gt;
&lt;p&gt;The problem is that even if the credential which the
AVP provides to the user is anonymous, the &lt;em&gt;AVP&lt;/em&gt; still
sees the user&#39;s identity at the time they prove their
age to the AVP. If the main reason that people need to
do age verification is to watch porn then this is a
pretty strong signal of the user&#39;s behaviors, and so
they still need to trust the AVP&#39;s discretion. Ironically,
this is a case where privacy would be better if people had
to routinely demonstrate their age. For instance, if you
needed to demonstrate you were over 18 every time you
bought something on Amazon or read the New York Times—or even used Facebook—then it wouldn&#39;t tell the AVP much when you signed up
with it. However, if it&#39;s mostly just to access porn sites,
then users don&#39;t really get to hide behind the less embarrassing
use cases.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Regardless of the wisdom from a policy perspective of this kind
of age verification, it seems like a real privacy threat.
I&#39;m well aware that the privacy situation on the Web is extremely
bad, but that&#39;s something that browser makers are hard at work
preventing, with technologies ranging from &lt;a href=&quot;https://blog.mozilla.org/security/2021/02/23/total-cookie-protection/&quot;&gt;cookie restrictions&lt;/a&gt;
to &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;IP address-hiding proxies&lt;/a&gt;,
and so we&#39;re gradually moving towards a world where you don&#39;t
have to trust either Web sites or the trackers embedded on them.
However, requiring this kind of age verification would effectively require
people to trust that the AVPs protect their privacy. This is exactly
the kind of trust we usually try to avoid via technical controls,
but in this case those don&#39;t seem like they will be effective,
leaving users with nothing but trust.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are some AVPs which offer face-based age estimation.
While this technically doesn&#39;t involve learning your identity,
I&#39;m not sure people should be that much happier about having
the AVP have their photo, and of course given the capabilities
of facial recognition, it will often be possible to determine
your identity anyway. In any case, the most common mechanism for
providers to offer seems to be based on government documents. &lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is of course true to some extent with the porn site
itself, but they don&#39;t necessarily have your name
and IP addresses aren&#39;t necessarily sufficient to
identify you. Plus, you could use a VPN. &lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
What unlinkable means in this context is that the credential
that the AVP sees is different from and can&#39;t be connected to the one that is presented
to the porn site. &lt;a href=&quot;https://educatedguesswork.org/posts/uk-age-verification/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part VII: Blockchain-based Name Systems and Transparency</title>
		<link href="https://educatedguesswork.org/posts/dns-security-blockchain2/"/>
		<updated>2022-02-07T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-blockchain2/</id>
		<content type="html">&lt;p&gt;DNS security, I just can&#39;t quit you
(see parts &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox&quot;&gt;IV&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox&quot;&gt;V&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain&quot;&gt;VI&lt;/a&gt;).
In &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain&quot;&gt;Part VI&lt;/a&gt; I talked about blockchain-based
name systems, but I forgot to mention one aspect: defense against surreptitious changes.
For instance, suppose the attacker doesn&#39;t want to take over
&lt;code&gt;example.com&lt;/code&gt; but just wants to intercept TLS connections
to it; for obvious reasons, they don&#39;t want it to be common
knowledge that that&#39;s happening.
One could argue that blockchain-based systems
make that kind of thing harder than with conventional systems
(DNS + PKI), but I don&#39;t think that&#39;s really true, for reasons
laid out in this post.&lt;/p&gt;
&lt;p&gt;The naive version of a blockchain-based DNS system
mechanically and inflexibly enforces some
specific policy (typically first-come-first-served). This doesn&#39;t
do a good job of accommodating a number of real-world use cases such as (1) people
losing their cryptographic keys or (2) people registering domain
names corresponding to someone else&#39;s trademark. In the DNS,
these are relatively easily handled: if you lose your
DNSSEC key, you can just update it as long as you can
authenticate to your registrar; if you lose your password,
you can probably recover it; if someone registers your
trademark, there&#39;s the &lt;a href=&quot;https://www.icann.org/resources/pages/help/dndr/udrp-en&quot;&gt;UDRP&lt;/a&gt;.
In blockchain-based systems, however, these mechanisms are
not available, because everything ties mechanically back to your private key.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s of course possible to build a flexible system which incorporates some
element of discretion in these situations. The Ethereum Name Service (ENS)
&lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/dnsop/-9zBqWpvNBlekGotR211s1mf6tM/&quot;&gt;sort of contemplates this&lt;/a&gt;,
though they also don&#39;t seem to have defined any real policies
for how to handle these cases beyond trusting the system operators.
It&#39;s not clear how this is better than the existing system of DNS governance:
I know ICANN isn&#39;t particularly popular, but they &lt;em&gt;do&lt;/em&gt; have fairly clear
policies for how to handle exceptional cases (not that these cases are
actually that exceptional).&lt;/p&gt;
&lt;p&gt;The problem is that as soon as you allow this kind of discretion&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
into the system, it undercuts the basic value proposition of having
the names on the ledger: if that discretion can be exercised for legitimate
reasons it can also be exercised for illegitimate reasons (e.g., to steal
your domain name). The question then becomes whether it&#39;s possible
to detect and contain that kind of misuse.&lt;/p&gt;
&lt;h2 id=&quot;how-to-transfer-domains&quot;&gt;How to Transfer Domains &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#how-to-transfer-domains&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we ask about how to handle these exceptional cases, we first
need to look at how you handle the normal case of name transfer.
As I mentioned earlier, registration is done just by storing
a name/public key pair on the ledger, with the rule being that
the first registrant wins. Suppose Alice has registered &lt;code&gt;example.com&lt;/code&gt;
and wants to transfer it to Bob, what now?&lt;/p&gt;
&lt;p&gt;The obvious way to handle this is for Alice to use her key to digitally
sign a record transferring the domain and insert it into the ledger. This can just be the
same record that Bob would have used to register the domain if
he had been first, but signed by Alice. In this case, then,
what it means to own the domain is to have an unbroken chain
of signatures starting from the original registrant.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Note that you need to bake this rule about transfers into the
system early on; otherwise, there is a risk that some relying
parties (i.e., clients) won&#39;t have been updated and so won&#39;t
accept the transfer, which is an obvious interoperability problem.&lt;/p&gt;
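&lt;p&gt;The ownership rule above can be sketched as a replay of the ledger. This is
a minimal model, not any real system&#39;s format: the record fields and key names
are hypothetical, signature verification is assumed to have already happened,
and I am assuming a registration is signed by the registrant&#39;s own key.&lt;/p&gt;

```python
# Sketch of a first-come-first-served ledger with owner-signed transfers.
# `signer` is assumed to be a key whose signature was already verified.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    name: str
    new_owner: str   # public key that owns the name after this record
    signer: str      # public key that signed this record


def current_owner(ledger: list[Record], name: str) -> Optional[str]:
    """Replay the ledger in order and return the key that owns `name`."""
    owner = None
    for rec in ledger:
        if rec.name != name:
            continue
        if owner is None:
            # First registration wins (assumed to be self-signed).
            if rec.signer == rec.new_owner:
                owner = rec.new_owner
        elif rec.signer == owner:
            # Valid transfer: signed by the current owner.
            owner = rec.new_owner
        # Anything else (late registration, stale signer) is ignored.
    return owner


ledger = [
    Record("example.com", "alice-key", "alice-key"),      # Alice registers
    Record("example.com", "mallory-key", "mallory-key"),  # too late, ignored
    Record("example.com", "bob-key", "alice-key"),        # Alice transfers to Bob
    Record("example.com", "charlie-key", "alice-key"),    # second transfer, ignored
]
print(current_owner(ledger, "example.com"))
```

&lt;p&gt;Every client has to run the same replay logic, which is why the transfer
rule has to be fixed early.&lt;/p&gt;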
&lt;h2 id=&quot;involuntary-transfers&quot;&gt;Involuntary Transfers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#involuntary-transfers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;From a technical perspective, involuntary transfers are just a
natural extension of voluntary transfers. The way this works
is that you have some set of keys which can authorize transfers
for domains they don&#39;t actually own (once again, this has
to be baked into the system from quite early on, at least
at some level). So, if Bob holds the trademark
on &amp;quot;Example&amp;quot;, and Alice registers &lt;code&gt;example.com&lt;/code&gt; then there
might be some (unspecified) procedure that Bob goes through
to demonstrate that he really should own &lt;code&gt;example.com&lt;/code&gt; and
if he prevails, then whoever holds those keys would create
a new record on the ledger reassigning &lt;code&gt;example.com&lt;/code&gt; to
Bob&#39;s public key. I&#39;m being vague about the details here
because AFAICT none of the existing systems seem to have
developed any specific procedures along these lines, so we&#39;re
just talking in the abstract.
Note that you can use a similar technique to handle lost
keys; these aren&#39;t technically involuntary but from
a technical perspective, it&#39;s basically the same thing
because your key is your identity and the original key
isn&#39;t being used to make the transfer.&lt;/p&gt;
&lt;p&gt;Obviously, you can make the precise &lt;em&gt;technical&lt;/em&gt; conditions under
which a transfer is valid as complicated as you want. For instance,
you can require multiple keys to sign (or use a threshold
signature scheme), require multiple signatures on different
days, whatever. You can even require the record to contain some
description of what happens. But at the end of the day the story is the
same: there&#39;s some process that takes place outside of the
ledger machinery that leads some group of people to conclude
that a transfer is warranted and then they effectuate the
transfer on the ledger.&lt;/p&gt;
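&lt;p&gt;As one example of such a technical condition, here is a sketch of the
multiple-keys-must-sign variant (plain multi-signature counting, not a true
threshold signature scheme). The authority key names and the 2-of-3 policy are
hypothetical, and signature verification is again assumed to have already
happened.&lt;/p&gt;

```python
# Sketch: accept an involuntary transfer only if enough distinct,
# recognized authority keys signed it. All names here are hypothetical.
AUTHORITY_KEYS = frozenset({"authority-1", "authority-2", "authority-3"})
THRESHOLD = 2


def override_is_valid(signers: list[str]) -> bool:
    """Return True if at least THRESHOLD distinct authority keys signed."""
    recognized = set(signers).intersection(AUTHORITY_KEYS)
    return len(recognized) >= THRESHOLD


print(override_is_valid(["authority-1", "authority-3"]))   # two distinct authorities
print(override_is_valid(["authority-1", "authority-1"]))   # same key twice
```

&lt;p&gt;The check says nothing about whether the transfer was justified; that
decision is made outside the ledger.&lt;/p&gt;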
&lt;p&gt;The key point, however, is that the transfer itself has to
be recorded on the ledger in order to take effect. This
makes it difficult to surreptitiously transfer a domain name,
because everything that happens is public.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;dns-and-the-webpki&quot;&gt;DNS and the WebPKI &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#dns-and-the-webpki&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s compare this to the situation with DNS. As we saw earlier,
because it&#39;s a hierarchical system, nothing stops &lt;code&gt;.com&lt;/code&gt; from
lying about who owns &lt;code&gt;example.com&lt;/code&gt;. It can even serve correct
records to some people and bogus records&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
to others (a &amp;quot;split
view&amp;quot;). The same thing is true for the WebPKI: a CA can issue
a certificate for &lt;code&gt;example.com&lt;/code&gt; to the attacker who
can use it to impersonate the real owner of &lt;code&gt;example.com&lt;/code&gt;,
and it&#39;s mostly invisible to relying parties.
At first glance, this looks like a real advantage for these
ledger-based systems, where this misbehavior is inherently visible to
relying parties and to everyone else (whether they know enough to act on it
is another question). However, I don&#39;t think that&#39;s really true,
because it&#39;s possible to add transparency onto these systems.&lt;/p&gt;
&lt;p&gt;Let&#39;s start with the WebPKI piece. It&#39;s certainly true that
surreptitious misissuance is possible and the purpose of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1065604666&quot;&gt;Certificate Transparency (CT)&lt;/a&gt;
is to detect just this kind of misissuance. Briefly, CT is
a system of append-only ledgers designed to ensure that
every valid WebPKI cert is visible on the ledger. This
makes it possible to check the ledger for suspicious
certificate issuance. The technical details here are
a little complicated, in part because CT was created after
the WebPKI was already in wide use, but as a general
matter the visibility guarantees are pretty similar to
those that a ledger-based name system provides.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Note that one nice feature of this kind of system—unlike
a ledger-based system—is that you can roll it out gradually
because processing the transparency data is not required to
accept the certificate.&lt;/p&gt;
&lt;p&gt;This brings us to the question of the DNS itself. Here too, it&#39;s
possible to think of adding some after-the-fact transparency mechanism
to prevent parents from generating bogus data.
At one point there was some interest in &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/trans/n097RUV58dVyFYBq2VKxA9Yb1_Y/&quot;&gt;&amp;quot;CT for DNSSEC&amp;quot;&lt;/a&gt;,
but apparently not enough to get it off the ground. I wasn&#39;t
deeply involved in that discussion, but IIRC there
were concerns about log scaling and in particular
about &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/trans/4MDQmTiUmHY29DT5mvhsYLJbc8U/&quot;&gt;DoS attacks/spamming the logs&lt;/a&gt;.
These are real issues but they primarily arise because
of the notion that the DNS has to be free(-ish). In
the existing ledger systems you just deal with this
by charging people (in some cases &lt;a href=&quot;https://ycharts.com/indicators/ethereum_average_transaction_fee&quot;&gt;quite a bit&lt;/a&gt;)
to store transactions on the log. If you were willing to do that,
the problem seems like it could be simplified considerably.&lt;/p&gt;
&lt;h2 id=&quot;detecting-and-handling-misbehavior&quot;&gt;Detecting and Handling Misbehavior &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#detecting-and-handling-misbehavior&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You may have noticed that I&#39;ve sort of skipped a step here:
all of these mechanisms just record every action, but that
doesn&#39;t tell you what to do about it, or necessarily even
how to detect it. The basic idea here is that one can scan the
ledger/CT log and look for transactions which look fishy.
There are a number of ways this can happen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;People can scan looking for their own names.&lt;/li&gt;
&lt;li&gt;People can register for some service that scans looking
for names for all of their clients.&lt;/li&gt;
&lt;li&gt;You can just generally scan for suspicious-looking
stuff (e.g., why did Google&#39;s name just get reassigned?)&lt;/li&gt;
&lt;/ul&gt;
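&lt;p&gt;A monitor of this kind can be sketched in a few lines. The entry format
and names below are hypothetical and do not correspond to CT&#39;s or any
blockchain&#39;s real data format; a flagged change is only a prompt for a human
to look, not proof of misbehavior.&lt;/p&gt;

```python
# Sketch of a monitor: scan log entries in order and flag watched names
# whose key changed. The entry format is hypothetical.
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    key: str


def suspicious_changes(log: list[Entry], watched: set[str]) -> list[tuple[str, str, str]]:
    """Return (name, old_key, new_key) for each key change on a watched name."""
    last_key: dict[str, str] = {}
    alerts = []
    for entry in log:
        previous = last_key.get(entry.name)
        if entry.name in watched and previous is not None and previous != entry.key:
            alerts.append((entry.name, previous, entry.key))
        last_key[entry.name] = entry.key
    return alerts


log = [
    Entry("example.com", "k1"),
    Entry("other.org", "k9"),
    Entry("example.com", "k1"),   # re-issued with the same key: not flagged
    Entry("example.com", "k2"),   # key changed: flagged
    Entry("other.org", "k8"),     # changed, but not on the watch list
]
print(suspicious_changes(log, {"example.com"}))
```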
&lt;p&gt;This is probably somewhat easier for the blockchain-based systems
because the exceptional cases are going to be rare and are
clearly marked, so you can just ignore all the others,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
but it&#39;s certainly possible with a system like CT
(CT calls these services &lt;a href=&quot;https://certificate.transparency.dev/monitors/&quot;&gt;&amp;quot;monitors&amp;quot;&lt;/a&gt;),
and there have already been a number of cases where CT has detected
various kinds of misbehavior, including certificates which should
&lt;a href=&quot;https://groups.google.com/g/mozilla.dev.security.policy/c/fyJ3EK2YOP8/m/yvjS5leYCAAJ&quot;&gt;never have been issued&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;None of this is to say that it&#39;s not useful to have some transparency mechanism
to detect misbehavior, and I agree that it&#39;s a nice property of ledger-based
systems that that&#39;s built into the system. My point here, however, is that
it&#39;s not really much of an inherent advantage over our current systems because
we can add transparency mechanisms to them. We already have
such a mechanism built on top of the WebPKI in the form of Certificate Transparency
and if we really wanted one for DNSSEC, we could almost certainly find
a way to build one. More importantly, we can get these benefits
incrementally: preserving the validity of all of our current
names while adding transparency on top, which seems a lot easier than starting
from scratch with an incompatible system.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is actually a general problem with systems that are
rooted in cryptographic keys, whether they are on the
blockchain or otherwise (e.g., end-to-end encryption).
It&#39;s quite common for people to lose their keys, and
building a system that allows recovery from this that
doesn&#39;t involve trusting someone else not to attack
you is a really hard problem. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Just to anticipate an objection, you obviously can encode
some kind of complicated recovery logic into the system
that might handle some of these cases via a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Smart_contract&amp;amp;oldid=1067908079&quot;&gt;smart contract&lt;/a&gt;
but I&#39;m skeptical that you can handle every case this way;
the world is just too complicated. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
What happens if there are two signatures from the same registrant?
This is obviously impermissible because once Alice has
transferred the domain to Bob she can&#39;t also transfer
it to Charlie. This is called &amp;quot;double spending&amp;quot;, and
is one of the primary reasons that cryptocurrency
systems use ledgers. For our purposes, we can just
ignore the second transfer. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I had originally thought that it would also break
the original owner&#39;s use of the domain, but upon
reflection, I&#39;m less sure. Suppose that Alice owns
&lt;code&gt;example.com&lt;/code&gt; and is DNSSEC signing her domains.
If the domain is transferred to Bob, he can
serve up a record that includes both Alice&#39;s keys
and his own, which means that the records that
Alice signs will be valid but that Bob can also
sign his own records. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
What I mean by &amp;quot;bogus&amp;quot; in this case is that they haven&#39;t
effected a transfer; if you checked &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=WHOIS&amp;amp;oldid=1069674665&quot;&gt;whois&lt;/a&gt;
it would still show the correct owner. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The two major differences are that the ledger in
CT isn&#39;t decentralized and that RPs have
limited ability to verify ledger consistency
(see &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/trans/Zm4NqyRc7LDsOtV56EchBIT9r4c/&quot;&gt;here&lt;/a&gt;
for a longer writeup on this). That&#39;s not to say that I don&#39;t
think these are issues, but I also think it&#39;s
clearly possible to build a CT-style system
that is better in these respects. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though of course there are also cases where someone&#39;s
key is compromised/stolen which just look like
normal transfers. A practical system also needs a way to
deal with these. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain2/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part VI: Blockchain-based Name Systems</title>
		<link href="https://educatedguesswork.org/posts/dns-security-blockchain/"/>
		<updated>2022-02-04T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-blockchain/</id>
		<content type="html">&lt;p&gt;This is Part VI of my series on DNS Security
(parts &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox&quot;&gt;IV&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox&quot;&gt;V&lt;/a&gt;).
I thought I was done after talking about recursive to authoritative,
but I then realized I wanted to cover blockchain-based name
systems; these aren&#39;t strictly part of the DNS, but they&#39;re intended
to fulfill a similar function, so it&#39;s worth covering them a bit.&lt;/p&gt;
&lt;p&gt;DNS is a &lt;em&gt;distributed&lt;/em&gt; system: name data is spread across multiple
servers and resolving a given name requires asking those servers.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Specifically, it is a &lt;em&gt;hierarchical, federated&lt;/em&gt; system. In this case,
federated means that different domains are controlled by different
people and &lt;em&gt;hierarchical&lt;/em&gt; means that domain &lt;code&gt;example.com&lt;/code&gt; is
subordinate to (and hence controlled by) &lt;code&gt;.com&lt;/code&gt;, which is in turn
subordinate to the root. This is easy to see if you work through the
resolution process described in &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;post I&lt;/a&gt;: if the
root decides to lie to you about who owns a given domain, then you
just get the wrong answer. This notion of trust is baked into DNSSEC,
where each zone is signed by its parent: here too, any compromise of
the root or of a parent domain leads to compromise of the child.&lt;/p&gt;
&lt;h2 id=&quot;government-takeover&quot;&gt;Government Takeover &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#government-takeover&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This structure has led to a fair amount of complaining about the
trustworthiness of the DNS. The conspiracy theory version of this
is that the root is operated by the
&lt;a href=&quot;https://www.iana.org/&quot;&gt;Internet Assigned Numbers Authority (IANA)&lt;/a&gt;,
which is part of the &lt;a href=&quot;https://www.icann.org/&quot;&gt;Internet Corporation for Assigned Names and Numbers (ICANN)&lt;/a&gt;,
which is a US corporation, and so the US government will take over the root
and require it to misbehave (e.g., taking over people&#39;s names, signing
false records, etc.). For instance, suppose that the US government
decided that the Iranian TLD (&lt;code&gt;.ir&lt;/code&gt;) shouldn&#39;t work any more.
To my knowledge that has never happened—and for reasons
covered below, I think it&#39;s kind of unlikely—though it&#39;s of course
possible in principle.&lt;/p&gt;
&lt;p&gt;What &lt;em&gt;has&lt;/em&gt; happened, however, is that various governments have simply
seized people&#39;s domain names. This isn&#39;t done by &lt;a href=&quot;https://www.icann.org/en/blogs/details/icann-doesnt-take-down-websites-3-12-2010-en&quot;&gt;leaning on ICANN&lt;/a&gt;,
however, but rather by serving the registrar or the registry with
&lt;a href=&quot;https://domaingang.com/domain-crime/on-ice-federal-agents-seize-airbagsplace-com-domain/&quot;&gt;legal process&lt;/a&gt;.
&lt;a href=&quot;https://www.ice.gov/&quot;&gt;US Immigration and Customs Enforcement (ICE)&lt;/a&gt;
does this, as does
&lt;a href=&quot;https://domaingang.com/domain-crime/gearsservers-the-fbi-takes-over-control-of-infringing-domains-and-seizes-more-than-5-million/&quot;&gt;the FBI&lt;/a&gt;,
with the typical practice being to just
replace the web site with something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://domaingang.com/wp-content/uploads/2017/05/airbags-ice.jpg&quot; alt=&quot;ICE Takedown&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that once you&#39;ve taken over the site, you
own the name and can put whatever you want on it. The typical practice
seems to be to put the kind of warning label I show above,
which is pretty obvious, but you could just as well build
a replica of the site and continue to silently operate it
—you can even get a valid TLS certificate—though
this doesn&#39;t seem to be common.&lt;/p&gt;
&lt;p&gt;A related concern is that many of the popular TLDs are actually
owned by foreign countries who might not have the most friendly
relationship with the jurisdiction that registrants are in.
For example, &lt;code&gt;.ly&lt;/code&gt; (as in the URL shortener &lt;a href=&quot;https://bitly.com/&quot;&gt;&lt;code&gt;https://bitly.com&lt;/code&gt;&lt;/a&gt;)
is actually the Libyan TLD. If you have one of these domain names,
you&#39;re obviously somewhat exposed to action by the parent jurisdiction.&lt;/p&gt;
&lt;p&gt;Of course, it&#39;s somewhat of a semantic question whether this is
actually an attack. Obviously, if you&#39;re the owner of &lt;code&gt;airbags.com&lt;/code&gt;
you might be unhappy about the government seizing your domain name,
but it&#39;s not clear how different it is from just seizing your
servers or your car; the government has plenty
of processes for taking your stuff. The situation is somewhat
different here in that so much of the infrastructure is in
the US, and so people who don&#39;t live in the US are suddenly
exposed to actions by the US government, but the situation isn&#39;t
too dissimilar to what happens if you live outside the
US but decide to store your money
in a US bank, and of course there are plenty of TLDs that
are operated by non-US entities.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;As I said, despite fears to the contrary, I&#39;m not aware of any
case when the US has used its control of the root to take over
a name. It&#39;s not even really clear how this would work because
in order to take over &lt;code&gt;example.com&lt;/code&gt; they would first
need to take over all of &lt;code&gt;.com&lt;/code&gt; and serve all the other
records &lt;em&gt;besides&lt;/em&gt; &lt;code&gt;example.com&lt;/code&gt; normally. This seems like
a lot of work and it&#39;s not really something you could do
surreptitiously, as lots of people  would notice that &lt;code&gt;.com&lt;/code&gt;
suddenly had a new DNS key and was being served from a new
set of servers; it&#39;s much easier to just require the
registry to change their records.&lt;/p&gt;
&lt;p&gt;Again, I want to emphasize here that most of this
isn&#39;t about attacking the technical infrastructure of the
DNS. Rather, it&#39;s changing actual ownership relationships
in the name hierarchy, as when the government seizes
your car; the DNS just reflects those ownership relationships.
In other words, this is the system faithfully publishing
the official data as it is designed to do.&lt;/p&gt;
&lt;h2 id=&quot;filtering&quot;&gt;Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#filtering&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even if you don&#39;t control the TLD for a domain name, it&#39;s
comparatively easy to filter the DNS if you control the network. This is not so much
because of the hierarchical structure of the name system
but because name resolution tends to be done by a resolver
that the network provides. This means that if you control
that resolver you can easily remove any names you don&#39;t
like or (if DNSSEC is not in use) replace them with names
of your own (see &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/posts/dns-security-dnssec/#limited-protection-against-censorship&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;
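&lt;p&gt;To make the resolver-level filtering concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the &lt;code&gt;UPSTREAM&lt;/code&gt; table stands in for whatever an honest resolver would return, and &lt;code&gt;FilteringResolver&lt;/code&gt; is a hypothetical class, not any real resolver&#39;s API. It just shows that whoever runs the resolver can drop a name or hand back a substitute address.&lt;/p&gt;

```python
from typing import Optional

# Hypothetical upstream data: what an honest resolver would answer.
UPSTREAM = {
    "example.com": "192.0.2.10",
    "blocked.example": "192.0.2.20",
    "www.blocked.example": "192.0.2.21",
}


def honest_resolve(name: str) -> Optional[str]:
    """Return the upstream answer, or None if the name does not exist."""
    return UPSTREAM.get(name)


class FilteringResolver:
    """A resolver that removes or rewrites answers for names it dislikes."""

    def __init__(self, blocked_suffixes, substitute=None):
        self.blocked_suffixes = [s.lower() for s in blocked_suffixes]
        self.substitute = substitute  # address to hand out instead, if any

    def is_blocked(self, name: str) -> bool:
        # Match the blocked domain itself and anything underneath it.
        name = name.lower().rstrip(".")
        return any(
            name == suffix or name.endswith("." + suffix)
            for suffix in self.blocked_suffixes
        )

    def resolve(self, name: str) -> Optional[str]:
        if self.is_blocked(name):
            # Without DNSSEC the client cannot tell this from a real answer.
            return self.substitute
        return honest_resolve(name)
```

&lt;p&gt;With &lt;code&gt;substitute&lt;/code&gt; left unset the blocked name simply disappears; with it set, the client is silently pointed somewhere else, which is the case DNSSEC validation would catch.&lt;/p&gt;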
&lt;p&gt;This kind of filtering is fairly common. For instance, China&#39;s
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=List_of_websites_blocked_in_mainland_China&amp;amp;oldid=1069106714&quot;&gt;Internet filtering&lt;/a&gt;
uses &lt;a href=&quot;https://arxiv.org/pdf/2106.02167.pdf&quot;&gt;DNS blocking&lt;/a&gt;. It&#39;s
also common practice in enterprise or school environments to block
domains corresponding to material that the network operator
thinks is contraband (often &amp;quot;adult&amp;quot; material).
One of the impacts of &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox&quot;&gt;encrypted DNS&lt;/a&gt;
is to make this kind of blocking harder, especially if the device
or software is configured to use an unfiltered resolver.&lt;/p&gt;
&lt;h2 id=&quot;name-ownership-disputes&quot;&gt;Name Ownership Disputes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#name-ownership-disputes&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, there are circumstances in which a domain can be
involuntarily transferred from one party to another.  One common case
is where someone registers a domain name which corresponds to a
trademark held by another entity. Suppose, for instance, that I
register &lt;code&gt;coca-co.la&lt;/code&gt; (which, incidentally, seems to be
unregistered) and start a business selling soda (EKR Cola!). The
Coca-Cola Company might be upset about this, and their recourse
is ICANN&#39;s &lt;a href=&quot;https://www.icann.org/resources/pages/help/dndr/udrp-en&quot;&gt;Uniform Domain-Name Dispute-Resolution Policy (UDRP)&lt;/a&gt;,
which allows them to file a complaint and potentially gain control
of the name. The details are of course complicated, but here
are some high points:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;b. Evidence of Registration and Use in Bad Faith. For the purposes of
Paragraph 4(a)(iii), the following circumstances, in particular but
without limitation, if found by the Panel to be present, shall be
evidence of the registration and use of a domain name in bad faith:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(i) circumstances indicating that you have registered or you have acquired the domain name primarily for the purpose of selling, renting, or otherwise transferring the domain name registration to the complainant who is the owner of the trademark or service mark or to a competitor of that complainant, for valuable consideration in excess of your documented out-of-pocket costs directly related to the domain name; or&lt;/p&gt;
&lt;p&gt;(ii) you have registered the domain name in order to prevent the owner of the trademark or service mark from reflecting the mark in a corresponding domain name, provided that you have engaged in a pattern of such conduct; or&lt;/p&gt;
&lt;p&gt;(iii) you have registered the domain name primarily for the purpose of disrupting the business of a competitor; or&lt;/p&gt;
&lt;p&gt;(iv) by using the domain name, you have intentionally attempted to attract, for commercial gain, Internet users to your web site or other on-line location, by creating a likelihood of confusion with the complainant&#39;s mark as to the source, sponsorship, affiliation, or endorsement of your web site or location or of a product or service on your web site or location.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;Name registration is frequently first come first served, and
it&#39;s actually reasonably likely that you&#39;d be able to register some
domain name or another that was arguably infringing, as it&#39;s kind
of a subjective judgment, but the UDRP allows the holder of the
trademark to try to reclaim the name in these cases after the fact.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Note that here too, we&#39;re not talking about a &lt;em&gt;technical&lt;/em&gt; process
but rather a legal/policy one. The UDRP allows the trademark
holder to argue that a certain domain name shouldn&#39;t
have been registered and if they prevail, then the domain
registration will be transferred or canceled. When that happens,
the DNS gets changed to reflect the outcome of that process, but
that&#39;s just publishing a decision which got made outside the DNS.&lt;/p&gt;
&lt;h2 id=&quot;blockchain%2Fledger-based-systems&quot;&gt;Blockchain/Ledger-Based Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#blockchain%2Fledger-based-systems&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This brings us to the topic of alternative name systems based
on ledgers, which are advertised as addressing these issues,
especially censorship.
Probably the two best known of these are the &lt;a href=&quot;https://docs.ens.domains/&quot;&gt;Ethereum Name Service&lt;/a&gt;
and &lt;a href=&quot;https://www.namecoin.org/&quot;&gt;Namecoin&lt;/a&gt;. Here&#39;s Namecoin&#39;s description
of its value proposition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Protect free-speech rights online by making the web more resistant to censorship.&lt;/li&gt;
&lt;li&gt;Attach identity information such as GPG and OTR keys and email, Bitcoin, and Bitmessage addresses to an identity of your choice.&lt;/li&gt;
&lt;li&gt;Human-meaningful Tor .onion domains.&lt;/li&gt;
&lt;li&gt;Decentralized TLS (HTTPS) certificate validation, backed by blockchain consensus.&lt;/li&gt;
&lt;li&gt;Access websites using the .bit top-level domain&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;What all this means will become clear below.&lt;/p&gt;
&lt;h3 id=&quot;how-to-build-a-blockchain-based-name-system&quot;&gt;How to build a blockchain-based name system &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#how-to-build-a-blockchain-based-name-system&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As with everything crypto, the details are fantastically complicated, but
the idea is conceptually simple:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
The blockchain provides a &lt;em&gt;decentralized append-only ledger&lt;/em&gt;.
I&#39;ll probably describe how this works at some future point, but for now, this means it&#39;s a data structure which:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Has a fixed order of operations&lt;/li&gt;
&lt;li&gt;(Mostly) anybody can write to it&lt;/li&gt;
&lt;li&gt;You can only write to the end of it&lt;/li&gt;
&lt;li&gt;Everyone agrees on the contents&lt;/li&gt;
&lt;li&gt;Nobody can change anything that happened in the past&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With a data structure like this, it&#39;s easy to build a simple
&lt;em&gt;first-come-first-served (FCFS)&lt;/em&gt; name system. You just write
a record to the ledger consisting of (1) the name you want to register and (2) your public key.
E.g.,&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;domain-name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;public-key&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token property&quot;&gt;&quot;kty&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;EC&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;crv&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;P-256&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;x&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;y&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;use&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token 
string&quot;&gt;&quot;enc&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token property&quot;&gt;&quot;kid&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Public key borrowed from &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7517#appendix-A.1&quot;&gt;RFC 7517&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As long as you&#39;re the first person to register a name, congratulations, you own it!&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Anyone can validate that you own it just by looking through the entire ledger
from the beginning (this may take some time) and seeing that you were the
first person to register it. If someone tries to register it afterwards,
their registration is just ignored (whether it even makes it into the ledger or not is
a detail, though an important one in practice).
From this point on, things are pretty simple: once you&#39;ve registered
your public key you can just use it to sign ordinary DNSSEC records for your name
and use DNSSEC for every name below you. Of course you also need some way
to tell resolvers which authoritative server to go to to get those records, but this can be
stuffed in the blockchain as well, or stuffed somewhere else and signed
with your blockchain-based key.&lt;/p&gt;
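&lt;p&gt;The first-come-first-served rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the ledger is an in-memory list rather than a real blockchain, public keys are placeholder strings, and the names &lt;code&gt;Ledger&lt;/code&gt;, &lt;code&gt;Registration&lt;/code&gt;, &lt;code&gt;replay&lt;/code&gt;, and &lt;code&gt;owner_of&lt;/code&gt; are made up for this example. The point is that ownership is computed by replaying the whole ledger in order and keeping the first record for each name.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Registration:
    name: str
    public_key: str  # placeholder string standing in for a real key


@dataclass
class Ledger:
    """Append-only list of records; there is deliberately no update or delete."""

    records: List[Registration] = field(default_factory=list)

    def append(self, record: Registration) -> None:
        self.records.append(record)


def replay(ledger: Ledger) -> Dict[str, str]:
    """Walk the ledger from the beginning; the first record for a name wins."""
    owners: Dict[str, str] = {}
    for record in ledger.records:
        if record.name not in owners:
            owners[record.name] = record.public_key
        # Later registrations of the same name are simply ignored.
    return owners


def owner_of(ledger: Ledger, name: str) -> Optional[str]:
    """Return the key that first registered the name, or None."""
    return replay(ledger).get(name)
```

&lt;p&gt;In this sketch a duplicate registration still lands in the ledger and is skipped at replay time; a real system might instead refuse to include it at all.&lt;/p&gt;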
&lt;p&gt;You&#39;ll notice that above I&#39;ve tried to register a domain in &lt;code&gt;.com&lt;/code&gt;
but actually this is bad news: if we have two mechanisms for registering names
that are uncoordinated we&#39;re going to run into situations where some people
see &lt;code&gt;example.com&lt;/code&gt; as one thing and other people as another (&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc2826&quot;&gt;RFC 2826&lt;/a&gt;
does a good job of laying this out). In practice, people who want
to build their own naming systems tend to try to locate them in
as-yet-unused portions of the DNS space: for instance, Namecoin uses
&lt;code&gt;.bit&lt;/code&gt; and ENS uses &lt;code&gt;.eth&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; The idea here is that if you
have a Namecoin-capable client you look at the top label and if it&#39;s
&lt;code&gt;.bit&lt;/code&gt; you use Namecoin and otherwise you use the DNS.
Of course, these names are still
notionally within the DNS and so there&#39;s actually nothing stopping
ICANN from deciding tomorrow to mint a &lt;code&gt;.bit&lt;/code&gt; domain,
which would cause confusion.
The general idea seems to be that once you get enough usage of your
new TLD, ICANN will avoid creating it because it would cause too
much trouble; it remains to be seen whether this is actually true.&lt;/p&gt;
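&lt;p&gt;The client-side dispatch described above might look roughly like this in Python. The two lookup functions are hypothetical stand-ins that just tag the name with the system that would handle it; a real client would call a DNS library and a Namecoin node instead.&lt;/p&gt;

```python
from typing import Callable, Dict


def resolve_via_dns(name: str) -> str:
    # Stand-in for an ordinary DNS lookup.
    return "dns:" + name


def resolve_via_namecoin(name: str) -> str:
    # Stand-in for a lookup against the Namecoin ledger.
    return "namecoin:" + name


# Top labels that are handled by something other than the DNS.
ALTERNATE_ROOTS: Dict[str, Callable[[str], str]] = {
    "bit": resolve_via_namecoin,
}


def top_label(name: str) -> str:
    """Return the rightmost label of a name, ignoring a trailing dot."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()


def resolve(name: str) -> str:
    """Pick the name system based on the top label, defaulting to the DNS."""
    handler = ALTERNATE_ROOTS.get(top_label(name), resolve_via_dns)
    return handler(name)
```

&lt;p&gt;Note that the table lives entirely in the client, which is why a later ICANN delegation of the same label would leave different clients resolving the same name in different ways.&lt;/p&gt;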
&lt;p&gt;It is technically possible to register a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6761&quot;&gt;Special Use Domain Name (SUDN)&lt;/a&gt;
that is outside of the DNS hierarchy, so one might imagine
doing so for a new blockchain-based name system.
The bar for this is quite
high and the only top-level SUDN which has been registered for
an alternative namespace is &lt;code&gt;.onion&lt;/code&gt; (&lt;a href=&quot;https://www.iana.org/go/rfc7686&quot;&gt;RFC 7686&lt;/a&gt;)
for Tor&#39;s cryptographically-generated domain names. This registration
was controversial at the time and in some sense sui generis
because the names are cryptographically verified rather than looked up;
for obvious reasons the IETF and ICANN are less excited about registering TLDs for
name resolution protocols which are conceptually similar to DNS but
use different technical underpinnings.&lt;/p&gt;
&lt;h3 id=&quot;technical-properties&quot;&gt;Technical Properties &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#technical-properties&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;With this under our belts, let&#39;s look at the technical properties of the
system. For the purposes of this discussion, I&#39;ll be assuming that the ledger
behaves as advertised; there are potential attacks on the ledgers but
they&#39;re not so interesting here.&lt;/p&gt;
&lt;p&gt;The main advertised advantage for blockchain-based systems is
censorship resistance.  The first thing that Namecoin lists in its
value proposition is &amp;quot;Protect free-speech rights online by making the
web more resistant to censorship.&amp;quot;  Similarly, ENS advertises that you can
&amp;quot;Launch censorship-resistant decentralized websites with ENS.&amp;quot;
The answer to the question of whether these systems are more censorship
resistant is &amp;quot;sort of&amp;quot;.
As we saw before, there are two primary ways to censor a domain
name in the DNS: (1) legally/administratively take over the domain
itself or (2) block the domain name resolution process. We need to look
at these independently.&lt;/p&gt;
&lt;h4 id=&quot;domain-takeover&quot;&gt;Domain Takeover &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#domain-takeover&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;How resistant this kind of system is to domain takeover depends on the
name allocation and reassignment policy. The simple
first-come-first-served system I described above really is more
resistant to takeover by governments or by anybody else. The ledger
enforces ordering and so there&#39;s just no external mechanism to transfer a
name from someone to someone else. The system of course needs a
mechanism to do transfers, but that&#39;s done by having the original
owner sign the transfer, which means you need the owner&#39;s private
key, which the government or ICANN wouldn&#39;t have.&lt;/p&gt;
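&lt;p&gt;Here is a rough Python sketch of owner-signed transfers. To stay within the standard library it uses a toy Lamport one-time signature built from SHA-256 instead of the elliptic-curve signatures a real system would use, and the message format and the function names (&lt;code&gt;transfer_message&lt;/code&gt;, &lt;code&gt;apply_transfers&lt;/code&gt;, and so on) are assumptions made up for this example. What it illustrates is that a transfer only takes effect if it verifies under the key that currently owns the name.&lt;/p&gt;

```python
import hashlib
import secrets
from typing import Dict, List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> List[int]:
    """The 256 bits of SHA-256(message), most significant bit first."""
    return [(byte >> (7 - i)) % 2 for byte in _h(message) for i in range(8)]


def keygen():
    """Toy Lamport key pair: 256 pairs of secrets and the hashes of each."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = tuple((_h(a), _h(b)) for a, b in private)
    return private, public


def sign(private, message: bytes) -> List[bytes]:
    # Reveal one secret per message bit. A Lamport key is meant to sign once.
    return [pair[bit] for pair, bit in zip(private, _bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(sig) == pair[bit]
        for sig, pair, bit in zip(signature, public, _bits(message))
    )


def key_id(public) -> str:
    """Fingerprint used to name a public key inside a transfer message."""
    return _h(b"".join(a + b for a, b in public)).hex()


def transfer_message(name: str, new_owner) -> bytes:
    # Hypothetical message format: which name goes to which key.
    return ("transfer:" + name + ":" + key_id(new_owner)).encode()


def apply_transfers(owners: Dict[str, tuple], transfers) -> Dict[str, tuple]:
    """Apply (name, new_owner_public, signature) records in ledger order."""
    owners = dict(owners)
    for name, new_owner, signature in transfers:
        current = owners.get(name)
        if current is None:
            continue  # nobody owns this name, so nobody can transfer it
        if verify(current, transfer_message(name, new_owner), signature):
            owners[name] = new_owner
        # Anything else is ignored, including a second transfer signed
        # by a key that no longer owns the name.
    return owners
```

&lt;p&gt;Because each record is checked against the owner at that point in the ledger, a record signed by anyone other than the current owner has no effect, and there is no code path that changes ownership without such a signature.&lt;/p&gt;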
&lt;p&gt;It&#39;s far from clear that these are actually good properties to have,
for two reasons. First, if you lose your signing key you have effectively
lost your domain, which seems like a terrifying prospect if you&#39;re the
person in charge of &lt;code&gt;cisco.bit&lt;/code&gt;. You certainly don&#39;t want to be like
&lt;a href=&quot;https://www.nytimes.com/2021/01/12/technology/bitcoin-passwords-wallets-fortunes.html&quot;&gt;that guy&lt;/a&gt;
who had 220 million dollars locked up in a Bitcoin wallet that he&#39;d
lost the password for.
Second, while it may seem like a good property that nobody can take
your correctly registered domain away from you, it also means that
if someone registers a domain for a trademark you own then you
can&#39;t take it away from them, which is obviously less desirable.
Given the importance of the UDRP for the existing domain name system,
I have a hard time seeing most big companies wanting to participate
in that kind of a system, given the risk that they will be unable
to protect their trademarks.&lt;/p&gt;
&lt;p&gt;It&#39;s of course possible to build a system that allows for controlled
involuntary transfers: you just have some group of people who can
sign those transfers. It appears that this is what ENS &lt;a href=&quot;https://mailarchive.ietf.org/arch/msg/dnsop/-9zBqWpvNBlekGotR211s1mf6tM/&quot;&gt;has done&lt;/a&gt;,
requiring four out of seven trusted people to change policies (see &lt;a href=&quot;https://yanmaani.github.io/no-ethereum-name-service-is-still-a-clown-show/&quot;&gt;here&lt;/a&gt;
for a much more negative assessment of the ENS system), but then
the censorship resistance benefits come down to how much you
trust those people and especially how much you trust them not to be
pressured by governments.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
The material that ENS has published here isn&#39;t very encouraging:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The root node is presently owned by a multisig contract, with keys
held by trustworthy individuals in the Ethereum community. We expect
that this will be hands-off, with the root ownership only used to
effect administrative changes, such as the introduction of a new
TLD, or to recover from an emergency such as a critical
vulnerability in a TLD registrar.&lt;/p&gt;
&lt;p&gt;The keyholders are drawn from respected members of the community, and
with the exception of Nick Johnson, founder of ENS, are unaffiliated
with ENS. We ask and expect them to exercise their individual
judgement acting in the interests of the ENS community, rather than
rubber-stamping requests made to them by ENS developers&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This kind of ad hoc decision-making based on people being expected
to act in the best interests of the community doesn&#39;t really
seem sufficient to govern a name system which supports
trillions of dollars of transactions.&lt;/p&gt;
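&lt;p&gt;Mechanically, the four-of-seven arrangement mentioned above comes down to a threshold check like the Python sketch below. The keyholder identifiers are invented, and a real multisig contract would verify a signature from each keyholder rather than trust a list of names; this only shows the counting rule.&lt;/p&gt;

```python
from typing import Iterable, Set

# Hypothetical identifiers for the seven keyholders.
KEYHOLDERS: Set[str] = {"k1", "k2", "k3", "k4", "k5", "k6", "k7"}
THRESHOLD = 4  # four of seven, per the description above


def approved(
    signers: Iterable[str],
    keyholders: Set[str] = KEYHOLDERS,
    threshold: int = THRESHOLD,
) -> bool:
    """True if enough distinct, recognized keyholders signed off."""
    valid = set(signers).intersection(keyholders)
    return len(valid) >= threshold
```

&lt;p&gt;The check itself is trivial; everything that matters for censorship resistance is in who holds the keys and what might persuade four of them to sign.&lt;/p&gt;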
&lt;p&gt;Finally, it&#39;s worth noting that none of this means
that your domain can&#39;t be taken away by legal process, because that
process could potentially be used to force you to sign the transfer.
In this case the system will duly publish that transfer, as there&#39;s
no real way for it to tell that you signed it under duress.
All the cryptographic machinery is really doing is making it
hard for people who can&#39;t force you to do things to effectuate
the transfer.&lt;/p&gt;
&lt;h4 id=&quot;filtering-2&quot;&gt;Filtering &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#filtering-2&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;It&#39;s a bit hard to tell whether this kind of system is more resistant to
filtering than ordinary DNS. At the moment, the answer is almost
certainly &amp;quot;yes&amp;quot; because there is an established ecosystem devoted
to filtering DNS and the blockchain-based name systems are too small
to be worth filtering.&lt;/p&gt;
&lt;p&gt;I don&#39;t think, however, that there is any real technical reason why
these systems are more resistant to filtering. At the end of the day,
the way these systems work is that you download a bunch of data
from the ledger and then verify all the signatures. So what makes
them filtering resistant is that the distribution mechanism for
the blockchain data is peer to peer and also that you can layer them on top of
some other system that is censorship resistant (e.g., download them
from the Web or via a real anti-censorship system like &lt;a href=&quot;https://www.torproject.org/&quot;&gt;Tor&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;However, you can do precisely the same thing with DNS. First, if things
are DNSSEC signed then they can just be passed around directly because DNSSEC chains
are self-contained. And even for non-DNSSEC-signed domains,
it&#39;s certainly possible to have some third party (e.g., Google Public DNS)
sign the data. So, as long as you have a censorship-resistant
publishing mechanism—this is the hard part—DNS will be equally filtering resistant.
Moreover, given that secure DNS transport mechanisms
are already in common use, it
seems like it&#39;s going to be a lot easier to make the DNS hard to filter
than to deploy some entirely new naming system, especially given
that much of the Internet will be running on DNS for years whatever
new system is invented.&lt;/p&gt;
&lt;h4 id=&quot;what-about-the-rest%3F&quot;&gt;What about the rest? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#what-about-the-rest%3F&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Let&#39;s just look quickly at the rest of the Namecoin value proposition.
(I&#39;m not trying to beat up on Namecoin here; mostly similar comments
would apply to ENS or any of these systems.)&lt;/p&gt;
&lt;h5 id=&quot;attach-identity-information-such-as-gpg-and-otr-keys-and-email%2C-bitcoin%2C-and-bitmessage-addresses-to-an-identity-of-your-choice&quot;&gt;Attach identity information such as GPG and OTR keys and email, Bitcoin, and Bitmessage addresses to an identity of your choice &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#attach-identity-information-such-as-gpg-and-otr-keys-and-email%2C-bitcoin%2C-and-bitmessage-addresses-to-an-identity-of-your-choice&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;This seems like a reasonable goal, but there&#39;s nothing
special about a blockchain system that lets you do this. DNS
already supports new record types and we&#39;ve already &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;seen&lt;/a&gt;
how to attach cryptographic material to DNS; it&#39;s straightforward to add
all of these record types as well. All you&#39;d need is to want to do it.&lt;/p&gt;
&lt;h5 id=&quot;human-meaningful-tor-.onion-domains.&quot;&gt;Human-meaningful Tor .onion domains. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#human-meaningful-tor-.onion-domains.&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;This is kind of confusing until you read the
&lt;a href=&quot;https://www.namecoin.org/docs/faq/#how-does-namecoin-compare-to-tor-onion-services&quot;&gt;FAQ&lt;/a&gt;.
The situation is that &lt;code&gt;.onion&lt;/code&gt; addresses are special because the
address is actually the hash of a cryptographic key. With Namecoin
you can register a pointer from a regular name to a &lt;code&gt;.onion&lt;/code&gt;
name. This is fine, but of course you can do it with DNS
as well, as long as the domain is DNSSEC signed.&lt;/p&gt;
&lt;h5 id=&quot;decentralized-tls-(https)-certificate-validation%2C-backed-by-blockchain-consensus.&quot;&gt;Decentralized TLS (HTTPS) certificate validation, backed by blockchain consensus. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#decentralized-tls-(https)-certificate-validation%2C-backed-by-blockchain-consensus.&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;There are two points here. The first is that you can have a TLSA record associated with
your Namecoin domain, which is of course equally possible with ordinary DNS.
The second point is just the one I made above, which is that the
name registration is rooted in the blockchain not in the DNS hierarchy.&lt;/p&gt;
&lt;h5 id=&quot;access-websites-using-the-.bit-top-level-domain&quot;&gt;Access websites using the .bit top-level domain &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#access-websites-using-the-.bit-top-level-domain&quot;&gt;#&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;And this just means that you can use &lt;code&gt;.bit&lt;/code&gt; instead of &lt;code&gt;.com&lt;/code&gt; or whatever.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the end of the day, I don&#39;t really see much advantage to
these blockchain/ledger-based systems. The primary value proposition
is that they are censorship resistant. However, this property
is provided by having them rigidly and mechanically enforce some policy, which seems more like a
bug than a feature. Our existing name system &lt;em&gt;depends&lt;/em&gt; on flexibility
in order to function, both to save people from themselves (if they
lose their key) and to save them from others (if people register
&lt;em&gt;your&lt;/em&gt; name in the DNS) and so a system that doesn&#39;t provide any
discretion seems like a step backwards. It&#39;s of course possible to
layer some kind of governance structure over top of such a system—this
would of course have to be cryptographically reified—but
that&#39;s not what we have now and at that point, it seems like
you&#39;ve reproduced the same discretionary properties of
the DNS that motivate these systems.&lt;/p&gt;
&lt;p&gt;Even if these systems do turn out to be technically superior,
they face the same network effect challenges that we saw with
TLSA: anyone can get a DNS name today and it will be acceptable
to basically anyone else on the Internet. By contrast, if
you register something in &lt;code&gt;.bit&lt;/code&gt; then very few people will
be able to see it, so you&#39;re most likely going to want to register
&lt;em&gt;both&lt;/em&gt; a DNS name and a &lt;code&gt;.bit&lt;/code&gt; name, at which point the
incentive to register the Namecoin name as well seems rather low.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Or,
in the words of &lt;a href=&quot;https://amturing.acm.org/award_winners/lamport_1205376.cfm&quot;&gt;Leslie
Lamport&lt;/a&gt;,
&amp;quot;A distributed system is one in which the failure of a computer you
didn&#39;t even know existed can render your own computer unusable.&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Though many of the
country code TLDs are operated by US companies. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, one problem with minting new top level domains
is that they are a new opportunity for people to register
names corresponding to some large entity. Rather than go
through the dispute resolution process, it&#39;s potentially
easier and cheaper for the owners of famous names to
just register in every new TLD. In a 2014 &lt;a href=&quot;https://cseweb.ucsd.edu/~voelker/pubs/xxxtld-www14.pdf&quot;&gt;paper&lt;/a&gt;,
Halvorson et al. show that the vast majority of
registrations in &lt;code&gt;.xxx&lt;/code&gt; (intended for adult content)
were either for defensive (registering your own name)
or speculative (hoping to sell the name) purposes, thus
reflecting a windfall to the operators of &lt;code&gt;.xxx&lt;/code&gt; of
around $10 million USD. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Full disclosure: I once participated in the design of a similar system,
called &lt;a href=&quot;https://crypto.stanford.edu/portia/pubs/articles/M995439383.html&quot;&gt;Churro&lt;/a&gt;
in the days before blockchain. It seemed like a good idea at the time. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At least in theory. The
degree to which this is true in practice is debatable, but for now let&#39;s
take it as true. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As a practical matter, this isn&#39;t &lt;em&gt;quite&lt;/em&gt; what you want to do: things
don&#39;t get added to the ledger instantaneously and so it&#39;s possible
for someone to &amp;quot;frontrun&amp;quot; your domain by seeing the domain you
registered and trying to register it themselves, in the hope that
they will get added to the ledger first; this is easier if they are
themselves part of the infrastructure of the ledger. The
fix for this is to first record a &lt;em&gt;commitment&lt;/em&gt; to the name
you want to register (e.g., &lt;em&gt;HMAC(K, &amp;lt;name&amp;gt;)&lt;/em&gt; with a randomly
chosen key &lt;em&gt;K&lt;/em&gt;) and then once that commitment has been logged,
you &lt;em&gt;reveal&lt;/em&gt; the commitment by publishing the name and &lt;em&gt;K&lt;/em&gt;. This prevents
someone from seeing the domain you want to register before
it is already in the ledger. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;ENS will also allow you
to register names in the ordinary DNS space but they require you
to already own the DNS name, so that&#39;s not a problem. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, it&#39;s quite possible to build a ledger-type system on
top of DNS, using something like certificate transparency. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-blockchain/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
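&lt;p&gt;The commit-and-reveal fix for frontrunning described in the footnotes can be sketched in a few lines. This is only an illustration under stated assumptions: it uses HMAC-SHA256 as the commitment, the function names &lt;code&gt;commit&lt;/code&gt; and &lt;code&gt;verify_reveal&lt;/code&gt; are made up, and it says nothing about how any particular ledger actually encodes these messages.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def commit(name: str):
    """Return (key, commitment); only the commitment is logged at first."""
    key = secrets.token_bytes(32)  # randomly chosen key K
    commitment = hmac.new(key, name.encode(), hashlib.sha256).hexdigest()
    return key, commitment


def verify_reveal(commitment: str, name: str, key: bytes) -> bool:
    """Check that a revealed (name, K) pair matches a logged commitment."""
    expected = hmac.new(key, name.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, commitment)


# Step 1: log the commitment. Step 2: once it is in the ledger, reveal.
key, logged = commit("example.bit")
reveal_ok = verify_reveal(logged, "example.bit", key)
frontrun_ok = verify_reveal(logged, "other.bit", key)
```

&lt;p&gt;Because the commitment reveals nothing useful about the name without the key, an observer who sees it in flight has nothing to race you for.&lt;/p&gt;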
</content>
	</entry>
	
	<entry>
		<title>Privately Measuring Vaccine Doses</title>
		<link href="https://educatedguesswork.org/posts/vaccine-tracking/"/>
		<updated>2022-01-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-tracking/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
};
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: this post contains a bunch of LaTeX math notation rendered
in MathJax, but it doesn&#39;t show up right in the newsletter
version.&lt;/p&gt;
&lt;p&gt;Anyone can go to the CDC Web site and find out the &lt;a href=&quot;https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total&quot;&gt;status of the US
COVID vaccination
effort&lt;/a&gt;. Unfortunately,
due to privacy controls in the CDC&#39;s data collection (see &lt;a href=&quot;https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total&quot;&gt;footnotes&lt;/a&gt;), this data seems
to be less accurate than we would like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To protect the privacy of vaccine recipients, CDC receives data
without any personally identifiable information (de-identified data)
about vaccine doses. Each record of a dose has a unique person
identifier. Each jurisdiction or provider uses a unique person
identifier to link records within their own systems. However, CDC
cannot use the unique person identifier to identify individual
people by name. If a person received doses in more than one
jurisdiction or at different providers within the same jurisdiction,
they could receive different unique person identifiers for different
doses. CDC may not be able to link multiple unique person
identifiers for different jurisdictions or providers to a single
person.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These inaccuracies are made somewhat less apparent by the fact that the CDC caps (&amp;quot;top codes&amp;quot;)
estimates of vaccine coverage at 95% (formerly 99%), so you
don&#39;t see reports where more people in an area are vaccinated
than actually live in that area:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CDC has capped the percent of population coverage metrics at
95%. This cap helps address potential overestimates of vaccination
coverage due to first, second, and booster doses that were not
linked. Other reasons for overestimates include census denominator
data not including part-time residents or potential data reporting
errors.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As I understand it, the situation here is that the data reported
by states is roughly accurate, as long as you don&#39;t get into people
who got doses out of state, but the CDC data is less so because
of these privacy measures. For instance, the CDC&#39;s data shows that
40 different states have 95% of people 65+ with at least one
dose, which not only doesn&#39;t help you distinguish between California and Iowa
but actually seems to be wrong for California as well. Here&#39;s
a comparison of the California and federal data for 65+.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Metric&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;California&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Federal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Number w/ &amp;gt;= 1 dose&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5926681&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6606265&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Percent w/ &amp;gt;= 1 dose&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;90.8&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;95 (presumably topcoded)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Number fully vaxxed&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5403586&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;5147954&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Percent fully vaxxed&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;82.8&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;88.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It&#39;s somewhat hard to square this data, and the percentages may
just be about the size of the eligible population, but the
raw numbers should at least agree. At least part of what&#39;s going on seems
to be that doses are being misattributed (e.g., boosters marked
down as first doses) and CDC not having ground truth doesn&#39;t help us
debug. A number of commenters have been quite critical of these privacy
measures and their impact on the accuracy of the data. For instance,
here&#39;s political blogger &lt;a href=&quot;https://www.slowboring.com/p/the-cdcs-vaccine-data-is-all-wrong&quot;&gt;Matt Yglesias&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Besides this, the stated reason
for collecting such bad data is not to allow people to get illicit
boosters, it’s to protect their privacy. As I wrote in “They
deliberately put errors in the Census,” I am very skeptical that the
privacy value of having the government do inaccurate record-keeping
is high.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I suspect I&#39;m more sensitive to privacy issues than Yglesias, but I&#39;m
also not sure this is the right tradeoff. In this case especially,
the states
(and of course, probably the health insurance companies)
seem to have non-anonymous measurements of who got vaccinated and
when, so it&#39;s not clear why it&#39;s that big a privacy increment
to deny this data to the CDC. Moreover, the states can&#39;t easily get more private because
they seem to be using that information to implement their
vaccine passport systems. For instance in &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/&quot;&gt;California&lt;/a&gt; or
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/&quot;&gt;New York&lt;/a&gt;,
you can just input some identifying information and download your
vaccine passport; this obviously wouldn&#39;t work if this data were
stored without identifiers. With that said, I can also see the argument
that you don&#39;t want the federal government having this information
and—unlike the states—it&#39;s using it for statistical
and not operational purposes, so it&#39;s worth asking whether it&#39;s
possible to improve the situation. As usual, sounds like a job for
cryptography.&lt;/p&gt;
&lt;h2 id=&quot;anonymously-measuring-vaccination-rates&quot;&gt;Anonymously Measuring Vaccination Rates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#anonymously-measuring-vaccination-rates&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The underlying problem here is that we want to be able to measure
the rate of various kinds of vaccination in each demographic
region. This seems to require that we be able to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Associate vaccine doses with demographic information like where they
were given, where the patient lives, age of the patient, etc. This allows you
to measure geographic deployment rates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Associate multiple doses given to the same person so that you don&#39;t
say obviously wrong things like 200% of people in California have
gotten first doses and nobody has gotten a second dose.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first requirement is actually readily addressable with privacy
preserving measurement techniques like
&lt;a href=&quot;https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs&quot;&gt;Prio&lt;/a&gt;
(see &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;here&lt;/a&gt; for my writeup), but it doesn&#39;t
do a good job of linking up multiple doses. One could imagine
having a different counter for &amp;quot;first dose&amp;quot;, &amp;quot;second dose&amp;quot;, etc.
with the states reporting each dose appropriately.
However, part of the problem seems to be that the status of
each dose is being inaccurately
reported, both because of errors and because some people actually
deliberately concealed or at least didn&#39;t disclose their vaccination status, e.g., to
get an early booster.&lt;/p&gt;
&lt;p&gt;If you didn&#39;t care about privacy, you would address this just by
having each dose associated with some permanent identifier
(ID) like
personal name or—even better for accuracy but worse for
privacy—social security number. You then would just have
a list of doses, dates, and identifier and could sort things
out in the obvious fashion by grouping by identifier and then
counting. But of course the problem with this is that the
identifier is, well, &lt;em&gt;identifying&lt;/em&gt;, which is what we are
trying to avoid. So, what you want is a stable pseudonymous identifier
(PID) derived from this information (thus allowing grouping) but that
can&#39;t be reversed to give the input information (thus protecting
user privacy).&lt;/p&gt;
&lt;h2 id=&quot;some-things-which-won&#39;t-work%3A-hashes%2C-prfs%2C-and-oprfs&quot;&gt;Some things which won&#39;t work: hashes, PRFs, and OPRFs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#some-things-which-won&#39;t-work%3A-hashes%2C-prfs%2C-and-oprfs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The obvious thing to do here is to just hash the data, but that&#39;s
clearly not going to work: the cryptographic guarantees around
hash functions only apply when the input space is large, but in
this case, the input space will be quite small (for instance, there are only
10&lt;sup&gt;9&lt;/sup&gt; SSNs, so it&#39;s trivial to compute the hashes for
all of them, and the space of names is not that much larger).
This means that the CDC could easily make a table
of all the possible identifiers and who they belong to.&lt;/p&gt;
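&lt;p&gt;To make the enumeration concrete, here is a minimal sketch of the table-building attack. It assumes SHA-256 as the hash and a toy space of 10&lt;sup&gt;5&lt;/sup&gt; five-digit identifiers so that it runs instantly; the function names are made up for illustration, and a real SSN space is just the same loop run over 10&lt;sup&gt;9&lt;/sup&gt; values.&lt;/p&gt;

```python
import hashlib


def hash_id(identifier: str) -> str:
    """Unkeyed hash of an identifier, as in the naive scheme."""
    return hashlib.sha256(identifier.encode()).hexdigest()


def build_reverse_table(space_size: int, width: int) -> dict:
    """Precompute hash -> identifier for every value in a small input space."""
    table = {}
    for i in range(space_size):
        identifier = str(i).zfill(width)
        table[hash_id(identifier)] = identifier
    return table


# Toy space of 10**5 identifiers (assumption for speed, not a real SSN format).
table = build_reverse_table(10**5, 5)

reported = hash_id("04217")   # what a state would send instead of the ID
recovered = table[reported]   # what the recipient can simply look up
```

&lt;p&gt;The lookup recovers the original identifier exactly, which is why an unkeyed hash over a small input space provides essentially no privacy.&lt;/p&gt;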
&lt;p&gt;The next natural thing to try is some kind of keyed one-way function
like a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Pseudorandom_function_family&amp;amp;oldid=1029021822&quot;&gt;Pseudorandom Function (PRF)&lt;/a&gt;, but the problem then becomes who
can compute this function. PRFs depend on a key, and if
the CDC knows the key then it&#39;s not better than a hash
function. But as a practical matter, if every state
has the key, then it&#39;s a stretch to think that the CDC
will not get it or convince some state employee to run
the PRF for them on the (again, small) set of potential names.&lt;/p&gt;
&lt;p&gt;Recently, it&#39;s become common to throw &lt;em&gt;oblivious PRFs (OPRFs)&lt;/em&gt; at this kind
of problem. An oblivious PRF is like a PRF except that it can be computed
on a &lt;em&gt;blinded&lt;/em&gt; version of the input. This means that you can set up
a server which will compute the OPRF for people without seeing it, like
so:&lt;/p&gt;
&lt;img width=&quot;400&quot; alt=&quot;OPRF Usage&quot; src=&quot;https://educatedguesswork.org/img/vaccine-oprf.png&quot; /&gt;
&lt;p&gt;In this version, the state would blind the patient&#39;s name and
send it to the OPRF Server, which would compute the OPRF on
the blinded input and then return it. The state then unblinds
the result to get the PRF on the original input. This has
two important properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It can&#39;t be computed without the key.&lt;/li&gt;
&lt;li&gt;The OPRF service never sees both the input and the output
because they are blinded. The state of course does get the output (the PID)
and sees the input.&lt;/li&gt;
&lt;/ol&gt;
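&lt;p&gt;The blind/evaluate/unblind flow above can be sketched as follows. This is a toy under loud assumptions: it works in the multiplicative group modulo the Mersenne prime 2&lt;sup&gt;127&lt;/sup&gt;-1 rather than in a properly chosen elliptic curve group, the hash-to-group step is ad hoc, and all the function names are hypothetical. It is meant only to show why the blinding cancels out, not to be a secure OPRF.&lt;/p&gt;

```python
import hashlib
import math
import secrets

P = 2**127 - 1   # Mersenne prime; toy-sized and NOT a secure parameter choice
ORDER = P - 1    # order of the multiplicative group mod P


def hash_to_group(identifier: str) -> int:
    """Ad hoc mapping of an identifier to a nonzero value mod P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_unit() -> int:
    """Pick a random exponent that is invertible mod ORDER."""
    while True:
        r = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(r, ORDER) == 1:
            return r


def blind(identifier: str):
    """State side: hide the input under a fresh random exponent r."""
    r = random_unit()
    return r, pow(hash_to_group(identifier), r, P)


def server_evaluate(blinded: int, key: int) -> int:
    """OPRF server side: apply the secret key to the blinded value."""
    return pow(blinded, key, P)


def unblind(evaluated: int, r: int) -> int:
    """State side: strip r, leaving H(identifier)^key."""
    return pow(evaluated, pow(r, -1, ORDER), P)


key = random_unit()
r, blinded = blind("123-45-6789")
pid = unblind(server_evaluate(blinded, key), r)
```

&lt;p&gt;The server only ever handles the blinded value, yet the state ends up with the same PID every time it submits the same ID, regardless of which random blinding exponent was used.&lt;/p&gt;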
&lt;p&gt;In the full system, then, health authorities, states, etc. would
collect the patient&#39;s ID and ask the OPRF server to
map it to the pseudonymous PID, and then send the
information to the CDC. This is slightly better, but not much
because the OPRF service is an oracle that lets a lot of people
map a patient&#39;s true identifier to the PID, and so
you need to tightly control access to that service. But a lot
of entities (at least states, but also maybe local health departments)
are going to have access to that service, which makes the
problem hard, as any of them can be used to unmask people.
Moreover, it&#39;s kind of an inconvenient interface
for the states because they want to just submit their data,
not have some complicated mapping process that they do
pre-submission.&lt;/p&gt;
&lt;h2 id=&quot;interoperable-private-attribution&quot;&gt;Interoperable Private Attribution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#interoperable-private-attribution&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The underlying problem here is that we need a way to map $ID &#92;rightarrow PID$
that can&#39;t just be used by the CDC. Otherwise, they can just try
candidate $ID$ values until they get a $PID$ match.
Recently, Erik Taubeneck (Meta), Ben Savage (Meta), and Martin Thomson (Mozilla)
published a new multiparty computation technology called
&lt;a href=&quot;https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit&quot;&gt;Interoperable Private Attribution (IPA)&lt;/a&gt;. As the name suggests, it&#39;s designed for measuring
conversions in online advertisements, and I may write about that
later, but the basic ideas can be
adapted for measuring vaccine uptake.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The general idea behind IPA is that we have a service
which is kind of like an OPRF in that it
takes in an encrypted identifier and outputs a &lt;em&gt;blinded&lt;/em&gt;
identifier which can&#39;t be tied back to the original input (which is
essentially the same problem we are trying to solve). However,
if we just emit a blinded identifier in response to an encrypted
identifier, the service can be used as an oracle to compute
the mapping to blinded IDs. In order to prevent that,
the service actually has to
take in a group of encrypted identifiers and shuffle them somehow
(e.g., by emitting them in a batch). This gives you an interface
more like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/vaccine-tracking.png&quot; alt=&quot;Blinding service&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that this interface could take identifiers in as a batch
and then shuffle the batch or one at a time but then buffer
them; it just has to make it hard to determine which input
goes with which output.
In addition, we don&#39;t want one entity (in this case the OPRF
server) to be able to unmask everyone, so we need to distribute
the computation over multiple servers, like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/vaccine-tracking2.png&quot; alt=&quot;Multiple server blinding service&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the precise communication between servers is a bit
complicated. The first server actually only partially
decrypts and then blinds and passes things to the second
server. Details can be found &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#ipa-technical-details&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are a number of ways to use a service like this. The most
obvious is simply to have each vaccine dose be a single report,
and then submit them to the service and look at the output in
batches. The result will just be a set of delinked, shuffled identifiers,
like so:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/vaccine-ipa-table.png&quot; alt=&quot;Blinding identifiers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now you can just count how many times each identifier appears;
identifiers which appear once are single doses, twice are double
doses, thrice are boosted etc. If you take the data in daily
batches, you can also estimate the amount of time between doses
by looking at what day each identifier is reported. You can
also do geographic distributions by sending each jurisdiction
in separately. In the original IPA proposal, the way things
work is that all the encrypted reports were sent to a
&amp;quot;Consumer&amp;quot; which gets meta-information like the site the
report came from. The consumer could then ask the service
to aggregate only a subset of the data.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
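&lt;p&gt;The counting step is simple enough to sketch directly. The snippet below assumes the batch of blinded identifiers has already come back from the service as a list of opaque values; the placeholder strings and the function name are made up for illustration.&lt;/p&gt;

```python
from collections import Counter


def dose_histogram(blinded_ids):
    """Map number-of-doses -> number-of-people for one batch of blinded IDs."""
    per_person = Counter(blinded_ids)     # blinded ID -> times it appears
    return Counter(per_person.values())   # appearances -> how many IDs had that many


# Opaque placeholder values standing in for the shuffled service output.
batch = ["pid_c", "pid_a", "pid_b", "pid_a", "pid_c", "pid_c"]
histogram = dose_histogram(batch)
```

&lt;p&gt;Here one identifier appears once, one twice, and one three times, so the histogram reports one person at each dose count without anyone learning who they are.&lt;/p&gt;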
&lt;p&gt;There are two important properties of this system that might not be
immediately obvious. First, the blinding and shuffling process doesn&#39;t
preserve meta-information: it just emits the identifiers.
If you want to learn about subsets of the data, you need to process
the data in chunks (e.g., one state at a time).
The IPA authors have been working on how to carry
some meta-information along with the reports, but it&#39;s a somewhat
complicated problem, as the blinding process would destroy it,
and they haven&#39;t published a design for this feature.&lt;/p&gt;
&lt;p&gt;Second, if you allow the consumer to do a lot of queries of
different subsets, then they can use that to extract
information about the original data
(see &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#input-manipulation-attacks&quot;&gt;here&lt;/a&gt; for
more). This requires you to restrict the number
of different queries, or potentially to just commit
in advance to what you will do (e.g., just down to counties on
a daily basis). Sybil attacks in which the consumer
injects fake reports are also possible, but can
be prevented by having the jurisdiction sign their
reports.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;ipa-technical-details&quot;&gt;IPA Technical Details &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#ipa-technical-details&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This section provides technical details. I&#39;ve attempted to make
them mostly accessible, so that they can be understood based on high school
math,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
but they can also be &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#limitations&quot;&gt;skipped&lt;/a&gt; if necessary.
This section will not render properly in the newsletter
because I use MathJax to render LaTeX. Click &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#ipa-technical-details&quot;&gt;here&lt;/a&gt; to see
it rendered on the site.&lt;/p&gt;
&lt;p&gt;Note: in ordinary integer math, given $g^a$ and $g$ it&#39;s easy to compute
$a$ but we&#39;re going to be doing this in an elliptic curve
where that computation is hard. Everything else is pretty
much the same, but just remember that part.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The service is implemented by having a pair of servers, $A$ and $B$.
Each has a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Diffie%E2%80%93Hellman_key_exchange&amp;amp;oldid=1066364968&quot;&gt;Diffie-Hellman&lt;/a&gt;
key pair, which is to say a secret value $x$ and a public value
computed as $g^x$.  We&#39;ll call $A$&#39;s key pair $(a, g^a)$ and $B$&#39;s
pair $(b, g^b)$. Each server also has a secret blinding key $K_a$ and
$K_b$. These servers are operated by different entities who are
trusted not to collude; as long as either server behaves correctly,
you&#39;re OK. The service then publishes a combined public
key $g^{a+b}$ which can be computed by multiplying the public keys: $g^a * g^b$
(if you remember your high school math!).&lt;/p&gt;
&lt;p&gt;In order to submit an ID $I$, the sender first encrypts it.
It generates a random secret $x$ and
computes: $g^{x(a+b)} = {(g^{a+b})}^x$. Note that we&#39;re using the service&#39;s
combined public key and the sender&#39;s private value $x$, so the result is a secret
from attackers who don&#39;t know either $x$ or $a+b$. It then multiplies
$I$ by this value and sends the pair
of values (this is just classic &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=ElGamal_encryption&amp;amp;oldid=1058774653&quot;&gt;ElGamal Encryption&lt;/a&gt;, but to the key $g^{a+b}$):&lt;/p&gt;
&lt;p&gt;$$g^x, I * g^{x(a+b)}$$&lt;/p&gt;
&lt;p&gt;Importantly, this second term can be broken up into a part involving
only $a$ and a part involving only $b$. I.e.,&lt;/p&gt;
&lt;p&gt;$$I * g^{x(a+b)} = I * g^{xa} * g^{xb}$$&lt;/p&gt;
&lt;p&gt;Again, this is just high school math. These values then get sent to
$A$ (or $B$, it doesn&#39;t matter), who computes $g^{xa} = {(g^{x})}^a$
(recall it knows $a$). It then divides the second part by $g^{xa}$:&lt;/p&gt;
&lt;p&gt;$$I *g^{xb} = &#92;frac{I * &#92;cancel{g^{xa}} * g^{xb}}{&#92;cancel{g^{xa}}}$$&lt;/p&gt;
&lt;p&gt;This cancels out the $g^{xa}$ term, leaving you with just a term
that involves $b$, and thus the pair:&lt;/p&gt;
&lt;p&gt;$$g^x, I * g^{xb}$$&lt;/p&gt;
&lt;p&gt;$A$ then blinds this value, by exponentiating both values to $K_a$, giving:&lt;/p&gt;
&lt;p&gt;$$(g^x)^{K_a}, (I * g^{xb})^{K_a}$$&lt;/p&gt;
&lt;p&gt;We can flatten this out to give:&lt;/p&gt;
&lt;p&gt;$$g^{x * K_a}, I^{K_a} * g^{(xb)(K_a)}$$&lt;/p&gt;
&lt;p&gt;$A$ batches these values up with other inputs it has received, shuffles them, and sends
them to $B$. $B$ takes the first term and
computes $(g^{x * K_a})^b = g^{x * K_a * b} = g^{(xb)(K_a)}$. It then
divides the second term by this value, to get:&lt;/p&gt;
&lt;p&gt;$$I^{K_a} = &#92;frac{I^{K_a} * &#92;cancel{g^{(xb)(K_a)}}}{&#92;cancel{g^{(xb)(K_a)}}}$$&lt;/p&gt;
&lt;p&gt;Finally, $B$ blinds the value by raising it to the power $K_b$,
giving us:&lt;/p&gt;
&lt;p&gt;$$I^{(K_a)(K_b)} = (I^{K_a})^{K_b}$$&lt;/p&gt;
&lt;p&gt;That was a lot of math, but the bottom line is that the actual
identifier $I$ (e.g., the SSN) has been
converted into a new blinded value, with (hopefully) the following properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Neither $A$ nor $B$ ever saw $I$.&lt;/li&gt;
&lt;li&gt;$A$ sees the input encrypted version but doesn&#39;t learn the blinded
version.&lt;/li&gt;
&lt;li&gt;$B$ sees the blinded version but doesn&#39;t learn the encrypted
version.&lt;/li&gt;
&lt;li&gt;You need to know $K_a$ and $K_b$ to compute the blinded version
of $I$.&lt;/li&gt;
&lt;/ol&gt;
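&lt;p&gt;To make the algebra above concrete, here is a toy Python sketch of the whole flow. It is only an illustration, not the IPA implementation: the modulus &lt;code&gt;p&lt;/code&gt;, the base &lt;code&gt;g&lt;/code&gt;, the key values, and the helper names (&lt;code&gt;encrypt&lt;/code&gt;, &lt;code&gt;server_a&lt;/code&gt;, &lt;code&gt;server_b&lt;/code&gt;) are all made up for the example. It works in the integers mod a prime rather than on an elliptic curve, and it leaves out the batching and shuffling step.&lt;/p&gt;

```python
# Toy walk-through of the two-server decrypt-and-blind flow described above.
# Assumptions: a convenient prime modulus and made-up key values, chosen only
# so the arithmetic is easy to follow. None of this is secure.

p = 2**127 - 1   # a Mersenne prime; all arithmetic is modulo p
g = 3            # fixed base (hypothetical choice)

def inv(v):
    # Modular inverse via Fermat's little theorem; "dividing" by v means
    # multiplying by inv(v).
    return pow(v, p - 2, p)

# Long-term secrets held by servers A and B (hypothetical values).
a, K_a = 123456789, 1122334455
b, K_b = 987654321, 5544332211

# Combined public key g^(a+b), computed by multiplying the two public keys.
pub = (pow(g, a, p) * pow(g, b, p)) % p

def encrypt(I, x):
    # Sender: ElGamal encryption of I to the combined key with secret x.
    return pow(g, x, p), (I * pow(pub, x, p)) % p

def server_a(c1, c2):
    # A strips its g^(xa) factor, then raises both terms to K_a.
    c2 = (c2 * inv(pow(c1, a, p))) % p
    return pow(c1, K_a, p), pow(c2, K_a, p)

def server_b(c1, c2):
    # B strips the remaining g^(x*b*K_a) factor, then raises the result to K_b.
    blinded_once = (c2 * inv(pow(c1, b, p))) % p
    return pow(blinded_once, K_b, p)

I = 424242   # stand-in for an identifier such as an SSN
out1 = server_b(*server_a(*encrypt(I, x=1111)))
out2 = server_b(*server_a(*encrypt(I, x=2222)))
print(out1 == out2 == pow(I, K_a * K_b, p))
```

&lt;p&gt;Two encryptions of the same identifier with different values of $x$ come out as the same blinded value, $I^{(K_a)(K_b)}$, which is what allows reports for the same identifier to be matched up.&lt;/p&gt;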
&lt;p&gt;&lt;em&gt;Disclaimer&lt;/em&gt;: The IPA documents were just published recently,
so I don&#39;t think they have seen enough analysis to prove they
are secure. Here I&#39;m just describing how it&#39;s supposed to work.&lt;/p&gt;
&lt;h2 id=&quot;limitations&quot;&gt;Limitations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#limitations&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like any privacy preserving measurement system, this has some limitations,
in particular in the area of flexibility. For instance, this will
only properly attribute vaccine doses when there is an exact match
on the original identifier. This will work OK if the identifier itself
has a single form, like a social security number, but what if you
use name and birthday? In that case, &amp;quot;John Smith&amp;quot; and &amp;quot;John H. Smith&amp;quot;
will look like different people. If you had people&#39;s actual names,
you could try to correct this kind of error by looking for close
matches at approximately the right time, but IPA isn&#39;t &amp;quot;distance preserving&amp;quot;
in that two similar inputs A and B are not likely to have blinded
versions which are similar, so you can&#39;t make this kind of correction
later.&lt;/p&gt;
&lt;p&gt;Another problem is that in the form I&#39;ve presented it, you&#39;re losing
information like the kind of vaccine, so you can&#39;t easily ask &amp;quot;how many
people started with J&amp;amp;J and then boosted with Moderna.&amp;quot; There are
some potential avenues for making this work, for instance to carry
metadata along with the identifier, and that&#39;s probably possible,
but making that work is more complicated than the protocol I described
above.&lt;/p&gt;
&lt;p&gt;Finally, because repeated queries can be used to determine which
reports belong to which individuals, you need to limit the number of different kinds of queries
you do. This is probably fine if you want to just record the number
of doses of each type in a given region, but less fine if you want
to do some kind of deeper research. Of course the states can do
that analysis now because they have accurate data, but if you want
to do national scale analysis or you want it done consistently, that&#39;s
not that great an option.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Given the fact that the states are collecting directly
identifying data about vaccination, I suspect it&#39;s a bad tradeoff to
conceal this data from the CDC: the privacy improvement seems modest and
the effect on accuracy is real. However, if we are going to take it
as a hard requirement that the CDC not learn identifying information,
then we can use Privacy Preserving Measurement techniques to
get substantially better accuracy than the CDC seems to be achieving
today.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Full disclosure: I was an
early reviewer of this design and made some comments and suggestions. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In IPA, the service actually computes aggregates
like sum or whatever, but that&#39;s probably not necessary
here. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
IPA expects to use randomization to provide differential
privacy, but of course this reduces accuracy. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In particular, the facts that $(g^a)(g^b) = g^{a+b}$ and
$(g^a)^b = g^{ab}$. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, I know I&#39;m
using exponential notation. It&#39;s easier to follow for
people not used to EC notation. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-tracking/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part V: Transport security for Recursive to Authoritative DNS</title>
		<link href="https://educatedguesswork.org/posts/dns-security-adox/"/>
		<updated>2022-01-21T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-adox/</id>
		<content type="html">&lt;p&gt;This is Part V of my series on DNS Security
(parts &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-fox&quot;&gt;IV&lt;/a&gt;).
In part IV I covered DNS transport security between the
client (the stub resolver) and the recursive resolver but
ran out of room to talk about the recursive to authoritative link,
which is the subject of this post.&lt;/p&gt;
&lt;p&gt;Recall yet again the DNS resolution process, shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dns-recursive-authoritative.png&quot; alt=&quot;DNS resolution process&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For this post, we will be focusing on protecting the transactions between
the recursive resolver and the authoritative servers, shown
in blue in this diagram. The work on this has been happening
in the IETF &lt;a href=&quot;https://datatracker.ietf.org/wg/dprive/about/&quot;&gt;DNS PRIVate Exchange (dprive) Working Group&lt;/a&gt;.
This is commonly called &lt;em&gt;Authoritative DNS over TLS&lt;/em&gt; (ADoT),
or ADoX if you want to indicate that you don&#39;t care whether
the transport is DoT, DoH, or DoQ.&lt;/p&gt;
&lt;h2 id=&quot;the-basic-setting&quot;&gt;The Basic Setting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#the-basic-setting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we start looking at mechanisms, it&#39;s helpful
to frame the problem correctly. We have two objectives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Protect the confidentiality of the request. I.e., we
do not want the attacker to know that the user is
trying to resolve &lt;code&gt;example.org&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Protect the integrity of the response. I.e., we
do not want the attacker to be able to lie about
the address for &lt;code&gt;example.org&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As discussed before, while DNSSEC can provide integrity, it cannot provide
confidentiality.&lt;/p&gt;
&lt;p&gt;The first thing to notice is that this means we need to encrypt &lt;em&gt;both&lt;/em&gt;
the link to the authoritative for &lt;code&gt;.org&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; the link to the
authoritative for &lt;code&gt;example.org&lt;/code&gt; because both transactions leak
that the user is interested in &lt;code&gt;example.org&lt;/code&gt;. Importantly, the privacy
value of the query is limited by the number of other domains which are
served by the same authoritative as &lt;code&gt;example.org&lt;/code&gt;, because the
user must be asking for one of those domains. For this reason, if we
have encrypted DNS your users will get better privacy if your domain
is hosted by a DNS provider that serves a lot of other domains as
well. Note that there are cases in which &lt;code&gt;example.org&lt;/code&gt; might
have a lot of subdomains and you wouldn&#39;t want the attacker knowing
which one is being requested, but in the most common case it&#39;s
the second level domain that matters.&lt;/p&gt;
&lt;p&gt;Second, in order to provide confidentiality for these lookups,
we need to provide integrity for the identity of the server.
For instance, if the attacker is able to attack the connection
between the client and &lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt;, it
can substitute its own server for the true authoritative
server &lt;code&gt;b.iana-servers.net&lt;/code&gt;. DNSSEC as-is does not
prevent this form of attack because it doesn&#39;t sign the
NS records at the parent, but only at the child; but by
the time you&#39;ve queried the child for them, it&#39;s too late
because you&#39;ve already leaked the query to the attacker.
This means that the most convenient thing is if every link
uses secure transport, so that you can trust the results
it gives you at stage N before using them for stage N+1.
In other words, you want to have secure transport all the
way to the root.&lt;/p&gt;
&lt;p&gt;As before, then, the basic problem is setting the DNS client&#39;s
(in this case the recursive resolver, confusing, right?)
expectations correctly. In particular, if we are going
to be resistant to active attack, the recursive needs
to know:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;That the authoritative server will do DoX (and what protocol)&lt;/li&gt;
&lt;li&gt;The identity to expect the authoritative server to present&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If it doesn&#39;t know either of these things, then an active
attacker can interfere with the connection. Specifically, if
the recursive doesn&#39;t know that the authoritative server will
use DoX, then the attacker can just simulate an error when the
recursive tries. If it doesn&#39;t know the identity that the
authoritative server will present, then the attacker can just
provide its own identity and impersonate the authoritative.
Unfortunately, this turns out to be quite a bit more difficult
than one would like.&lt;/p&gt;
&lt;h2 id=&quot;root-servers&quot;&gt;Root Servers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#root-servers&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As shown in the diagram above, the first request from the recursive
resolver goes—at least notionally—to the root server.
If this is to use secure transport, the only way that can work
is for the recursive to be preconfigured with the information
about which root servers use secure transport.
There are only 13 root server names (&lt;code&gt;a.root-servers.net&lt;/code&gt;
through &lt;code&gt;m.root-servers.net&lt;/code&gt;), so it&#39;s not at all impractical
to imagine just disseminating an updated list.
Note that it&#39;s
not necessary for all the root servers to switch to secure
transport at once (they are operated by different people),
but of course if the recursive preferentially uses secure
transport, then the first one to switch might get increased load.
As a practical matter, it seems &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#operator-concerns&quot;&gt;unlikely&lt;/a&gt; that we&#39;re going to
get secure transport to the root immediately. It&#39;s much simpler
for the recursive resolver to run a mirror of the root
zone locally, as specified in &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8806.html&quot;&gt;RFC 8806&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;non-root-authoritatives&quot;&gt;Non-Root Authoritatives &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#non-root-authoritatives&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The situation with non-root resolvers (e.g., for &lt;code&gt;.com&lt;/code&gt; or
&lt;code&gt;example.com&lt;/code&gt;) is more complicated, because the way you learn
about those resolvers is &lt;em&gt;from&lt;/em&gt; the root resolver, so how does the
recursive learn that they accept secure transport? There is a similar
problem all the way down the chain: when the parent nameserver (e.g.,
&lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt;) tells you about the child resolver for a
given zone (e.g., &lt;code&gt;b.iana-servers.net&lt;/code&gt; for the zone
&lt;code&gt;example.org&lt;/code&gt;) how do you know the properties of the child
resolver?  If you are used to the Web, there will seem to be an
obvious answer: the parent nameserver should tell you. This is how
things work on the Web, where there is a different URL scheme for
secure transactions (&lt;code&gt;https:&lt;/code&gt;) versus insecure transactions
(&lt;code&gt;http:&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;However, DNS isn&#39;t the Web and there are actually &lt;em&gt;two&lt;/em&gt; &amp;quot;parent servers&amp;quot;
where this data could go. Consider the case where we are trying to resolve
&lt;code&gt;example.org&lt;/code&gt;, but the authoritative server for &lt;code&gt;example.org&lt;/code&gt;
is on &lt;code&gt;example.net&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; In order to look up &lt;code&gt;example.org&lt;/code&gt;,
the recursive resolver needs to &lt;em&gt;first&lt;/em&gt; look up &lt;code&gt;example.net&lt;/code&gt; so that it
can then contact it.
This means that there are two places where one could indicate that
the connection to &lt;code&gt;example.net&lt;/code&gt; should use secure transport.
First, you could put the information in the NS records for &lt;code&gt;example.org&lt;/code&gt; that say to contact
&lt;code&gt;example.net&lt;/code&gt; (this corresponds to the way things work on
the Web). These records would be served off of the &lt;code&gt;.org&lt;/code&gt;
authoritative server, like so:&lt;/p&gt;
&lt;img width=&quot;700&quot; alt=&quot;Indicator to use DoT at the target&#39;s parent&quot; src=&quot;https://educatedguesswork.org/img/dns-server-target-parent.png&quot; /&gt;
&lt;p&gt;This seems natural but has the disadvantage that
every domain which
uses &lt;code&gt;example.net&lt;/code&gt; as its nameserver needs to update its
own records individually.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
A more DNS-like approach&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
is to have the indication be in a record
that gets served for the authoritative (&lt;code&gt;example.net&lt;/code&gt;) that you get when you look
up its IP address. This would be served off of the &lt;code&gt;.net&lt;/code&gt; authoritative,
like so:&lt;/p&gt;
&lt;img width=&quot;700&quot; alt=&quot;Indicator to use DoT at the resolver&#39;s parent&quot; src=&quot;https://educatedguesswork.org/img/dns-server-resolver-parent.png&quot; /&gt;
&lt;p&gt;The advantage of this second approach is that as soon as &lt;code&gt;example.net&lt;/code&gt;
upgrades to secure transport, everyone who uses it as a nameserver
gets it, by contrast with the first approach where each domain
has to configure it separately for its authoritative server.&lt;/p&gt;
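&lt;p&gt;The operational difference between the two placements can be sketched with a toy data model. This is a hedged illustration only: the zone names, the &lt;code&gt;delegations&lt;/code&gt; table, and both function names are hypothetical, and real delegations involve more than a single name per zone.&lt;/p&gt;

```python
# Hypothetical data: three zones that all use the same authoritative server.
delegations = {
    "example.org": "example.net",
    "example.com": "example.net",
    "example.info": "example.net",
}

def records_to_update_at_target_parent(delegations, nameserver):
    # Approach 1: the indicator rides on each zone's own NS delegation,
    # so every zone that uses this nameserver has to change its records.
    return sorted(zone for zone, ns in delegations.items() if ns == nameserver)

def records_to_update_at_nameserver_parent(delegations, nameserver):
    # Approach 2: the indicator rides on the nameserver's own record,
    # so a single change covers every zone that delegates to it.
    return [nameserver] if nameserver in delegations.values() else []

print(records_to_update_at_target_parent(delegations, "example.net"))
print(records_to_update_at_nameserver_parent(delegations, "example.net"))
```

&lt;p&gt;With the first placement the number of changes grows with the number of hosted zones; with the second it stays at one.&lt;/p&gt;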
&lt;p&gt;You&#39;ll notice that I&#39;ve just written &amp;quot;Use DoT&amp;quot; here, but
that&#39;s handwaving, not telling you how it actually works,
and in this case details really matter.
Unfortunately, here is where we run into trouble.
The basic problem here is updating the parent server to
know that the server for the child domain supports secure
transport. This is a lot more complicated than it sounds,
to the point where it&#39;s more or less stalled the whole
effort. The next section describes the situation in some
detail, but the TL;DR is that there seem to be no good existing
mechanisms for doing this, so we&#39;re left with either not doing
it or with some hacks (skip &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#no-signaling-in-the-parent&quot;&gt;ahead&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&quot;populating-the-parent-zone-(technical)&quot;&gt;Populating the Parent Zone (Technical) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#populating-the-parent-zone-(technical)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Warning: this section is fairly technical. You can safely skip it if you don&#39;t
care about the details.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Recall that DNS has a number of different &lt;em&gt;resource record&lt;/em&gt; (RR)
types, including &lt;code&gt;A/AAAA&lt;/code&gt; for IPv4 and IPv6 addresses,
etc. The information about what server to use for a given
domain is contained in a nameserver (&lt;code&gt;NS&lt;/code&gt;) record,
but unfortunately that record has no place to carry
other information about the server. The &amp;quot;right&amp;quot; place
to put this information is in the
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-httpssvc&quot;&gt;service binding (&lt;code&gt;SVCB&lt;/code&gt;)&lt;/a&gt;
record, which can already be used to signal that you should
use HTTPS rather than HTTP (the use case for this is
cases where someone has used an &lt;code&gt;http:&lt;/code&gt; URL but
the target domain always wants you to use TLS).
Unfortunately, actually populating the parent zone
with SVCB turns out to be impractical, at least in
the short to medium term.&lt;/p&gt;
&lt;p&gt;There are several separate entities who have to cooperate in order
to serve a domain name:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;em&gt;registrant&lt;/em&gt; who actually operates the domain
(e.g., Google for &lt;code&gt;google.com&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;em&gt;authoritative name server&lt;/em&gt; who actually serves
the DNS records for the domain.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;em&gt;registry&lt;/em&gt; which actually hosts the DNS for the
parent domain. For instance Verisign operates &lt;code&gt;.com&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;em&gt;registrar&lt;/em&gt; which is responsible for actually
interacting with the registrant. It is the registrar&#39;s
job to populate the registry&#39;s database with NS
records that point to the authoritative name server.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The registration process proceeds as shown below.
Note that I&#39;ve shown it in one order but the steps can sometimes happen in
a different order:&lt;/p&gt;
&lt;img width=&quot;500&quot; alt=&quot;DNS registration&quot; src=&quot;https://educatedguesswork.org/img/dns-registration.png&quot; /&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First, the registrant registers (i.e., buys)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
the domain with the registrar. This just creates a database record
that indicates they own the domain.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The registrant publishes the DNS records for the domain with the authoritative server.
In this example, they just publish the IP address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The registrant tells the registrar which authoritative server it is using.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The registrar tells the registry which authoritative server the domain is
using, using the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc5730&quot;&gt;Extensible Provisioning Protocol&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the end of the day, we end up with a situation in which:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The registry (and hence the parent domain) is publishing a record that
says that &lt;code&gt;example.com&lt;/code&gt; is hosted on the authoritative server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The authoritative server publishes a record that actually has the
address for &lt;code&gt;example.com&lt;/code&gt;.
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In practice, it&#39;s reasonably common for two of these entities to be
the same. For instance, big companies like Google or Facebook usually
run their own authoritative servers. Another version is that many
registrars operate their own authoritative servers. In some cases, a
hosting provider will operate a registrar &lt;em&gt;and&lt;/em&gt; an authoritative
server (for instance, Dreamhost is the registrar, authoritative
server, and web hoster for &lt;code&gt;rtfm.com&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Whatever the exact configuration, the first problem is that
EPP, while extensible, does not currently provide any mechanism
for conveying SVCB records, so if we wanted the registrar to
convey them to the registry, we would need an extension, which
would take some time to deploy. For this reason, there
has been a fair amount of interest in &lt;strike&gt;hijacking&lt;/strike&gt;reusing existing
DNS records which are &lt;em&gt;already&lt;/em&gt; propagated to the parent
zone.&lt;/p&gt;
&lt;h4 id=&quot;ds-glue&quot;&gt;DS Glue &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#ds-glue&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Probably the most promising version of this is called
&lt;a href=&quot;https://www.ietf.org/archive/id/draft-schwartz-ds-glue-02.html&quot;&gt;&amp;quot;DS Glue&amp;quot;&lt;/a&gt;
and uses a DS record for a fake algorithm to smuggle
information about the target resolver. This is one of those
hacks which sits right at the border between hideous and brilliant:
because DS is already propagated to the parent, we hopefully
don&#39;t need to change registries or EPP (I say &amp;quot;hopefully&amp;quot;
because this depends on those elements being willing to
handle the new DS record type, and it remains to be seen
whether that will work properly). DS Glue has the nice
property that it doesn&#39;t require DNSSEC deployment:
as long as there is secure transport&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
to the parent
authoritative (in this case, for &lt;code&gt;.org&lt;/code&gt;)
and to the parent for the authoritative server&#39;s domain
(in this case &lt;code&gt;.net&lt;/code&gt;) then the records are trustworthy.
If either of these connections is insecure, however,
then the attacker can substitute new NS records (to point
to a different authoritative server) or strip the DS glue
records (thus blocking encryption.)&lt;/p&gt;
&lt;p&gt;If the transport connection to the parent for the
authoritative isn&#39;t secure, but that zone is DNSSEC
signed, then DS glue still works. It works less well
if there isn&#39;t secure transport for the parent of
the target domain because NS records aren&#39;t signed
in the parent and so the recursive will get the DS glue records
for the wrong authoritative.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h4 id=&quot;tlsa&quot;&gt;TLSA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#tlsa&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The other major live proposal is to use the TLSA record
to indicate that the authoritative server wants secure
transport. This would be delivered in roughly the same
way as the DS glue record. This has the disadvantage
that it requires that the authoritative server&#39;s domain
be DNSSEC signed, which then becomes an obstacle to deployment.
One of the advantages of secure transport is that it can
be deployed in parallel with DNSSEC and this would remove
that advantage, so I&#39;m less optimistic about this approach.&lt;/p&gt;
&lt;h3 id=&quot;no-signaling-in-the-parent&quot;&gt;No signaling in the parent &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#no-signaling-in-the-parent&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The alternative approach is to not signal in the parent that the
authoritative server for the child zone supports secure
transport. In this case, the recursive will have to discover that
somehow. The most likely way is that you query for a SVCB
record for the authoritative server, though I&#39;ve also
seen suggestions to query for a TLSA/DANE record. This would
look like this:&lt;/p&gt;
&lt;img width=&quot;500&quot; alt=&quot;Looking up resolver status via SVCB&quot; src=&quot;https://educatedguesswork.org/img/dns-resolver-svcb.png&quot; /&gt;
&lt;p&gt;This is secure &lt;em&gt;if
and only if&lt;/em&gt; the zone for the authoritative server
is signed. If it&#39;s not signed there&#39;s nothing stopping an active attacker from just intercepting the
connection to the authoritative server and responding that
the authoritative doesn&#39;t support secure transport (note that
it most likely can&#39;t actually establish secure transport because it will
have the wrong credentials), like so:&lt;/p&gt;
&lt;img width=&quot;500&quot; alt=&quot;Downgrade attack on resolver status via SVCB&quot; src=&quot;https://educatedguesswork.org/img/dns-resolver-svcb-downgrade.png&quot; /&gt;
&lt;p&gt;An additional problem with this design is that
it likely introduces additional latency because the recursive resolver
needs to first query the authoritative server for its
capabilities and only then can it ask the real question
(this is one of the main reasons for signaling in the parent).&lt;/p&gt;
&lt;p&gt;Another alternative is to signal this information in the child
domain itself somewhere. This is technically possible, but the problem
is that by the time you&#39;ve looked up the information in the
child&#39;s domain, you&#39;ve already leaked to the attacker what
domain you want to resolve. Of course, after that&#39;s happened
you could learn that the child wanted secure transport and use
it in the future, but not if the attacker attacks the connection
between you and the child, so you need DNSSEC here too.
Moreover, it means that every child needs to independently
signal that it wants secure transport to its authoritative.&lt;/p&gt;
&lt;h2 id=&quot;insecurely-discovering-secure-transport&quot;&gt;Insecurely Discovering Secure Transport &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#insecurely-discovering-secure-transport&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;While it may ultimately be possible to provide for a method of
securely signaling the use of secure transport, it&#39;s starting to look like
it&#39;s going to be very difficult to converge on something that
everyone likes. In the meantime, a number of people have proposed
that instead we do what&#39;s often called either &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-dprive-unauth-to-authoritative/&quot;&gt;unauthenticated&lt;/a&gt;
or &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-dkgjsal-dprive-unilateral-probing-01&quot;&gt;probing&lt;/a&gt;
modes of secure transport. The basic idea here is that the recursive
resolver would attempt secure transport to the authoritative
resolver and then in future remember whether that worked or
not.&lt;/p&gt;
&lt;p&gt;Obviously, this kind of system isn&#39;t entirely secure against active
attack, but it might be a good idea anyway for at least three
reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Active attack is harder than passive attack, so you&#39;ve increased
the attacker&#39;s costs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you have a way for the authoritative server to signal its
commitment to supporting secure transport for some period
(like &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6797&quot;&gt;HSTS&lt;/a&gt; for
HTTP), then you can bootstrap insecure discovery into a
secure mode; this requires the attacker to mount an active
attack the &lt;em&gt;first&lt;/em&gt; time you connect, which is even harder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It helps the authoritative (and to some extent the recursive)
resolvers get experience with deploying secure transport
without running the risk of hard failures if something
goes wrong (see more on this below).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Moreover, this kind of mechanism is &lt;em&gt;much&lt;/em&gt; easier to deploy,
because it doesn&#39;t involve any of the difficulties we saw above
with signaling availability of secure transport prior to
connection establishment, or with propagating records to
other servers. For that reason, it seems like it has a better
chance of actually getting deployed.&lt;/p&gt;
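&lt;p&gt;As a rough sketch of what &amp;quot;try secure transport and remember the result&amp;quot; might look like inside a recursive resolver, consider the following Python. It is not taken from either draft: the &lt;code&gt;ProbeCache&lt;/code&gt; class, the &lt;code&gt;retry_after&lt;/code&gt; and &lt;code&gt;commit_seconds&lt;/code&gt; parameters, and the four policy strings are all hypothetical, and the HSTS-like commitment is modeled simply as a number of seconds supplied by the caller.&lt;/p&gt;

```python
import time

class ProbeCache:
    """Remembers, per authoritative server, whether secure transport last worked."""

    def __init__(self, retry_after=3600.0):
        self.retry_after = retry_after   # hypothetical: how long to trust a failed probe
        self.entries = {}                # server -> (worked, pinned_until, checked_at)

    def record(self, server, worked, commit_seconds=0.0, now=None):
        now = time.time() if now is None else now
        old = self.entries.get(server)
        if old is not None and not worked and old[1] > now:
            return   # still pinned: a failure must not erase the commitment
        pinned_until = now + commit_seconds if worked else 0.0
        self.entries[server] = (worked, pinned_until, now)

    def choose(self, server, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(server)
        if entry is None:
            return "probe"               # never tried: attempt secure transport
        worked, pinned_until, checked_at = entry
        if pinned_until > now:
            return "secure-only"         # commitment in force: fail closed
        if worked:
            return "secure-preferred"    # worked before, but fallback is allowed
        if now - checked_at >= self.retry_after:
            return "probe"               # old failure has aged out: try again
        return "cleartext"
```

&lt;p&gt;The &lt;code&gt;secure-only&lt;/code&gt; state is the bootstrapped mode from item 2 above: once a server has committed, an attacker who blocks a later connection causes a failure rather than a downgrade.&lt;/p&gt;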
&lt;p&gt;Historically I&#39;ve not been that enthusiastic about this kind
of insecure discovery (what&#39;s often called &amp;quot;opportunistic&amp;quot;,
but that word has become the subject of heated debates about
its precise definition), because it&#39;s really better to have
secure discovery and this seemed like a distraction from that. However, as the discussion
about how to actually do the secure signaling has dragged
on—and to some extent ground to a halt—I&#39;ve started
to think it may be better to do something than nothing.&lt;/p&gt;
&lt;h2 id=&quot;tlsa-vs.-webpki&quot;&gt;TLSA vs. WebPKI &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#tlsa-vs.-webpki&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another point of contention here is how the authoritative
servers should authenticate. There are two major options here:
use the WebPKI like TLS on the Web, or use TLSA/DANE
(see &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;here&lt;/a&gt; for my writeup on this).
This is an issue which raises some very strong feelings
on both sides.&lt;/p&gt;
&lt;p&gt;On the WebPKI side, the argument is roughly that we already have plenty of
experience with the WebPKI and while it has its problems, it&#39;s well
understood and we know we can deploy it. By contrast, TLSA/DANE
requires taking an unnecessary dependency on DNSSEC.
On the TLSA side, the argument is roughly that (1) the WebPKI
is bad, (2) WebPKI security depends on DNS, so we shouldn&#39;t make
DNS security depend on the WebPKI, and (3) we should stop acting
like DNSSEC isn&#39;t a requirement (and perhaps that if we make things
depend on DNSSEC, it will become a requirement).&lt;/p&gt;
&lt;p&gt;As should be clear from this long series of posts, I&#39;m more
optimistic about WebPKI, but I&#39;m more than happy to &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-rescorla-dprive-adox-latest-00&quot;&gt;design a system&lt;/a&gt; which
allows either WebPKI or TLSA/DANE and let the market sort it
out.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
As far as I can tell, this is the position of most of the people
who favor WebPKI, so the two sides really are more like
&amp;quot;WebPKI or TLSA&amp;quot; versus &amp;quot;TLSA only&amp;quot; (see above about the
implications of making DNSSEC a requirement).&lt;/p&gt;
&lt;h2 id=&quot;operator-concerns&quot;&gt;Operator Concerns &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#operator-concerns&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even assuming that we address the technical issues about when
recursive resolvers initiate secure transport, actually getting
deployment requires that the authoritative servers enable ADoX;
unfortunately, there are serious questions about their willingness
to do so. In March of 2021, the root server operators published
a &lt;a href=&quot;https://root-servers.org/media/news/Statement_on_DNS_Encryption.pdf&quot;&gt;statement&lt;/a&gt;
expressing concern about the use of encryption to the
root servers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Server Operators have some concerns about supporting DNS encryption
for serving the root zone. It is well known that UDP has desirable
performance characteristics, due to its stateless nature. Increasing
the state-holding burden with the addition of connection-oriented
protocols, as well as encryption data, not only reduces the
performance of name servers, but also may raise new types of
denial-of-service attacks.&lt;/p&gt;
&lt;p&gt;At this time, the exact risk-reward tradeoffs for deployment of
encryption to root name servers is unclear and will likely depend on
which particular transport proposals gain momentum. Root Server
Operators do not feel comfortable being the early adopters of
authoritative DNS encryption and would like to first see increased
deployment in other parts of the DNS hierarchy. Meanwhile, there are
other ways to improve privacy in queries sent to root and other name
servers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As described above, it&#39;s of course theoretically possible to just do
secure transport to the TLD server and not to the root (though
Verisign, for instance, runs both &lt;code&gt;.com&lt;/code&gt; and two root servers).
In addition, some operators also published an Internet Draft
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-hal-adot-operational-considerations-02&quot;&gt;documenting&lt;/a&gt; their concerns,
which roughly come down to performance (due to the additional cost of
encryption and doing TCP) and stability (which seems to be about
whether TLS/QUIC failures will cause resolution to fail).&lt;/p&gt;
&lt;p&gt;These concerns are actually sort of puzzling to Web people, for several
reasons. First, the vast majority of Web traffic is encrypted, including
key services like Google and Facebook, and once operators got past the
teething pains, this doesn&#39;t seem to have created increased stability
concerns. If Google goes down, it&#39;s an enormous deal, perhaps even bigger
than a DNS authoritative server failure, because recursive servers
cache data and so won&#39;t start failing immediately.&lt;/p&gt;
&lt;p&gt;Second, although encryption does increase load somewhat, even 10 years
ago it was a relatively small fraction of the cost of running a
server. In a 2010 talk, &lt;a href=&quot;https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html&quot;&gt;Langley, Modadugu, and
Chang&lt;/a&gt;
reported that SSL/TLS accounted for less than 1% of CPU load on
their front-end machines, and of course both machines and TLS have
gotten faster.  It&#39;s true that serving DNS tends to be lighter-weight
because UDP is cheap and the servers are largely stateless (though
QUIC may help some here), but the overall load profile doesn&#39;t seem
like a big deal.  As a comparison point, all the root servers together serve on the
order of &lt;a href=&quot;https://blog.apnic.net/2020/08/21/chromiums-impact-on-root-dns-traffic/&quot;&gt;80 billion queries a
day&lt;/a&gt;.
This is equal to less than an hour of &lt;a href=&quot;https://blog.cloudflare.com/cloudflare-thwarts-17-2m-rps-ddos-attack-the-largest-ever-reported/&quot;&gt;Cloudflare&#39;s&lt;/a&gt;
query volume, so it doesn&#39;t seem that impractical to protect.
It&#39;s certainly possible—even likely—that it
would require those operators to invest more than they have
in infrastructure, but it seems far from impossible.&lt;/p&gt;
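&lt;p&gt;To make the comparison above concrete, here is a rough back-of-the-envelope sketch. The 80 billion queries per day comes from the text; the figure of about 25 million requests per second for a large CDN is an assumed round number used for illustration, not a measurement.&lt;/p&gt;

```python
# Back-of-the-envelope comparison of root server load with a large CDN.
ROOT_QUERIES_PER_DAY = 80e9        # figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: an illustrative average request rate for a large CDN.
CDN_REQUESTS_PER_SECOND = 25e6

# Average query rate across all root servers combined.
root_qps = ROOT_QUERIES_PER_DAY / SECONDS_PER_DAY

# How many minutes of CDN traffic one day of root traffic amounts to.
minutes_of_cdn_traffic = ROOT_QUERIES_PER_DAY / CDN_REQUESTS_PER_SECOND / 60

print(f"average root load: {root_qps:,.0f} queries/second")
print(f"equivalent CDN traffic: {minutes_of_cdn_traffic:.0f} minutes")
```

&lt;p&gt;Under these assumptions, a full day of root traffic averages a bit under a million queries per second and corresponds to somewhat less than an hour of the assumed CDN load.&lt;/p&gt;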
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said above, the situation is in flux, but overall, I&#39;m not that
optimistic. This is a system with a lot of moving parts and where
a number of the veto points have relatively little incentive to change
their operations or, as is the case with the root operators, are
actively skeptical of doing so. If we look at the situation
with DNSSEC deployment, which DNS operators are relatively enthusiastic
about and which still has a lot of friction points, the prospects
for any kind of signaling for ADoX don&#39;t look that great.
The prospects for some sort of probing/unauthenticated mode—potentially
with an HSTS-style upgrade—seem a little better, but even that seems
like it may be a stretch.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Really, it would probably be on &lt;code&gt;ns.example.net&lt;/code&gt;
but I&#39;m simplifying. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is the situation on the Web,
hence HSTS. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This may all seem obvious to people who understand DNS, but
it took me a while to work through it, so I think it might
help others too. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Or, more accurately, rents. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And recursively from the root. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is one case where this still sort of works:
if (1) the target zone is signed and (2) the
sensitive label is one deeper than the target
zone, e.g., &lt;code&gt;sensitive-label.example.com&lt;/code&gt; and
(3) the recursive first queries the target authoritative
to check the NS record (&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-ns-revalidation-01&quot;&gt;NS revalidation&lt;/a&gt;). In that
case you can still protect the sensitive label. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This does entail more complexity, because it probably
requires a way to signal which kind of credential the authoritative
will use so that a recursive which only knows WebPKI or TLSA/DANE
knows if it will be able to connect. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-adox/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Qualifying for prestige races (and why you won&#39;t get into Western States)</title>
		<link href="https://educatedguesswork.org/posts/qualifying/"/>
		<updated>2022-01-16T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/qualifying/</id>
		<content type="html">&lt;p&gt;It&#39;s a common pattern: a new category of race starts up and
initially it&#39;s not very popular, so you can just sign up.
But the race can&#39;t accommodate an infinite number of participants,
and if the sport starts to get popular, you can start
to hit capacity limits. If they&#39;re not too bad you can just
make things first come first served, but some really
popular races—especially prestige ones like the Boston
Marathon or the Hawaii Ironman—are in such demand that
they would just fill up instantly. Obviously, this is one
way to ration entry, but it&#39;s odd to choose based on
how good someone is at hitting reload on their browser and
unlike &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/&quot;&gt;COVID vaccination&lt;/a&gt;,
it&#39;s not just a simple matter of prioritization: some people will get in and some will
not. Selecting the lucky few turns out to be a somewhat
complicated problem, and the three endurance sports I&#39;m most familiar with
(road running, triathlon, and ultramarathons) have all developed different solutions.&lt;/p&gt;
&lt;p&gt;At a high level, you can select people based on two basic
criteria: merit and luck. Luck is theoretically easy: run a lottery
(though in practice it&#39;s usually not that simple).
Merit is more complicated, for reasons I&#39;ll get into below.&lt;/p&gt;
&lt;h2 id=&quot;road-racing&quot;&gt;Road Racing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#road-racing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Road race fields are typically very large (for instance, the 2019
Boston Marathon had 30000 runners), and so only the most famous and
popular races need to do anything special beyond first come first served.
If you&#39;re a popular race, though, you need to do something different.
Boston is by far the most prestigious marathon in the US—and probably
the world—and therefore is heavily in demand, even with this
big a field size. They run a relatively straightforward system:
each age bracket (mostly five years wide) has a &lt;a href=&quot;https://www.baa.org/races/boston-marathon/qualify&quot;&gt;qualifying time&lt;/a&gt;.
If you hit the qualifying standard in any certified marathon
then you are eligible to apply for Boston.
A similar system is used for the US Olympic trials in the marathon,
where there is a qualifying time tuned to generate a field
of a few hundred or so.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This doesn&#39;t guarantee you entry, though: because more people hit the
qualifying time than they can admit, they also apply a year-to-year
adjustment to the qualifying time. For instance, if you are 41, your
qualifying time for 2021 was 3:10, but because of the small field
size that year, they had an unusually high cut-off of 7:47, meaning
you had to actually run 3:02:13 to be admitted.
On the other hand,
fewer people applied in 2022 and everyone with the official time got
in. These times are fast,
but are not out of reach for reasonably good runners.
Many other prestige
races use a combination of lotteries and time qualification.&lt;/p&gt;
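&lt;p&gt;The cut-off arithmetic above is simple enough to check directly. This is just a small sketch using the numbers from the example; the function name &lt;code&gt;required_time&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
from datetime import timedelta

def required_time(standard, cutoff):
    """Return the time actually needed: the standard minus the cut-off."""
    return standard - cutoff

# Numbers from the example in the text.
standard = timedelta(hours=3, minutes=10)    # qualifying standard
cutoff = timedelta(minutes=7, seconds=47)    # that year's cut-off

needed = required_time(standard, cutoff)
print(needed)  # prints 3:02:13
```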
&lt;p&gt;Time-based qualification works well for road racing (or track) because times
are relatively consistent and depend mostly on the flatness
of the course and the weather (specifically, temperature
and wind).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
This means that most people have a fast (which is to say flat,
low wind, cool) course available to them without too much
effort, and so they have an opportunity to turn in a fast time.
Indeed, it&#39;s quite common for races to advertise themselves
as &amp;quot;flat&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
fast&amp;quot; and perfect for Boston Qualifying. Popular
places to get the &amp;quot;BQ&amp;quot;, as they say, are
&lt;a href=&quot;https://runsignup.com/Race/IL/Vienna/TunnelHillMarathon&quot;&gt;Tunnel Hill&lt;/a&gt;
run in November in Illinois and
&lt;a href=&quot;https://runsra.org/california-international-marathon/&quot;&gt;California International Marathon (CIM)&lt;/a&gt;
run in December in Sacramento.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;triathlon&quot;&gt;Triathlon &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#triathlon&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Ironman race that everyone has as their goal is the Hawaii Ironman
(aka &amp;quot;Kona&amp;quot;).
By contrast to road racing, triathlon courses are somewhat less
standardized and there are fewer races, so that means that there&#39;s
a fair amount of variation in finish times; for instance, the Ironman
Germany course record is 7:41 and the Ironman Lanzarote record is 8:30.
This, coupled with the relatively small number of entrants
in Hawaii (about 2500) means that time criteria don&#39;t work
well; there will be too much uncertainty at the margin.&lt;/p&gt;
&lt;p&gt;Instead, the way this works is that Ironman Hawaii gives each race a
&lt;a href=&quot;https://www.ironman.com/im-world-championship-2022-slot&quot;&gt;fixed number of &amp;quot;slots&amp;quot;&lt;/a&gt;,
which is to say the number of athletes they can send to Kona.
These slots are then allocated to each (typically five year)
age bracket + gender (e.g., Male 25-29). If there are (say) 5
slots in a given age group, then they go to the top athletes
in that age group. If a qualifying athlete doesn&#39;t want the slot—or
already has one—then it &amp;quot;rolls down&amp;quot; to the next athlete.
In some cases, it&#39;s been known to happen that a slot will roll down
off the end of the age group (especially in smaller age groups),
and go to another age group.
This structure creates a slightly odd dynamic: As with Boston
qualifying, people gravitate to specific races, not on the
basis of time but rather on the basis of which races appear
to have &amp;quot;soft&amp;quot; winning times and thus be easier to qualify at.
This can make a big difference if you are a solid but not elite
age grouper who is just on the border of qualifying. I myself once
flew to New Zealand to race because the previous year had
had fairly slow winning times (I &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Did_Not_Finish&amp;amp;id=1065307250&amp;amp;wpFormIdentifier=titleform&quot;&gt;DNFed&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Interestingly, the Hawaii Ironman used to run a lottery in which
you could pay $50 to enter, but it appears that they
have stopped doing that due to a &lt;a href=&quot;https://www.triathlete.com/events/ironman/the-future-of-the-kona-lottery/&quot;&gt;settlement with the Federal government&lt;/a&gt;
which treats it as gambling, I think because they charged you whether
you got in or not.&lt;/p&gt;
&lt;h2 id=&quot;ultramarathons&quot;&gt;Ultramarathons &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#ultramarathons&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Ultras tend to have even smaller field sizes than triathlons, both
for logistical and historical reasons. The logistical reason is
that it&#39;s hard to have a lot of people on single-track mountain
trails—and of course it&#39;s hard on the trails. For instance, even the comparatively large
&lt;a href=&quot;https://utmbmontblanc.com/en/&quot;&gt;Ultra-Trail du Mont-Blanc (UTMB)&lt;/a&gt;,
the most prestigious European long distance ultra,
has a field size of &lt;a href=&quot;https://www.runnersworld.com/races-places/a28789165/ultra-trail-du-mont-blanc/&quot;&gt;only around 2300 runners&lt;/a&gt;.
The most prestigious North American ultra, &lt;a href=&quot;https://www.wser.org/&quot;&gt;Western States&lt;/a&gt;,
has a field size of under 400. The reason for this is that some of the event
takes place in a wilderness region where races are technically forbidden,
and so the race operates under a permit that keeps it to the size of the
event before the wilderness was created. Other famous North American
ultras like &lt;a href=&quot;https://www.hardrock100.com/&quot;&gt;Hardrock 100&lt;/a&gt; or
&lt;a href=&quot;https://lakesonoma50.com/&quot;&gt;Sonoma 50&lt;/a&gt; also have
relatively small field sizes.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Unlike both road racing and triathlon, ultras manage the problem of
oversubscription (at least for amateurs) almost entirely by luck and
not by merit. As an example, Sonoma 50 runs a simple blind lottery for
all admissions, including pros. The sole exception is the previous
year&#39;s winner, who gets in without being in the lottery. It doesn&#39;t
matter if you&#39;re back of the pack or going for the win, you&#39;re all in
the same lottery. A more common structure is to have some kind of
special affordance for professionals. It&#39;s not clear to me why this
system has evolved, but I suspect it&#39;s something to do with the generally
less competitive ethos of trail running as well as the relative
youth of the sport.&lt;/p&gt;
&lt;h3 id=&quot;western-states&quot;&gt;Western States &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#western-states&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Western States has a particularly ornate system, consisting of a
set of about 100 &lt;a href=&quot;https://www.wser.org/automatics/&quot;&gt;&amp;quot;automatic entrants&amp;quot;&lt;/a&gt;
plus a &lt;a href=&quot;https://www.wser.org/lottery/&quot;&gt;lottery&lt;/a&gt; with about 270 spots.
The automatics are largely elites of various flavors, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The top 10 men and women in the previous year&lt;/li&gt;
&lt;li&gt;6 spots for elite athletes (mostly non-Americans) from the
Ultra Trail World Tour.&lt;/li&gt;
&lt;li&gt;The top two men and women from 6 different &lt;a href=&quot;https://www.wser.org/golden-ticket-races/&quot;&gt;Golden Ticket&lt;/a&gt;
races.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Around 10 slots for race sponsors. For instance, Jim Walmsley
famously won in 2019, turned down his automatic slot for 2021
because he didn&#39;t think he was going to race and then got
in via his sponsor, shoe company &lt;a href=&quot;https://www.hoka.com/en/us/&quot;&gt;Hoka&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&#39;re not good enough to run your way in or have a sponsor
who will get you in (and you&#39;re not &lt;a href=&quot;http://gordonainsleigh.com/&quot;&gt;Gordy Ainsleigh&lt;/a&gt;
who ran the course on foot back when it was just
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tevis_Cup&amp;amp;oldid=1058174751&quot;&gt;Tevis Cup&lt;/a&gt;,
&lt;a href=&quot;https://www.robertstech.com/run/writing/cowman.htm&quot;&gt;Cowman AmooHa&lt;/a&gt;,
or a few of the other notables), then it&#39;s the lottery for you.&lt;/p&gt;
&lt;p&gt;The way the WS lottery works is that each year you have to &amp;quot;qualify&amp;quot;
by finishing—occasionally within a certain time—one of
a set of &lt;a href=&quot;https://www.wser.org/qualifying-races/&quot;&gt;specified races&lt;/a&gt;.
Unlike with Boston or the Hawaii Ironman, these qualifying
requirements aren&#39;t set to pick out elite runners but just
to weed out people who have no real chance of finishing
Western. For instance, it&#39;s sufficient to finish
&lt;a href=&quot;https://educatedguesswork.org/posts/sob100k&quot;&gt;Sean O&#39;Brien 100K&lt;/a&gt; in under 16 hours.
I&#39;m not saying this is easy, but I finished under 13 hours and
was well off the podium.&lt;/p&gt;
&lt;p&gt;This all worked reasonably OK until the mid 2010s, at which
point the number of applicants exceeded the number of slots
by a factor of about 10 and there were people who had
been waiting to get in for 5 years. In 2015, they introduced
a new system in which the number of lottery tickets doubled
for every year you didn&#39;t get in. With a few small modifications&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;,
this is the system that exists now.&lt;/p&gt;
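&lt;p&gt;To see how the doubling shifts the odds, here is a minimal sketch. It assumes a first-time applicant holds one ticket and that tickets double with each unsuccessful year; the pool sizes are made up for illustration and are not real lottery data.&lt;/p&gt;

```python
def tickets(failed_years):
    """Tickets held after this many unsuccessful applications (assumed 1 at first)."""
    return 2 ** failed_years

def ticket_share(pool):
    """Return each group's share of all tickets in the hat.

    pool maps a failed-year count to the number of applicants in that group.
    """
    total = sum(tickets(years) * count for years, count in pool.items())
    return {years: tickets(years) * count / total for years, count in pool.items()}

# Hypothetical pool: 3000 first-timers, 1000 people with two misses,
# and 100 people with five misses.
pool = {0: 3000, 2: 1000, 5: 100}
shares = ticket_share(pool)

for years, share in sorted(shares.items()):
    print(f"{years} failed years: {pool[years]} applicants hold {share:.1%} of tickets")
```

&lt;p&gt;In this made-up pool, the 100 longest-waiting applicants hold slightly more tickets than all 3000 first-timers combined, which is the reallocation effect in miniature.&lt;/p&gt;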
&lt;p&gt;The obvious problem with this system is that it doesn&#39;t
make any more slots; it just reallocates the probability
of getting in from newer people to older people.
This of course reduces the number of people who have been
waiting a really long time, but at the cost of making
it very unlikely for new people to get in. For instance,
someone who entered the lottery for the first time
in 2021 (for the 2022 race) had &lt;a href=&quot;https://www.wser.org/2019/11/29/2020-lottery-statistics/&quot;&gt;a 1.3% chance of getting
in&lt;/a&gt;,
and it&#39;s just going to get worse as long as more people
want to run Western than can be accommodated via the lottery.&lt;/p&gt;
&lt;h3 id=&quot;hardrock-100&quot;&gt;Hardrock 100 &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#hardrock-100&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Hardrock 100 has an especially goofy &lt;a href=&quot;https://www.hardrock100.com/hardrock-lottery.php&quot;&gt;system&lt;/a&gt;, with three
separate lotteries:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Category&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Number of Slots&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Never finished&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Veterans (five or more finishes)&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Everyone else&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;55&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When you add this up, you see that more than half of the slots are
given to people who have already run Hardrock, so this has precisely
the opposite bias from the one Western States uses (although they do use
a similar doubling scheme for Never Finished, so at least it
tends to reward waiting).&lt;/p&gt;
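&lt;p&gt;The sum is easy to check from the table. The sketch below just adds up the three categories, treating both &amp;quot;Veterans&amp;quot; and &amp;quot;Everyone else&amp;quot; as people who have finished the race before.&lt;/p&gt;

```python
# Slot counts copied from the table above.
slots = {
    "Never finished": 65,
    "Veterans (five or more finishes)": 25,
    "Everyone else": 55,
}

total = sum(slots.values())
# Everyone outside the "Never finished" lottery has finished at least once.
prior_finishers = total - slots["Never finished"]
fraction = prior_finishers / total

print(f"{prior_finishers} of {total} slots ({fraction:.0%}) go to previous finishers")
```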
&lt;p&gt;In practice, this has resulted in a terrible gender balance
for Hardrock: because historically most of the people who have
run Hardrock are men, this system just perpetuates that imbalance
and will continue to do so as long as the number of first-time
women doesn&#39;t massively increase.
Starting in 2022, Hardrock&#39;s policy is to admit women in proportion
to their fraction of the lottery pool. This won&#39;t actually bring
gender balance because the number of men who enter is far
greater, but it&#39;s potentially a step in the right direction.
The &lt;a href=&quot;https://www.highlonesome100.com/lottery-design&quot;&gt;High Lonesome 100&lt;/a&gt;
has gone even further and selects exactly as many women as men.&lt;/p&gt;
&lt;h3 id=&quot;utmb&quot;&gt;UTMB &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#utmb&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;UTMB followed a similar path, starting with open entrance, then
qualification, and finally a lottery, including a similar
doubling scheme to Western States (the site suggests
that they will no longer double after 2022).
However they have now
introduced a new change to the system in which you can collect
&amp;quot;running stones&amp;quot; for participating in specific races
(especially races owned by UTMB!) with each stone counting as another lottery entry.
So, for instance, you get 9 stones for Thailand By UTMB. And the more
races you do the more stones you collect. We should anticipate
that in the future the majority of people will be
selected via this mechanism, both because it&#39;s obviously
a huge advantage and because the more people start using
it the more of a disadvantage you are at by just entering
the ordinary lottery. This is, of course, good for business!&lt;/p&gt;
&lt;h3 id=&quot;the-long-term&quot;&gt;The Long-term &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#the-long-term&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I mentioned above, as long as more people want to do these races
than can be accommodated, any lottery system is sort of a temporary
measure, because most people won&#39;t get to do the race ever.
For instance, there were over 3000 first year applicants for the 2022
Western States. It would take over 10 years just to have all of them
race, in which time another 30,000 or so people would be
waiting.
I think it&#39;s only now that people are starting to come
to terms with this and realize they are unlikely to
ever get into Western States or Hardrock.
Moreover, increasing the odds for people who have been
waiting longer will actually have the paradoxical effect that wait
times for people who get in continue to increase as the right hand
side of the distribution is increasingly favored (the wait times of
people who don&#39;t get in will of course always be infinite).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/wser-graph.png&quot; alt=&quot;Western States Lottery Simulation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The graph above shows a simulation of 10 years of the Western States
lottery under the (very conservative) assumption that the number of
new entrants will continue to remain the same (in fact, it has been
increasing for years). The area shows the distribution of wait times
and the black line the mean number of years that selected runners have
been in the lottery. As you can see, this means that the population of
lottery winners will have been waiting longer and longer and will be
getting correspondingly older. This is going to get especially weird
in another 10-15 years as the pros are typically fairly young (under
40), so even more than usual you&#39;ll have two races, one for pros and
one for amateurs.&lt;/p&gt;
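&lt;p&gt;A simulation along these lines can be sketched in a few lines. This is not the code behind the graph, just a minimal sketch under simplifying assumptions: a constant number of new entrants each year, everyone who misses re-applies, one ticket in the first year with doubling after each miss, and a fixed number of lottery spots. Winners are drawn without replacement using the Efraimidis-Spirakis weighted sampling trick (give each applicant the key &lt;code&gt;u ** (1 / weight)&lt;/code&gt; and keep the largest keys).&lt;/p&gt;

```python
import random

def simulate(years, new_per_year, spots, seed=0):
    """Return the mean number of prior misses among each year's winners."""
    rng = random.Random(seed)
    waiting = []       # one entry per applicant: number of prior failed years
    mean_waits = []
    for _ in range(years):
        # Assumption: the same number of first-timers joins every year.
        waiting.extend([0] * new_per_year)
        # Weighted sampling without replacement: weight is 2 ** failed_years.
        keys = [(rng.random() ** (1.0 / 2 ** failed), index)
                for index, failed in enumerate(waiting)]
        keys.sort(reverse=True)
        winners = {index for _, index in keys[:spots]}
        won = [waiting[index] for index in winners]
        mean_waits.append(sum(won) / len(won))
        # Assumption: everyone who missed comes back with one more failed year.
        waiting = [failed + 1 for index, failed in enumerate(waiting)
                   if index not in winners]
    return mean_waits

# Roughly 3000 first-year applicants and 270 lottery spots, as in the text.
waits = simulate(years=10, new_per_year=3000, spots=270)
for year, wait in enumerate(waits, start=1):
    print(f"year {year}: winners had waited {wait:.2f} years on average")
```

&lt;p&gt;The exact numbers depend on the seed and on the no-dropout assumption, so treat the output as a qualitative illustration of the trend rather than a forecast.&lt;/p&gt;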
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/qualifying/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the end of the day there really isn&#39;t a great solution: there are
just more people who want to do these races than can plausibly do so,
so you need some way to select the lucky few.
It seems like one could
make an argument for either performance-based qualification
(Boston and Kona) or lottery-based qualification. However, it seems
to me that the doubling system used by Western States and the
quota system used by Hardrock are long-term unstable, the former
because it&#39;s just going to create an older and older population
and the latter because it just seems unfair to favor people who
have done the race 5 times over people who have never done it.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This actually went a bit wrong in 2020, when they overshot
the mark for women. The women&#39;s standard
for entering the trials in the marathon was 2:45 and
511 women qualified. The standard has been dropped
to &lt;a href=&quot;https://www.msn.com/en-us/sports/more-sports/usatf-announces-tougher-olympic-marathon-trials-standards-for-2024/ar-AARrmY9&quot;&gt;2:37 for 2024&lt;/a&gt;.
I&#39;ve seen arguments that a big field was good, but obviously USATF doesn&#39;t agree.
It&#39;s certainly true that the logistics are hard because each
runner gets to have their own individualized nutrition at
aid stations, etc. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Temperature is actually a huge issue because running
generates a lot of heat and your body has to work to
get rid of it. The data is unsurprisingly &lt;a href=&quot;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0037407&quot;&gt;pretty noisy&lt;/a&gt;,
but the optimal temperature for running appears to be quite
cold, somewhere around 5-10&lt;sup&gt;o&lt;/sup&gt;C. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Courses can be net downhill but only by a little bit. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
CIM also advertises &amp;quot;More porta-potties per runner at the start and along the course than any event CIM staff and board has ever seen!&amp;quot;. This
is more important than you might think. British marathon legend Paula Radcliffe famously
had &amp;quot;bathroom issues&amp;quot; at the 2005 London Marathon and had to just &lt;a href=&quot;http://news.bbc.co.uk/sport2/hi/athletics/4454315.stm&quot;&gt;go on the side of the course&lt;/a&gt;, going on to win anyway. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I actually got into the Sonoma lottery this year and plan
to toe the line. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This works like Hawaii in that the slots roll down. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Specifically, they no longer require you to have
applied in consecutive years. &lt;a href=&quot;https://educatedguesswork.org/posts/qualifying/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part IV: Transport security for DNS (DoT, DoH, DoQ)</title>
		<link href="https://educatedguesswork.org/posts/dns-security-dox/"/>
		<updated>2022-01-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-dox/</id>
		<content type="html">&lt;p&gt;This is Part IV of my series on DNS Security
(parts &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane&quot;&gt;III&lt;/a&gt;).
In this part I cover transport security for DNS.&lt;/p&gt;
&lt;p&gt;For years most of the DNS security effort
went into &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;DNSSEC&lt;/a&gt;, which provides
authenticity for DNS data by signing the DNS records themselves.  This
left two big gaps. First, DNSSEC has seen fairly low levels of
deployment, leaving the majority of DNS resolutions unprotected,
and most of the resolutions which do benefit from DNSSEC are only protected as far as the
recursive resolver. Second, DNSSEC doesn&#39;t provide confidentiality, so
DNS query data, which is naturally extremely sensitive, is wholly
unprotected. In this post I go into the various technologies to
address these gaps.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclaimer:&lt;/em&gt; I was (am) heavily involved in the design and rollout
of Firefox&#39;s DNS over HTTPS (DoH) deployment. The opinions below are mine
and not Mozilla&#39;s.&lt;/p&gt;
&lt;h2 id=&quot;overall-situation&quot;&gt;Overall Situation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#overall-situation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Recall the DNS resolution process from &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;Part I&lt;/a&gt;, shown
below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dns-recursive-stub.png&quot; alt=&quot;DNS resolution process&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s easiest to think of this as just consisting of four independent
sets of transactions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Client to recursive&lt;/li&gt;
&lt;li&gt;Recursive to root&lt;/li&gt;
&lt;li&gt;Recursive to &lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Recursive to &lt;code&gt;b.iana-servers.net&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each of these transactions is a request/response exchange, typically
done over
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=User_Datagram_Protocol&amp;amp;oldid=1059120519&quot;&gt;UDP&lt;/a&gt;,
but sometimes over &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&amp;amp;oldid=1060994683&quot;&gt;TCP&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to protect this system, a natural thing to do is just to
encrypt each transaction, resulting in a &lt;em&gt;set&lt;/em&gt; of
encrypted links to and from the recursive resolver. This
isn&#39;t a complete solution because the recursive resolver learns what
queries you are performing and unless you &lt;em&gt;also&lt;/em&gt; do DNSSEC validation
at the client, the recursive resolver can simply lie to you when
it sends you its results. However, it&#39;s also a significant improvement in security and
privacy because it protects the user from attacks outside the recursive
resolver. Moreover, we already have plenty of experience with
protecting this kind of data (just run it over TLS, or in the case of
UDP, perhaps DTLS) and so it&#39;s—at least in theory—technically
straightforward. In practice, however, it turns out not to be so,
though for reasons that aren&#39;t really about the protocol itself.&lt;/p&gt;
&lt;p&gt;In this post, we&#39;ll focus on the (by comparison) easier problem of
protecting the client-to-recursive transaction, colored blue in
the diagram above. While this is a fast-evolving area, there are a number of large-scale
deployments of encryption of this link. The problem of recursive-to-authoritative
is essentially unsolved and is the topic of a separate post. For now,
you can just assume that link is in the clear.&lt;/p&gt;
&lt;h2 id=&quot;server-authentication&quot;&gt;Server Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#server-authentication&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic problem here is authentication. Forming an encrypted connection
is relatively easy—especially if you have a pre-made protocol like
TLS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
to start with—but if you want security against an on-path attacker
then you need to authenticate the server; otherwise the attacker
can just impersonate the server and capture your queries. If they forward
the queries to the server themselves and the responses back (in a
so-called &amp;quot;man-in-the-middle attack&amp;quot;) then this will be invisible
to the client. It&#39;s generally not necessary to authenticate the client
to the server because the server&#39;s response doesn&#39;t depend on the client&#39;s
identity.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
In order to prevent this kind of attack, the client must know (1) that the server supports
encrypted transport and (2) the expected identity of the server. We discuss
these both below.&lt;/p&gt;
&lt;p&gt;Note: there are three major protocols being used for secure DNS transport:
DNS over TLS (DoT), DNS over HTTPS (DoH), and DNS over QUIC (DoQ). While
there are important technical differences, they are irrelevant for most
of the discussion below and it&#39;s conventional to refer to them collectively
as DoX and to refer to old unencrypted DNS as Do53.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;securing-the-stub-to-recursive-link&quot;&gt;Securing the Stub-to-Recursive Link &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#securing-the-stub-to-recursive-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As described in &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;Part I&lt;/a&gt;, endpoints
typically learn about the resolver via the network, which provides
them with an IP address for the resolver. This is a perfectly good
identity and it&#39;s possible to securely connect to that IP address as
the WebPKI supports IP addresses in certificates, but that doesn&#39;t
actually help very much, for two reasons.&lt;/p&gt;
&lt;p&gt;First, there&#39;s no way to know that the server actually supports
encrypted transport. You can configure the client to just try
encrypted transport and fall back to unencrypted transport if that
fails, but that means that any on-path attacker can just simulate
failure (e.g., by sending a TCP reset (RST)) and force you back to
unencrypted transport. Second, if the attacker is on your local network,
they can often interfere with the resolver discovery process and substitute
their own resolver, in which case you form an encrypted connection to
the attacker, which isn&#39;t very useful.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;When the IETF originally standardized secure transports for DNS—and
specifically for stub to recursive—they defined the protocols
themselves but mostly punted on this
problem. Here&#39;s what &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc7858.html#section-4&quot;&gt;RFC 7858&lt;/a&gt;,
defining DNS over TLS (DoT), has to say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This protocol provides flexibility to accommodate several different
use cases.  This document defines two usage profiles: (1)
opportunistic privacy and (2) out-of-band key-pinned authentication
that can be used to obtain stronger privacy guarantees if the client
has a trusted relationship with a DNS server supporting TLS.
Additional methods of authentication will be defined in a forthcoming
document [TLS-DTLS-PROFILES].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is IETF language for &amp;quot;we don&#39;t have a good solution to this
problem, so we&#39;re going to give you some not very good options&amp;quot;.
However, when people went to actually do large-scale deployments,
they had to actually do something. So far, we are seeing two main
models evolve.&lt;/p&gt;
&lt;h3 id=&quot;same-provider-auto-upgrade-(spau)&quot;&gt;Same Provider Auto-Upgrade (SPAU) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#same-provider-auto-upgrade-(spau)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first model, used by Chrome and Windows, is what&#39;s called
&lt;em&gt;Same Provider Auto-Upgrade (SPAU)&lt;/em&gt;. The basic idea is that the
client (either the browser or the OS) has a list of which recursive
resolvers support secure transport. If the IP address of the configured
resolver is on that list, then the client attempts to use secure
transport;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
otherwise it just uses regular insecure DNS.&lt;/p&gt;
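&lt;p&gt;As a rough illustration of the lookup just described, here is a minimal Python
sketch. The names &lt;code&gt;UPGRADE_TABLE&lt;/code&gt; and &lt;code&gt;choose_transport&lt;/code&gt;, the two
table entries, and the idea of mapping a resolver IP address to a DoH URL are all
assumptions made for illustration; this is not the actual Chrome or Windows code.&lt;/p&gt;

```python
# Hypothetical SPAU-style upgrade table: Do53 resolver IP -> DoH endpoint.
# The entries are illustrative; real clients ship their own vendor-maintained lists.
UPGRADE_TABLE = {
    "8.8.8.8": "https://dns.google/dns-query",
    "1.1.1.1": "https://cloudflare-dns.com/dns-query",
}


def choose_transport(configured_resolver_ip):
    """Return ("doh", url) if the configured resolver is on the list,
    otherwise ("do53", ip) to keep using ordinary unencrypted DNS."""
    doh_url = UPGRADE_TABLE.get(configured_resolver_ip)
    if doh_url is not None:
        return ("doh", doh_url)
    return ("do53", configured_resolver_ip)
```

&lt;p&gt;With this table, &lt;code&gt;choose_transport(&quot;8.8.8.8&quot;)&lt;/code&gt; selects DoH, while a
resolver address such as &lt;code&gt;192.168.1.1&lt;/code&gt; is not on the list and so the
client stays on Do53.&lt;/p&gt;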
&lt;p&gt;This design has two nice properties. First, it lets you quickly
upgrade a lot of people because there is a fair amount of concentration
in the resolver ecosystem. For instance about 15% of people use
&lt;a href=&quot;https://developers.google.com/speed/public-dns/&quot;&gt;Google Public DNS&lt;/a&gt;,
though not all of them will actually get upgraded, for reasons
we&#39;ll see below.
Second, it doesn&#39;t interfere with people&#39;s existing
configurations: for instance if they use an enterprise resolver
that does filtering or split horizon then they&#39;ll just continue
using it without change. As we&#39;ll see, the converse property is a challenge with
other models such as &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#trusted-recursive-resolver&quot;&gt;Trusted Recursive Resolver (TRR)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The main disadvantage of this design is that the level of security
it offers is quite limited, because (as usual) the client
learns about the resolver from the local network. If that
local network is malicious—or there is an attacker on it—then
they can just redirect you to their own resolver and this design
provides no security at all. Where it &lt;em&gt;does&lt;/em&gt; provide security is
when your local network is secure (e.g., a home network) but
the uplink to the recursive resolver may be insecure. But if
you don&#39;t trust the local network (e.g., you&#39;re in an airport
or a coffee shop) then SPAU doesn&#39;t provide much additional
security or privacy.&lt;/p&gt;
&lt;p&gt;There are also several practical deployment problems. First,
even if the real recursive resolver you are using supports secure
transport, it&#39;s quite common for people&#39;s local networks to
have some sort of DNS resolver endpoint in the WiFi gateway
or customer access router (the technical term here
is &lt;em&gt;customer premises equipment (CPE)&lt;/em&gt;), in which case even if
the upstream resolver supports secure transport, you won&#39;t get it
until the CPE upgrades (which does not happen often). I&#39;ve seen
estimates that in some countries over 80% of people have this
kind of configuration. Second,
this design requires the software vendor to keep a list of recursive
resolvers that support secure transport, which doesn&#39;t scale well.
This mode is on by default in &lt;a href=&quot;https://duo.com/decipher/google-makes-dns-over-https-default-in-chrome&quot;&gt;Chrome&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;trusted-recursive-resolver-(trr)&quot;&gt;Trusted Recursive Resolver (TRR) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#trusted-recursive-resolver-(trr)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Firefox uses a different model, called a &lt;em&gt;Trusted Recursive Resolver (TRR)&lt;/em&gt;. The
idea here is that instead of accepting the resolver provided by the network,
Firefox has a list of resolvers which have agreed to comply with
strong &lt;a href=&quot;https://wiki.mozilla.org/Security/DOH-resolver-policy&quot;&gt;privacy and transparency requirements&lt;/a&gt;.
These include very short data retention periods and strict limits on
how the data can be used.
When possible, Firefox will automatically select one of those resolvers and
securely connect to it.&lt;/p&gt;
&lt;p&gt;This design has two main advantages when compared to SPAU. First,
it works even if the local resolver is insecure or untrustworthy
(e.g., in a coffee shop) because the browser picks a &amp;quot;known good&amp;quot;
resolver. Second, it provides encryption even if the local resolver
doesn&#39;t. However, because a TRR model often bypasses the local resolver,
this creates a number of challenges, as detailed below.&lt;/p&gt;
&lt;h4 id=&quot;information-leakage&quot;&gt;Information Leakage &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#information-leakage&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;There is an inherent privacy tradeoff in changing from the network&#39;s
resolver to a separate resolver because the network already has
a fair bit of information about your activity from observing
the rest of your traffic. Specifically, the network already gets to see
the IP addresses you are connecting to, which often only reflect
a single site (e.g., Facebook). Even in cases where there are a lot
of sites on the same IP address pool (as with some CDNs), the
TLS handshake can reveal the expected server through the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Server_Name_Indication&amp;amp;oldid=1058995924&quot;&gt;Server Name Indication (SNI)&lt;/a&gt;
field&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.
Finally, it&#39;s possible to learn about which Web site people are
going to via &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-pearg-website-fingerprinting-01.txt&quot;&gt;traffic analysis&lt;/a&gt;
of the connection.&lt;/p&gt;
&lt;p&gt;Adding a third party resolver creates
a second entity besides the network which knows about your browsing
history, which creates some additional risk, even if that
entity has good policies. On the other hand,
these alternate mechanisms of learning about browsing history
are less efficient than just collecting DNS query logs,
and there is active work on closing most of these holes,
so this reduces your exposure to the network at the cost
of increasing your exposure to the TRR. However, unlike
your local network, the TRRs are required to have strong
privacy policies; by contrast, it is known that many
local networks do not. Nevertheless, this isn&#39;t an ideal situation, though it is one that
is potentially addressable via proxying as discussed below.&lt;/p&gt;
&lt;h4 id=&quot;local-policy&quot;&gt;Local Policy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#local-policy&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;DNS is often used to apply various kinds of local—or
national—policies, for instance filtering adult content,
logging user behavior (e.g., for law enforcement), or providing
special &amp;quot;internal&amp;quot; domain names which aren&#39;t publicly
resolvable. For obvious reasons, if the client selects a different
resolver from that offered by the network, that resolver may
adopt different policies.&lt;/p&gt;
&lt;p&gt;The difficult problem here is that it&#39;s hard to distinguish
between situations where the user wants some sort of special
policy treatment (e.g., blocking potentially malicious sites)
and ones where the user doesn&#39;t but the network operator
does (e.g., filtering out adult content). From a technical
perspective, these both look like interference/attack by the network.
Part of the value of securing DNS lookups is to protect
against network attacks, and so a naive TRR deployment
simply bypasses these policies, even if they were what
the user wanted. Firefox in particular
has some mechanisms to minimize this kind of impact, as
discussed &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#firefox-heuristics&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&quot;server-topology&quot;&gt;Server Topology &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#server-topology&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Most big server operators and CDNs have multiple points of presence
at different places in the network. These all have the same name but
different IP addresses. Because an ISP resolver knows
the actual location of the client in the network topology, if it
also knows something about the server&#39;s network, it can provide
a server that is topologically closer to the client, theoretically
providing better performance or making more efficient use of the ISP&#39;s
network. However, if the client uses a centralized recursive resolver—or
even one which doesn&#39;t know the ISP&#39;s topology—then this
kind of optimization may not be possible.&lt;/p&gt;
&lt;p&gt;This issue was a big concern when Firefox originally deployed the
TRR model, but &lt;a href=&quot;https://blog.mozilla.org/futurereleases/2019/04/02/dns-over-https-doh-update-recent-testing-results-and-next-steps/&quot;&gt;measurements&lt;/a&gt; suggest that in fact there is no real negative impact
on performance from using a trusted recursive resolver. It may
still be possible that there is an impact on network efficiency;
but this is more of an issue for the ISP than for users.&lt;/p&gt;
&lt;p&gt;The way that Firefox currently addresses this is to allow local
networks to &amp;quot;steer&amp;quot; queries to specific TRRs. The idea here is
that the local network might operate a TRR or have an arrangement
with one which they share topology information with and so
would prefer that clients use that. Currently, Comcast operates
such a TRR and Firefox uses a DNS-based &lt;a href=&quot;https://www.ietf.org/archive/id/draft-rescorla-doh-cdisco-00.html&quot;&gt;technique&lt;/a&gt;
to determine whether such a resolver is available/preferred. Note
that this doesn&#39;t allow the network to pick any resolver, just
to select between TRRs. I discuss a more generalized
solution &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#local-network-discovery&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&quot;national-boundaries&quot;&gt;National Boundaries &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#national-boundaries&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;As Mozilla was first looking at launching Firefox with its TRR program,
feedback from users indicated that many wanted
to have a TRR that was in their jurisdiction (or, for
many in Europe, a resolver in the EU).
Another issue is that policymakers in some countries were concerned that
resolvers would not comply with local regulations. Because
of these concerns, Firefox has been somewhat cautious with
its encrypted DNS rollout, and currently only has it
on by default in North America, using &lt;a href=&quot;https://1.1.1.1/dns/&quot;&gt;Cloudflare&lt;/a&gt; in the US
and &lt;a href=&quot;https://www.cira.ca/cybersecurity-services/canadian-shield&quot;&gt;CIRA&lt;/a&gt;
in Canada. As of this writing, work is underway
on expanding the program, though no specific plans have been
announced.&lt;/p&gt;
&lt;h4 id=&quot;firefox-heuristics&quot;&gt;Firefox Heuristics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#firefox-heuristics&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;For the reasons discussed above, if Firefox just enabled DoX for everyone,
this would cause problems for people&#39;s deployments. In order to mitigate
this, Firefox uses a set of heuristics designed to handle three
important cases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Enterprise-managed devices&lt;/em&gt;. In many cases, an enterprise will manage
a user device and install their own DNS server or make other configuration
changes. If Firefox detects this, it assumes that the enterprise won&#39;t
want to use a TRR and disables DoH (though the enterprise can explicitly
turn it on).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Parental controls&lt;/em&gt;. Some ISPs offer &amp;quot;parental controls&amp;quot; services which
use the DNS to filter out adult content (with the consent of the parents
if not the children). Firefox tries to detect this by checking to see
if certain &amp;quot;canary&amp;quot; domains (domains which don&#39;t actually correspond
to adult content but are used to test filtering) are blocked and, if
so, disables DoH.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Local domains/Blocking&lt;/em&gt;. Some networks will serve domains that
only resolve inside their own corporate network. If Firefox uses
a TRR, then these domains fail. Firefox addresses this by falling
back to Do53 if a domain is not found &lt;em&gt;or&lt;/em&gt; if DoH just generally
fails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
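&lt;p&gt;The three heuristics above can be sketched roughly as follows. This is a
simplified Python illustration, not Firefox&#39;s actual logic: the function names,
the boolean inputs, and the caller-supplied &lt;code&gt;doh_lookup&lt;/code&gt; and
&lt;code&gt;do53_lookup&lt;/code&gt; callbacks are all hypothetical.&lt;/p&gt;

```python
def doh_enabled(enterprise_managed, enterprise_enabled_doh, canary_blocked):
    """Decide whether to use the TRR at all, following the first two heuristics."""
    if enterprise_managed:
        # Managed devices keep their own DNS unless the enterprise opts in.
        return enterprise_enabled_doh
    if canary_blocked:
        # A blocked canary domain suggests DNS-based filtering is in use.
        return False
    return True


def resolve(name, use_doh, doh_lookup, do53_lookup):
    """Third heuristic: fall back to Do53 if DoH fails or finds nothing.

    doh_lookup and do53_lookup are caller-supplied functions that return a
    list of addresses, an empty list for "not found", or raise OSError.
    """
    if use_doh:
        try:
            answers = doh_lookup(name)
            if answers:
                return answers
        except OSError:
            pass  # DoH failed generally; fall through to Do53
    return do53_lookup(name)
```

&lt;p&gt;Note that the fallback path in &lt;code&gt;resolve&lt;/code&gt; is the same path a network
can trigger deliberately by blocking DoH.&lt;/p&gt;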
&lt;p&gt;These heuristics are imperfect in two ways. First, they do not detect
some cases where the user or device administrator might want DoH
disabled. One important case is enterprise-owned devices where
the operator doesn&#39;t remotely manage them. Unfortunately, there
is no good way to detect this because any signal that is sent by
the network could have been sent by an attacker. This is why Firefox
requires evidence that the device is being &lt;em&gt;managed&lt;/em&gt; before disabling
DoH.&lt;/p&gt;
&lt;p&gt;Second, they sometimes disable DoH when they shouldn&#39;t. In particular,
networks can block the canary—or just block DoH generally—and
cause Firefox to use Do53. This allows the network to disable
encryption, which is obviously contrary to the goal of protecting
the user from the network. For the moment, Mozilla has been
treating this as a necessary compromise, but is monitoring the
rate at which it happens and in future may make it more obvious
to the user when DoH has been disabled and allow them to require
secure resolution.&lt;/p&gt;
&lt;h2 id=&quot;local-network-discovery&quot;&gt;Local Network Discovery &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#local-network-discovery&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One important feature of the DoX deployments by Firefox, Chrome, and
Windows is that they were something that clients could do on their own
without any cooperation from the network. The reason for this is
simply that it was the only way to get significant incremental
deployment of a solution that addressed a real threat to user privacy.
However, a number of network operators—and some
governments—objected that they were losing their ability to
control their networks. The result was months of extraordinarily
contentious debate, both in the IETF and &lt;a href=&quot;https://techcrunch.com/2019/07/05/isp-group-mozilla-internet-villain-dns-privacy/&quot;&gt;in the
press&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the same time, it was clear that neither the existing SPAU nor TRR
approaches were ideal, even from the perspective of the browser/OS
vendors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SPAU-style approaches required a centralized list of secure transport-compatible
resolvers and had no way of detecting that the local network actually had
such a resolver.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TRR-style approaches just bypassed the network resolver even in cases
where it might be usable (e.g., in cases where that resolver was a TRR).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After months of loud discussion, the IETF decided to charter the &lt;a href=&quot;https://datatracker.ietf.org/wg/add/about/&quot;&gt;Adaptive DNS
Discovery (ADD)&lt;/a&gt; Working Group
to work on mechanisms to allow the client to &lt;em&gt;discover&lt;/em&gt; resolvers
and their properties without saying anything about what they would
do when they found them.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
In principle, such a solution could be used to feed into either an
SPAU solution (by saying that the local network supports an encrypted
resolver) or a TRR solution (by saying that it preferred one or more
TRRs), without requiring vendors to change their basic policies,
even if network operators wish they would.&lt;/p&gt;
&lt;p&gt;There&#39;s nothing particularly surprising about the approaches that
the ADD WG has come up with. Roughly speaking, they allow the network
to indicate (either via DHCP or via a DNS query) that an encrypted
resolver is available. When the indication is over DNS, the encrypted
resolver
has to have a WebPKI certificate for the IP that the client would
ordinarily use for Do53 resolution, although it can actually
operate on a different IP address.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt; This is a very important requirement
because it prevents an attacker from advertising a totally
unaffiliated encrypted
resolver that just steals your queries. Unfortunately, it is also
extremely limiting: It&#39;s very common for home network routers/WiFi APs,
etc. to have a &lt;em&gt;DNS proxy&lt;/em&gt; which takes DNS queries and forwards them
to the ISP resolver. This proxy will usually have an unroutable
IP address&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
which it&#39;s not possible to get a certificate for, in which
case the existing ADD solutions won&#39;t work for SPAU-type designs
(they work fine for TRR-style designs). There is active work
on trying to address this use case, but no consensus on
an approach or even that one is feasible.
With the DHCP-based system, you can use a standard domain name—because
DHCP is where you learn about the resolver in the first place—but
this still won&#39;t work well if the actual resolver is just some local
router because it probably won&#39;t have a globally resolvable name.
&lt;em&gt;[Updated 2022-01-17. Thanks to Neil Cook for pointing out that
the original text just covered the DNS version.]&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id=&quot;transport-protocols&quot;&gt;Transport Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#transport-protocols&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We&#39;ve gotten quite far without talking about the details of the
various protocols, but now it&#39;s time. There are three major
secure transport protocols which have been or are being standardized&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
for DNS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc7858.html&quot;&gt;DNS over TLS (DoT)&lt;/a&gt;. This is
what you would expect, namely you open a TLS channel to the server
and send DNS queries over it. There is also a &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8094.html&quot;&gt;DNS over DTLS&lt;/a&gt;,
but that has gotten almost no usage and will probably be deprecated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8484.html&quot;&gt;DNS over HTTPS (DoH)&lt;/a&gt;. This
maps DNS queries onto HTTP requests and responses and runs them over
HTTP over TLS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-dprive-dnsoquic-07.html&quot;&gt;DNS over QUIC (DoQ)&lt;/a&gt;. This
sets up a connection over the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9000.html&quot;&gt;QUIC&lt;/a&gt;
secure transport protocol and sends DNS queries over it. Note that
you can also run &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-quic-http-34.html&quot;&gt;HTTP over QUIC (HTTP/3)&lt;/a&gt;,
so it&#39;s possible to do DoH over QUIC (DoHQ?) but this is something
clients can do automatically without any new standards work, because
from the perspective of standards, it&#39;s just HTTP.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
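&lt;p&gt;To make the similarity concrete, here is a small Python sketch of how the same
DNS query message might be prepared for DoH (the GET form from RFC 8484) and for
DoT (which uses the two-byte length prefix of DNS over TCP). The function names and
the server URL in the usage note are hypothetical, and the sketch only builds the
bytes; it does not open any network connection.&lt;/p&gt;

```python
import base64
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query message (qtype 1 = A record)."""
    # Header: ID 0, RD flag set, one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def doh_get_url(template, message):
    """DoH GET form: base64url-encode the message without padding."""
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return template + "?dns=" + encoded


def dot_frame(message):
    """DoT sends the same message over TLS with a two-byte length prefix."""
    return struct.pack("!H", len(message)) + message
```

&lt;p&gt;For example, &lt;code&gt;doh_get_url(&quot;https://dnsserver.example.net/dns-query&quot;, build_query(&quot;www.example.com&quot;))&lt;/code&gt;
yields a URL ending in &lt;code&gt;?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB&lt;/code&gt;,
which matches the sample request in RFC 8484. The DNS payload is identical in both
cases; only the wrapping differs.&lt;/p&gt;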
&lt;p&gt;Conceptually these are all very similar and indeed, it&#39;s not
really clear why one needs both DoT and DoH (DoQ has better performance
properties, as would DoHQ). DoT was designed before DoH—though
unfinished when work on DoH started—but DoH has become more popular,
largely because browsers such as Chrome and Firefox chose to deploy
DoH rather than DoT (a decision made at least in part because browser vendors
are comfortable with HTTP). On the other hand, DoT was designed
primarily by the DNS community and is more popular there.&lt;/p&gt;
&lt;p&gt;There has been a lot of criticism of DoH from
operators who are concerned about the use of DNS transport for
bypassing their network-based controls (Paul Vixie has been
particularly &lt;a href=&quot;https://www.dnsfilter.com/blog/paul-vixie-and-peter-lowe-on-why-doh-is-politically-motivated&quot;&gt;vocal&lt;/a&gt; on this topic).
The primary relevant technical difference from the perspective
of a network operator is that DoT contains two pieces of protocol
metadata that make it easier to distinguish from other kinds of
TLS traffic: it typically runs over port 853 (rather than 443
as for HTTP over TLS) and has an &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7301&quot;&gt;Application Layer Protocol Negotiation (ALPN)&lt;/a&gt; identifier of &amp;quot;dot&amp;quot; rather than &amp;quot;h2&amp;quot;. By contrast,
DoH traffic just looks like HTTP traffic. The result is that
it&#39;s somewhat easier to have your network block DoT traffic.
However, it&#39;s not clear how long this will be true if there
is a lot of blocking. The DoH servers
currently commonly used by clients are also identifiable by IP and
SNI so they&#39;re relatively easy to block, and if server operators
want to conceal DoT, they can run it on port 443 and use ECH to
conceal the ALPN. Fundamentally, these are policy questions, not technical
ones.&lt;/p&gt;
&lt;h2 id=&quot;security-and-privacy-properties&quot;&gt;Security and Privacy Properties &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#security-and-privacy-properties&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Whatever the transport protocol, at the end of the day what DoX is
designed to give you is a secure channel to the resolver so you know
that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Nobody but the resolver is seeing your query to the resolver.&lt;/li&gt;
&lt;li&gt;You are getting the result that the resolver is sending you.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;How valuable this is depends in part on how much you trust
the resolver: a secure channel to the resolver in your
local coffee shop doesn&#39;t do you much good because you
have no reason to trust that that resolver isn&#39;t lying
or publishing your queries (this is a lot of the rationale
for Mozilla&#39;s TRR design).&lt;/p&gt;
&lt;p&gt;Even if you &lt;em&gt;are&lt;/em&gt; connected to a resolver you trust,
the level of security and privacy you get is limited by
that resolver, especially if its queries aren&#39;t encrypted, which seems
quite likely (again, see a future post).
First, if that resolver isn&#39;t validating DNSSEC
(or you are trying to resolve one of the majority of domains
which aren&#39;t DNSSEC-signed) then a network attacker might forge
responses to that resolver, which will happily pass them on.
Second, an attacker who is able to observe queries by
the recursive resolver may be able to infer which of them
are yours by looking at timing. This form of attack is
somewhat limited by the fact that recursive resolvers cache
responses and so won&#39;t necessarily issue new queries
to authoritative servers for every query, but they will
probably issue some of them. It&#39;s also possible to do
&lt;a href=&quot;https://www.usenix.org/conference/foci20/presentation/bushart&quot;&gt;traffic analysis&lt;/a&gt;
on the encrypted query stream from your machine to the recursive
resolver itself based on packet size and timing.&lt;/p&gt;
&lt;h3 id=&quot;oblivious-doh&quot;&gt;Oblivious DoH &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#oblivious-doh&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even if you are connected to a known and trusted
resolver, it&#39;s still not ideal that that resolver gets
to see all of your queries as well as your IP address.
One way to address this is to &lt;em&gt;proxy&lt;/em&gt; your encrypted
DNS queries through a proxy which conceals your IP
address from the DNS server. That way, your queries
and IP address are never in the same place.
Apple is already doing this with &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-08&quot;&gt;Oblivious DoH&lt;/a&gt; and the IETF is standardizing a system
called &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-ohai-ohttp-00.html&quot;&gt;Oblivious HTTP&lt;/a&gt;
which can be used to proxy DoH traffic (there is no equivalent for DoT).&lt;/p&gt;
&lt;h3 id=&quot;dox-and-dnssec&quot;&gt;DoX and DNSSEC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#dox-and-dnssec&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If your problem statement is &amp;quot;how do we secure the DNS&amp;quot;, then you
might think of DoX and DNSSEC as competitors, and to some extent this
is true: resources being spent on DoH—and in this case it
is DoH and not DoT—in endpoints are not being spent on endpoint
DNSSEC. Moreover, local networks are a powerful point of
attack, and so a secure channel to a trusted resolver reduces the need
for DNSSEC validation.
In addition, to some extent DoX reduces the need for endpoint DNSSEC
verification because it allows endpoints to take advantage of DNSSEC
verification in the recursive resolver (assuming they trust it).&lt;/p&gt;
&lt;p&gt;However, from another perspective, DNSSEC and DoX are complementary.
First, DoX does something that DNSSEC does not, which is to
provide confidentiality. Even if every client did DNSSEC
validation, DoX would still serve an important privacy purpose;
I certainly don&#39;t see clients implementing DNSSEC
validation and then deciding to turn off DoX,
especially given that it provides important security for
the vast majority of domains which are not currently DNSSEC-signed.
On the other hand, DNSSEC does something DoX does not, which
is to provide end-to-end integrity.&lt;/p&gt;
&lt;p&gt;Second, DoX is actually an enabling technology for DNSSEC:
one of the big concerns about DNSSEC deployment is that network
intermediaries will not convey DNSSEC records directly, thus
creating false positive failures when DNSSEC validation fails.
However, any resolver which speaks DoX is quite likely to also
handle DNSSEC correctly—this can be guaranteed in a TRR
system—and thus DoX has the potential to make the risk of deploying
endpoint DNSSEC lower and thus perhaps modestly increase the chance of it
happening.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-recursive-to-authoritative&quot;&gt;Next Up: Recursive to Authoritative &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#next-up%3A-recursive-to-authoritative&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So far I&#39;ve really focused on the endpoint perspective, but of
course DNS resolution actually involves much more than the
stub to recursive link. In the next post I&#39;ll address the
difficult problems of encrypting the link between the recursive
and authoritative servers.&lt;/p&gt;
&lt;h2 id=&quot;appendix%3A-how-ddr-works&quot;&gt;Appendix: How DDR Works &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#appendix%3A-how-ddr-works&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The IETF has proposed two main protocols for discovery of
encrypted resolvers: &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-add-ddr/&quot;&gt;Discovery of Designated Resolvers
(DDR)&lt;/a&gt;, which is
DNS-based, and &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-add-dnr/&quot;&gt;DHCP and Router Advertisement Options for the Discovery
of Network-designated Resolvers
(DNR)&lt;/a&gt;, which
uses the same mechanisms that clients use to autoconfigure themselves
for a given network. From my perspective, DDR is the more interesting
one because it (sometimes) works without changing customer premises
equipment, a process which takes a long time.&lt;/p&gt;
&lt;p&gt;The basic setting here is one in which the ISP has both a
traditional Do53 resolver and an encrypted resolver (of any
flavor, whether DoH, DoT, etc.). However, they don&#39;t control
the customer premises equipment, which means that they can&#39;t
change the DHCP or IPv6 RA-type configuration provided by
that equipment. The way around this is that the client
asks the resolver whether it has an encrypted version.
The basic flow looks like this:&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/ddr.png&quot; width=&quot;500px&quot; alt=&quot;DDR Discovery flow&quot; /&gt;
&lt;p&gt;When the client joins the network, it is provided with
the IP address of the Do53 server in a DHCP option (this assumes
DHCP). This is just the normal situation without DoX.
Next, the client makes a request to the Do53 server for
a special domain (&lt;code&gt;resolver.arpa&lt;/code&gt;). The Do53
server responds with the address of the DoX resolver
and the client can then connect to it. There are two
important points to note here.&lt;/p&gt;
&lt;p&gt;First, the identity that the client expects the DoX server to present
is the IP address that it was configured with via DHCP. Recall that
the threat model here is that the attacker is able to interfere
with your connection to the Do53 server—otherwise you wouldn&#39;t
need encryption—and so you can&#39;t trust the new IP address
you get from it. This way at worst you end up encrypting to
someone who controls the IP address you were going to send
your Do53 traffic to anyway.
Second, this explains why DDR doesn&#39;t work if the CPE has a
DNS proxy: in that case you will get the IP address of that
proxy and therefore
the ISP&#39;s DoX server won&#39;t have a valid certificate to use to
authenticate as that server.&lt;/p&gt;
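&lt;p&gt;The identity check described above can be sketched roughly as follows. This is a simplified illustration, not the DDR specification: the function name &lt;code&gt;ddr_accepts&lt;/code&gt; is made up, and it assumes the certificate chain has already been validated and its IP address entries already extracted into a list of strings.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import ipaddress


def ddr_accepts(do53_ip, cert_ip_sans):
    """Return True if the DoX server certificate covers the original Do53 address.

    do53_ip: resolver address the client was configured with (e.g. via DHCP).
    cert_ip_sans: IP addresses listed in the certificate presented by the
        DoX server (assumed already extracted from a validated certificate).
    """
    expected = ipaddress.ip_address(do53_ip)
    # Compare parsed addresses rather than strings so that different
    # textual spellings of the same address still match.
    return any(ipaddress.ip_address(san) == expected for san in cert_ip_sans)


# ISP resolver configured directly: the certificate covers that address.
print(ddr_accepts("203.0.113.53", ["203.0.113.53", "2001:db8::53"]))  # True

# CPE runs a DNS proxy: the client only knows the proxy address, which the
# ISP certificate does not (and cannot) cover.
print(ddr_accepts("192.168.1.1", ["203.0.113.53", "2001:db8::53"]))   # False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the address returned in the &lt;code&gt;resolver.arpa&lt;/code&gt; response never appears in this check; only the originally configured address matters, which is what keeps an attacker on the Do53 path from redirecting you somewhere new.&lt;/p&gt;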
&lt;p&gt;As should be clear from the above, DDR is mostly useful for
SPAU models, but you can also use it for steering in a
TRR system.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though actually designing such a protocol is &lt;em&gt;not&lt;/em&gt; easy. A topic for
another day. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One exception here is outsourced cloud-based &amp;quot;enterprise&amp;quot; DNS offerings like
&lt;a href=&quot;https://www.opendns.com/&quot;&gt;OpenDNS (now called Umbrella)&lt;/a&gt;, which
may want to authenticate that users are actually employees before providing answers. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Because it runs on UDP and TCP port 53. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are situations in which someone manually configures the
resolver address, for instance to bypass the network resolver,
but they are comparatively infrequent. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;m not sure if the clients hard fail if they can&#39;t successfully
connect, but in principle you could. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though the TLS working group is hard at work on &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-tls-esni-13.html&quot;&gt;fixing this&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a pretty typical IETF &amp;quot;mechanism not policy&amp;quot; type of
compromise. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I know this feels counterintuitive, but it&#39;s actually the way
that HTTPS works now. If I go to &lt;code&gt;www.example.com&lt;/code&gt; and
there is a CNAME to &lt;code&gt;www.cdn.example&lt;/code&gt;, the
client checks the certificate for &lt;code&gt;www.example.com&lt;/code&gt;.
The reasoning here is that the original identity is
what the client wanted and the redirect is just some
behavior by an untrusted network. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
These are drawn out of blocks designed for local use, such
as those defined by &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc1918.html&quot;&gt;RFC 1918&lt;/a&gt;.
The key point is that these addresses will be shared and therefore
cannot get certificates.
 &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are also two non-standard protocols in use,
&lt;a href=&quot;https://www.dnscrypt.org/&quot;&gt;DNSCrypt&lt;/a&gt;
and &lt;a href=&quot;https://dnscurve.org/&quot;&gt;DNSCurve&lt;/a&gt;
but for various reasons, the IETF opted to start with its
existing secure transports. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dox/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy for Genetic Genealogy: Happy Goldfish Bowl Everyone</title>
		<link href="https://educatedguesswork.org/posts/dna-genealogy/"/>
		<updated>2022-01-02T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dna-genealogy/</id>
		<content type="html">&lt;p&gt;The
combination of &amp;quot;consumer genetics&amp;quot; (CG) in the
form of widespread cheap genetic testing
and crowdsourced genealogical DNA databases like &lt;a href=&quot;https://www.gedmatch.com/&quot;&gt;GEDmatch&lt;/a&gt;
has opened up whole new possibilities in the use of genetic data.
One of these is that you can often identify—or at least
partially identify—the source of an
unknown DNA sample based on known samples voluntarily submitted by
their relatives. This has obvious applications for criminal
investigation, as described in a recent
&lt;a href=&quot;https://www.nytimes.com/2021/12/27/magazine/dna-test-crime-identification-genome.html?searchResultPosition=1&quot;&gt;article&lt;/a&gt; in the New York Times.&lt;/p&gt;
&lt;p&gt;This is something I&#39;ve been expecting for some
time, ever since widespread cheap DNA analysis
became available. It doesn&#39;t even require sequencing,
which is still &lt;a href=&quot;https://www.illumina.com/science/technology/next-generation-sequencing/beginners/ngs-cost.html&quot;&gt;somewhat expensive&lt;/a&gt; (on the order of $1000)
but rather a much cheaper technique that
just looks at specific sites,
where there is known to be variation in single base pairs,
so-called &lt;em&gt;Single-nucleotide polymorphisms (SNPs)&lt;/em&gt;. You
can then use technologies like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DNA_microarray&amp;amp;oldid=1060776591&quot;&gt;DNA microarrays&lt;/a&gt; to examine those regions.&lt;/p&gt;
&lt;p&gt;The way this works is that there are a number of
&lt;em&gt;Direct To Consumer (DTC)&lt;/em&gt; genetic testing companies like &lt;a href=&quot;https://www.ancestry.com/&quot;&gt;AncestryDNA&lt;/a&gt; and &lt;a href=&quot;https://www.tellmegen.com/?lang=en&quot;&gt;tellmeGen&lt;/a&gt; which
will analyze your DNA from a sample (typically saliva)
and give you a digital file with
the information. This costs about $100 US.
You can then upload that data to one
of a number of databases designed for genealogical applications,
such as &lt;a href=&quot;https://www.gedmatch.com/&quot;&gt;GEDmatch&lt;/a&gt;, which
let you compare the sample you uploaded to that of other
people. This lets you see
who is closely related (and potentially on which side of
the family, because of genes which appear on only the X or Y chromosome)
and gradually build up at least a partial family tree for
the submitted sample.&lt;/p&gt;
&lt;p&gt;The advertised purpose for this kind of database is to tell people
about their ethnic heritage, help them find unknown relatives,
etc. but of course there are obvious law enforcement applications.
Most famously, genealogical DNA analysis was used to identify
the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Joseph_James_DeAngelo&amp;amp;oldid=1062694681&quot;&gt;Golden State Killer&lt;/a&gt;, but the main subject of the NYT piece, CeCe Moore,
has also solved a number of other cold cases using DNA-based
techniques. This all seems to have been
done on a sort of ad hoc basis without a lot of thought
given to the bigger picture, until suddenly there was a lot of public
attention and the genealogy sites had to quickly figure out the broader implications:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Two days later, GEDmatch became all but useless to Moore.&lt;/p&gt;
&lt;p&gt;Following the Golden State Killer arrest, in 2018, the site had
posted a warning to users that police were uploading profiles, and
hastily instituted a policy restricting such use to homicides,
sexual assaults and unidentified bodies. But a few weeks before the
Idaho Falls announcement, it emerged that one of the site’s
founder-operators had, in a somewhat naïve, grandfatherly way, made
an exception for a detective in Utah investigating a recent
attempted murder. Moore was the one tasked with identifying the
suspect (and did). Around the same time, it also emerged that
FamilyTreeDNA, a consumer site with more than two million users, had
been discreetly allowing the F.B.I. to upload suspect profiles to
its database for genetic-genealogy searches.&lt;/p&gt;
&lt;p&gt;GEDmatch scrambled to opt all accounts out of law-enforcement
searches by default. Overnight, Moore’s available matches went from
over a million profiles to zero, and her ability to work new cases
practically vanished. “People will die,” she told CNN. In the months
that followed, the handful of genetic genealogists whom she had
recruited to build out the Parabon team had their hours cut, and she
spent most of her time toiling on old cases for which she already
had the list of matches.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a pretty common pattern with technologies with
privacy implications, which is that the impact on
privacy depends strongly on &lt;em&gt;scale&lt;/em&gt;; the impact is small when DNA
analysis costs tens of thousands of dollars and we only
have a few samples, but as technology—both
collection technology and processing technology—improves,
we have a situation where mass surveillance
becomes not just possible but cheap. Other situations where
we can see this happening are
&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates&quot;&gt;automatic license plate readers&lt;/a&gt;,
face recognition, and doorbell cameras.&lt;/p&gt;
&lt;h2 id=&quot;how-effective-is-genetic-genealogy%3F&quot;&gt;How effective is genetic genealogy? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#how-effective-is-genetic-genealogy%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Using one of these databases, it is quite cheap and effective to partially
identify someone from their DNA sample.
&lt;a href=&quot;https://www.science.org/cms/asset/089c0893-0dc3-4668-be4a-cff6d3915fad/pap.pdf&quot;&gt;Ehrlich et al.&lt;/a&gt;
report that if you have a sample of 2% of the population it
will be possible to find a third cousin for 99% of the
population and a second cousin for 65% of the population.
When combined with actual genealogical data, this gives you
a set of potential people the sample could have come from.
The NYT (and Ehrlich) describe a time-consuming manual process for narrowing
things down to a specific individual, but this seems like the
kind of thing that specialized software would make easier,
and of course the more samples you have, the better it will
work.&lt;/p&gt;
&lt;p&gt;The process of collecting the samples and populating the database is
also fairly cheap, but more importantly, the person doing the
investigation doesn&#39;t have to pay that cost because people are doing
it voluntarily. They only have to pay the cost
for the unknown sample they want to target, but we&#39;re talking about
$100 for a sample kit. They also have to collect that sample,
but—and here&#39;s the part that should make you nervous—they
don&#39;t really need the subject&#39;s cooperation for this. The NYT article
mentions two specific cases, one in which the suspect &amp;quot;spit out his
chewing gum on a bike ride&amp;quot; and another in which the suspect
&amp;quot;momentarily opened the door of his semi truck to reach around behind
the cab, and let fall a coffee cup with DNA&amp;quot;.&lt;/p&gt;
&lt;p&gt;This sort of data collection is well within the reach of ordinary
people, not just law enforcement. If the target
leaves a coffee cup in the trash or a cigarette butt on the
ground, anyone can potentially pick it up and identify them using
exactly these techniques, and once they have the target&#39;s name,
they are in a position to violate their privacy in
other ways. It&#39;s actually easier in this case than in the
criminal &amp;quot;unknown sample&amp;quot; cases because you have seen
the person, and so once you have candidate names, you can narrow
it down by their appearance.&lt;/p&gt;
&lt;h2 id=&quot;how-to-provide-privacy%3F&quot;&gt;How to provide privacy? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#how-to-provide-privacy%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This kind of data has a number of features
that seem to make it very hard to keep private using the
usual techniques we think about:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;The privacy issue isn&#39;t created by the collection of your
data but by the collection of other people&#39;s data.&lt;/em&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This makes
it more difficult to protect your own privacy. For instance,
it&#39;s not enough to have my own sample be opted out of
research applications; I have to have all my relatives&#39; samples
protected as well. In order to protect myself, I have to
make sure nobody else can collect my DNA data, which, as
we saw above, is pretty difficult.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;The intended use case and the adversarial use case are basically the same.&lt;/em&gt;
In many data privacy situations, the object of your analysis
isn&#39;t privacy sensitive, but the data itself is. In these
cases, there are &lt;a href=&quot;https://educatedguesswork.org/tags/privacy%20preserving%20measurement/&quot;&gt;technical approaches&lt;/a&gt;
to let you analyze the data without taking the risk of exposing
the source data. However, for this data, one of the main
&lt;a href=&quot;https://www.gedmatch.com/solutions-details-one-to-many-dna-comparison&quot;&gt;use cases&lt;/a&gt;,
is to find close matches, which is precisely what you need
in order to re-identify someone. This makes it hard to build
effective technical controls without significantly reducing
the available system functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the end of the day, it seems likely that effectively providing
privacy for this kind of data will require new legal policies.
However, I have seen two proposed sets of (semi)-technical controls
that consumer genetics companies might apply to help reduce
privacy risk posed by their systems: (1) limiting law enforcement access and (2) requiring
that the samples be validated.&lt;/p&gt;
&lt;h3 id=&quot;limiting-access-by-law-enforcement&quot;&gt;Limiting Access By Law Enforcement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#limiting-access-by-law-enforcement&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first approach, as mentioned in the NYT article, is to limit the use of the
data by law enforcement, specifically by requiring a
separate opt-in for this use (see &lt;a href=&quot;https://lirias.kuleuven.be/retrieve/572076&quot;&gt;Skeva, Larmuseau, and
Shabani&lt;/a&gt; for a review of
various companies&#39; practices).  This seems like an understandable first
step by the sites themselves in the face of negative PR, but not
really a long term solution, for several reasons.
First, a blanket policy like this seems like a poor match for
many people&#39;s intuition that law enforcement should have access
in some cases but not others. One might imagine thinking that
the police should be able to do a DNA search for murder but
not jay-walking (as described above, GEDmatch originally
had a policy of only providing law enforcement access for certain
crimes), and perhaps only after they had exhausted other
avenues, but it&#39;s pretty hard to ask people in what particular
cases they want their data to be used to
investigate third parties.&lt;/p&gt;
&lt;p&gt;Second, it&#39;s not clear how much privacy this kind of policy provides; depending on the
legal environment in a given jurisdiction, law enforcement may simply
be able to compel access, regardless of the sites&#39; policies.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Skeva, Larmuseau, and Shabani note that many companies
have policies that state that they will comply with valid legal
process. Even if it were the case that law enforcement couldn&#39;t
compel access from these third party sites, increasingly law
enforcement is &lt;a href=&quot;https://www.ojp.gov/pdffiles1/nij/grants/242812.pdf&quot;&gt;gathering its own DNA samples&lt;/a&gt; on arrest, and of course these are not subject to site policies.
More on this below.&lt;/p&gt;
&lt;p&gt;Finally, this doesn&#39;t address non-law-enforcement applications
like stalking. Even if we assume you can identify law
enforcement users—and what stops them from lying?—the
purpose of this kind of system is to allow ordinary people to look up their
own genealogy, and it&#39;s not like you can tell which ordinary
people are actually stalkers.&lt;/p&gt;
&lt;h3 id=&quot;requiring-validated-samples&quot;&gt;Requiring Validated Samples &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#requiring-validated-samples&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ehrlich et al. propose a different approach, which is to restrict who
can insert a sample into the system. The way that these systems
typically work is that the user uploads a digital &lt;em&gt;genetic data file
(GDF)&lt;/em&gt; to the genealogy site and can then search based on this file.
At present there is no technical mechanism to enforce that this is the uploader&#39;s
&lt;em&gt;own&lt;/em&gt; DNA and so they can just collect someone else&#39;s DNA, sequence it, and upload
the result. Ehrlich et al. suggest that the sites refuse to accept
sequences that don&#39;t come from DTC testing companies (enforced by
having the DTC company digitally sign the GDF). This would prevent
attacks where you sequenced someone&#39;s DNA and just uploaded the
sequence.&lt;/p&gt;
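&lt;p&gt;To give a feel for the accept/reject logic on the genealogy site, here is a minimal sketch. Two loud caveats: the Python standard library has no public-key signatures, so this uses an HMAC as a stand-in (which means the site and the DTC company would share a secret key, unlike the digital signature Ehrlich et al. actually propose), and the key, function names, and file contents are all made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import hashlib
import hmac

# Hypothetical key shared between one DTC company and the genealogy site.
# With a real digital signature the site would hold only a public key.
DTC_KEY = b"example-key-not-for-real-use"


def dtc_tag(gdf_bytes, key=DTC_KEY):
    """Tag the DTC company computes over the genetic data file (GDF)."""
    return hmac.new(key, gdf_bytes, hashlib.sha256).hexdigest()


def site_accepts(gdf_bytes, tag, key=DTC_KEY):
    """The genealogy site accepts an upload only if the tag verifies."""
    return hmac.compare_digest(dtc_tag(gdf_bytes, key), tag)


gdf = b"rs0000001 AA\nrs0000002 AG\n"  # made-up file contents
tag = dtc_tag(gdf)

print(site_accepts(gdf, tag))                       # True: file as issued
print(site_accepts(gdf + b"rs0000003 GG\n", tag))   # False: file was altered
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Either way, the check only establishes that the file came from a DTC company unmodified; it says nothing about whose saliva went into the tube.&lt;/p&gt;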
&lt;p&gt;These policies wouldn&#39;t directly enforce that someone had given
consent for their sample to be uploaded, because the DTC genetics
company doesn&#39;t know which sample belongs to whom.
Presumably the theory would be that it would be hard to surreptitiously
gather a high quality sample from someone and the DTC companies wouldn&#39;t be
able—or would refuse to—analyze the kind of incidental
samples that it was easy to gather from cast-off coffee cups, used
gum, etc., so it would be hard to submit someone else&#39;s sample.
I don&#39;t know how true this actually is; the tests often use
saliva and I imagine
some kinds of contaminated samples can be analyzed just fine and
some cannot be. Of course, a really sophisticated attacker might
be able to sequence the sample themselves and then synthesize
the relevant regions but we&#39;re probably at least a few years
away from that being the kind of thing that your average person
can do.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
And of course, this requires you to trust that every
DTC company will dutifully enforce policies designed to prevent
third party samples.&lt;/p&gt;
&lt;p&gt;Of course, this still doesn&#39;t address the situation where
law enforcement requires the genealogy site to cooperate,
as they can present the file in any form that they want.&lt;/p&gt;
&lt;h3 id=&quot;other-attacks&quot;&gt;Other Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#other-attacks&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;ve focused here almost entirely on identification attacks,
as they are the most obvious way to abuse this kind of system.
However, &lt;a href=&quot;https://dnasec.cs.washington.edu/genetic-genealogy/ney_ndss.pdf&quot;&gt;Ney, Ceze, and Kohno&lt;/a&gt;
have shown that it is possible to use the GEDmatch database
to extract detailed information about people&#39;s DNA. They write:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We were primarily interested in understanding privacy risks to users
that had their kits set to the default “Public” privacy setting on
GEDmatch. This setting provides the most functionality and allows
kits to appear in the results of relative matching queries from
other users (but is not supposed to reveal any raw genetic
information)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;GEDmatch allows you to do a &amp;quot;one-to-one match&amp;quot; in which you
compare your sample to a target&#39;s sample. The result is a
visual comparison, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/kohno-gdf-compare.png&quot; alt=&quot;GEDmatch comparison sample&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Based on this information, they show that it&#39;s possible to extract
the actual value (in some cases) or an estimate (in others)
of the target&#39;s genotype for the given SNP
areas, which is far from ideal. They suggest some countermeasures, including
the signed upload scheme described above and limiting the
use of the matching APIs.&lt;/p&gt;
&lt;h2 id=&quot;law-enforcement-databases&quot;&gt;Law Enforcement Databases &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#law-enforcement-databases&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Most of the discussion above is about how to restrict access to
consumer databases, but nothing stops law enforcement from just making
their own databases, which is exactly what they are doing. A common
practice is just to take DNA samples from people when they are
arrested.  The report I link to above says there were over 10 million
such profiles in the US in 2013, so presumably there are many more
now.  Such a database seems like it is more effective than a consumer
database in some ways and less in others: It&#39;s more
effective—and hence more of a privacy threat—for
communities with high arrest rates because many people will thus be
sampled. It&#39;s less effective in communities which have low arrest
rates.&lt;/p&gt;
&lt;p&gt;At present, it appears that the consumer databases are superior
just because they are more technically advanced. Historically the federal government has collected only a
limited number (13-20) of markers, rather than the more
detailed data that people now collect. For this reason, investigators
actually go to consumer genetics sites.
The current US DOJ
&lt;a href=&quot;https://www.justice.gov/olp/page/file/1204386/download&quot;&gt;policy&lt;/a&gt;
on this restricts investigators somewhat, limiting the use to various kinds
of violent crimes (or, in some cases, &amp;quot;attempts to commit
violent crimes&amp;quot;) and requiring that they &amp;quot;must have pursued
reasonable investigative leads to solve the case or to identify the
unidentified human remains.&amp;quot;&lt;/p&gt;
&lt;p&gt;Regardless of the current situation, if law enforcement is
regularly collecting DNA from suspects, it&#39;s only a matter
of time before they have more detailed data from that data
collection—or at least from new data collection.
This data will not be subject to whatever policies consumer
genetics sites have; in particular any technical controls
that are intended
to restrict access to the actual person whose sample it is
will not be effective.&lt;/p&gt;
&lt;h2 id=&quot;what-kind-of-policies-might-we-have%3F&quot;&gt;What kind of policies might we have? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#what-kind-of-policies-might-we-have%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;People with more policy expertise than me have spent real time
on this, so I don&#39;t propose to provide a full analysis on
potential policy responses. However, it seems like there are really two questions here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;How do we prevent abuse of this kind of data for stalking
and disclosure of personal information to the public?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How do we prevent abuse of this kind of data by law
enforcement?&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first of these questions seems like it may have
a set of technical solutions: restrict the use of the service
to people&#39;s own samples and limit the API so it&#39;s not possible
to learn too much about other people. Neither of these limits
is perfect, but they seem like they probably significantly
increase the cost of attack and it&#39;s probably possible to
add additional defenses over time.&lt;/p&gt;
&lt;p&gt;The law enforcement question is more difficult, in part because
there are going to be strong differences of opinion about
how to balance privacy against law enforcement effectiveness
(and about how much these techniques make law enforcement
more effective). With that said, I suspect that many people
do not want law enforcement to be able to use DNA evidence to identify
anybody for any reason (and potentially to add them to their
database once identified, to make future identification easier);
the DOJ policy, for instance, would not allow this.
This kind of mass surveillance seems like it will eventually be technically possible—if
it isn&#39;t already—so if we don&#39;t want it, we need a policy response.&lt;/p&gt;
&lt;p&gt;From a technical perspective, it seems like there are three main
policy approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Limit law enforcement&#39;s ability to gather DNA samples.&lt;/em&gt;
In order to use someone&#39;s DNA you have to first get it.
The government can of course compel you to supply a sample
with a warrant, but as noted above, it&#39;s also possible to
just wait around until you discard something that has
your DNA on it. Traditionally trash has been
seen as discarded and therefore fair game, but the wide
availability of DNA analysis technology seems like it changes
the balance.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; One approach, as Alexia Ramirez from the ACLU &lt;a href=&quot;https://www.aclu.org/news/privacy-technology/police-need-a-warrant-to-collect-dna-we-inevitably-leave-behind/&quot;&gt;proposes&lt;/a&gt;, is to require law enforcement to get a warrant to collect your DNA.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Limit the investigative use of DNA data.&lt;/em&gt; Of course, not all
data is collected from identifiable individuals—for instance
it might come from a crime scene—and much of the attention
so far has been instead on limiting the use of DNA data once it&#39;s
collected. For instance, one could have policies like the
&lt;a href=&quot;https://www.justice.gov/olp/page/file/1204386/download&quot;&gt;US DOJ&#39;s&lt;/a&gt;
which only allow DNA searches for certain crimes and after other
avenues have been exhausted. These policies could of course
be made to apply to both consumer and government databases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Limit the retention of samples.&lt;/em&gt; There has also been quite a bit of
discussion of limiting the government&#39;s ability to collect and
retain DNA evidence from arrestees. Of course, this would still
leave the CG platforms, but would still be a meaningful restriction
in that it (1) makes it harder for law enforcement
to surreptitiously violate policies and (2) gives
CG platforms the ability to allow for searches only when
legally compelled—though they may of course choose not
to do so—rather than when legally allowed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As I said above, I don&#39;t propose to provide any kind of complete
policy analysis here. For more, see
&lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3359260&quot;&gt;Jen King&lt;/a&gt;
and Natalie Ram (&lt;a href=&quot;https://www.virginialawreview.org/wp-content/uploads/2020/12/Ram_Book.pdf&quot;&gt;1&lt;/a&gt; &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3860482&quot;&gt;2&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Stepping back from this particular case, this is just one instance
of a general trend where your privacy is protected not by infeasibility
but rather by inconvenience; it&#39;s always been possible for people to
follow you around and see everything you do&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;,
but it was just too hard to do at scale. Technology changes
that, both by permitting you to see what you previously
couldn&#39;t (DNA, thermal imaging) and by making it much
cheaper to do at scale, either directly or—as here—by crowdsourcing.
Happy goldfish bowl, everyone.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3359260&quot;&gt;Jen King&lt;/a&gt;
on people&#39;s perceptions of the implications of submitting
their data. &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See &lt;a href=&quot;https://www.virginialawreview.org/wp-content/uploads/2020/12/Ram_Book.pdf&quot;&gt;Natalie Ram&lt;/a&gt;
on the legal situation in the US. &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Update: 2022-01-02
If/when it does become the case that ordinary people can
afford to buy full sequencing equipment—or even when
it&#39;s down to the place where it&#39;s widely available—we&#39;re all going to be in
some serious trouble because it means that anyone
who gets their hands on your used coffee cup will
be able to determine precisely what genetic conditions
you have, as well as anything else we&#39;ve managed
to work out the genetics for. Right now, we can hope
that your average reputable lab won&#39;t take such an obviously
nonconsensual sample, but if you can buy a sequencer
for a few hundred thousand dollars, then there
are going to be a lot of disreputable labs. &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At least in the US, there is precedent for requiring warrants when technical
capabilities make some kinds of search much more effective,
as in &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Riley_v._California&amp;amp;oldid=1012631588&quot;&gt;Riley v. California (cell phone searches)&lt;/a&gt;
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Kyllo_v._United_States&amp;amp;oldid=1048955645&quot;&gt;Kyllo v. US (thermal imaging)&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
See Justice Scalia on &amp;quot;tiny constables&amp;quot; in &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=City_of_Ontario_v._Quon&amp;amp;oldid=1057530997&quot;&gt;City of Ontario v. Quon&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This line is taken from Isaac Asimov&#39;s prescient &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=The_Dead_Past&amp;amp;oldid=1038315686&quot;&gt;The Dead Past&lt;/a&gt;, in which
someone invents a &amp;quot;chronoscope&amp;quot; which can be used to view the past.
It&#39;s not really that useful for historical research because it
can only go back about 150 years, but it&#39;s great for surveillance,
because you can watch 1 second ago. &lt;a href=&quot;https://educatedguesswork.org/posts/dna-genealogy/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part III: DANE and the WebPKI</title>
		<link href="https://educatedguesswork.org/posts/dns-security-dane/"/>
		<updated>2021-12-28T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-dane/</id>
		<content type="html">&lt;p&gt;This is Part III of my series on DNS Security.
(See &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;Part I&lt;/a&gt; for an overview of DNS and its security
issues and &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec&quot;&gt;Part II&lt;/a&gt; for background on DNSSEC.)
In this part, we cover &lt;a href=&quot;https://datatracker.ietf.org/doc/rfc6698/&quot;&gt;&lt;em&gt;DNS Authentication of Named Entities&lt;/em&gt;
(DANE)&lt;/a&gt;, which
uses the DNS to authenticate TLS keys.&lt;/p&gt;
&lt;p&gt;As I mentioned previously, a lot of the reason that DNSSEC hasn&#39;t
seen much deployment is that the information it&#39;s protecting—principally
IP addresses—isn&#39;t usually of very high value; if
you&#39;re really serious about protecting your communications you
encrypt them—probably using TLS, which authenticates the server
via a certificate rather than by IP address. At the same time,
pretty much everyone agrees that the system responsible for
issuing  TLS certificates (the WebPKI) is a mess.
But as my colleague &lt;a href=&quot;https://commerce.net/people/allan-m-schiffman/&quot;&gt;Allan Schiffman&lt;/a&gt;
used to say, sometimes when you have two problems they solve
each other. This brings us to the topic of DANE, which is an
attempt to solve these problems together by using the DNS
to authenticate TLS certificates, thus replacing the
WebPKI&#39;s not-great security properties with the nominally
better DNSSEC ones and simultaneously providing a stronger
use case for DNSSEC deployment.&lt;/p&gt;
&lt;h2 id=&quot;dane%2Ftlsa&quot;&gt;DANE/TLSA &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#dane%2Ftlsa&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DANE actually attempts to address two distinct (and arguably not that
closely related) complaints about the WebPKI:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;That the large number of CAs in the WebPKI makes it
insecure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;That having to go to a CA to get a certificate
is bad.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To accomplish this, DANE can be used by the domain in two
modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Restrictive&lt;/em&gt;: this allows the server to restrict the set of
valid keys, potentially excluding keys which would otherwise
appear in valid certificates. This is intended to address
the &amp;quot;I don&#39;t trust all the CAs&amp;quot; complaint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Additive&lt;/em&gt;: this allows the server to cause the client
to accept one or more keys that it would otherwise not
accept (because they aren&#39;t certified by an acceptable
CA). This is intended to address the &amp;quot;I don&#39;t want to talk to
a CA&amp;quot; complaint, though it also has the side effect of excluding
other keys.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Somewhat confusingly, though understandably from a protocol perspective,
these are both stored in the same DNS record type, TLSA (which &amp;quot;does
not stand for anything; it is just the name of the RRtype&amp;quot;), with a
&amp;quot;usage&amp;quot; indicator to differentiate them.  However, the semantics are
very different.  To add to the confusion, DANE has two additive modes
and two restrictive modes, with one of each referring to end-user
certificates and one referring to trust anchors which can sign other
certificates. The result is that discussions about DANE tend to be
fairly hard to follow unless you are able to remember what use model
we are talking about.&lt;/p&gt;
&lt;h3 id=&quot;restrictive-modes&quot;&gt;Restrictive Modes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#restrictive-modes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The basic idea with a restrictive mode is to contain misissuance.
Because to a first order any CA accepted by the client
can issue a certificate for any domain, the security of your
site is only as strong as the &lt;em&gt;weakest&lt;/em&gt; CA that clients trust
and a mistake by some CA you have never heard of can allow
an attacker to impersonate your site.&lt;/p&gt;
&lt;p&gt;DANE addresses this by allowing the site to publish a list
of either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The CAs that are allowed to issue certificates for the
domain name in question (presumably the list of CAs that the
site operator expects to use).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; (Usage 0)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The certificates that are valid for the domain (Usage 1)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the client connects to the TLS server it compares the certificate
chain it gets from the server against the TLSA records and only accepts the certificate if there is
a match, either for the CA (Usage 0) or for the end-entity certificate
(Usage 1). Importantly, this is a double check: the certificate
still needs to be valid according to the ordinary WebPKI standards;
these modes are just designed to protect against misissuance
but you still need to get a valid certificate.&lt;/p&gt;
&lt;p&gt;Conventional wisdom is that it&#39;s operationally better to use Usage 0.
Advertising the end-entity certificate makes it harder to update that
certificate, which you have to do at minimum around once a year&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; and more frequently if you
are using a CA like &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt; which
has shorter lifetimes.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
It can also be a problem if you have a big server
farm and issue new certificates for each server because
each certificate will be different. If you advertise the CA certificate
(Usage 0), it&#39;s still possible that the
CA will change its certificate—though this is much less frequent—but
if it happens it will cause connections to your site to break.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;additive-modes&quot;&gt;Additive Modes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#additive-modes&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Historically, getting the WebPKI certificate you need to seamlessly host
a TLS server has been kind of a pain.
The advice in these circumstances used to be that you should
just self-sign your certificates and ask users to click through
the resulting warnings, but over the past 5-10 years browsers
have really started to crack down on that with the result
that you now get a big scary warning which people (hopefully) don&#39;t want
to click through:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/fx-bad-cert.png&quot; alt=&quot;Firefox bad cert warning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/chrome-bad-cert.png&quot; alt=&quot;Chrome bad cert warning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is a good thing for security, as it&#39;s very hard for
people to evaluate these warnings and know what&#39;s safe, but it made life
much harder for people who didn&#39;t want to get a valid certificate.
DANE tries to address this by allowing you to tell clients that they
should accept your certificate even if it can&#39;t be validated
via the WebPKI. As with the restrictive modes,
there are two versions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Usage 2 specifies a certificate authority that will be
used as the trust anchor for the TLS server. This overrides
the existing trust anchor list for this domain, which means
that you can create your own CA and issue yourself
certificates without getting it into a browser root store.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Usage 3 specifies a specific certificate&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
that is expected
to be used by the TLS server. This is conceptually like
Usage 2, except that you don&#39;t need to spin up your own
CA; you can just make a self-signed certificate and bless
it using DANE/TLSA.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that these modes aren&#39;t &lt;em&gt;purely&lt;/em&gt; additive, because they also
restrict the list of certificates to those authorized by the
TLSA records, so they also prevent someone from getting
a WebPKI certificate that is valid for your domain.&lt;/p&gt;
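&lt;p&gt;To keep the four usages straight, here is a minimal Python sketch of the accept/reject decision a DANE-aware client might make for a single TLSA record. It is an illustration under stated assumptions, not a real validator: the names &lt;code&gt;tlsa_accepts&lt;/code&gt; and &lt;code&gt;sha256_hex&lt;/code&gt; are hypothetical, every record is assumed to carry a SHA-256 digest of the full certificate, the chain is assumed to arrive end-entity first, and the ordinary WebPKI result is passed in as a boolean rather than computed. It also skips the signature checks along the chain that a real client would perform.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib

# Usage values from the TLSA record; the constant names are labels for this sketch.
CA_CONSTRAINT = 0     # restrictive, names a CA
CERT_CONSTRAINT = 1   # restrictive, names the end-entity certificate
TRUST_ANCHOR = 2      # additive, names your own CA
DOMAIN_ISSUED = 3     # additive, names the end-entity certificate


def sha256_hex(cert_der):
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def tlsa_accepts(usage, tlsa_digest, chain, webpki_valid):
    """Decide whether one TLSA record accepts the presented chain.

    chain is a list of DER certificates, end-entity first.
    webpki_valid is the result of ordinary WebPKI validation.
    """
    ee_digest = sha256_hex(chain[0])
    issuer_digests = [sha256_hex(cert) for cert in chain[1:]]

    if usage == CA_CONSTRAINT:
        # Double check: WebPKI must pass and the listed CA must be in the chain.
        return webpki_valid and tlsa_digest in issuer_digests
    if usage == CERT_CONSTRAINT:
        # Double check: WebPKI must pass and the end-entity certificate must match.
        return webpki_valid and tlsa_digest == ee_digest
    if usage == TRUST_ANCHOR:
        # The WebPKI result is ignored; the published CA acts as the trust anchor.
        return tlsa_digest in issuer_digests
    if usage == DOMAIN_ISSUED:
        # The WebPKI result is ignored; only the published certificate is accepted.
        return tlsa_digest == ee_digest
    return False


# Placeholder byte strings stand in for real DER certificates.
chain = [b"end-entity certificate", b"issuing CA certificate"]
pin = sha256_hex(chain[0])
print(tlsa_accepts(DOMAIN_ISSUED, pin, chain, webpki_valid=False))    # True
print(tlsa_accepts(CERT_CONSTRAINT, pin, chain, webpki_valid=False))  # False
```
&lt;/pre&gt;
&lt;p&gt;The two calls at the end show the difference in semantics: the same self-signed certificate is accepted under Usage 3 but rejected under Usage 1, because the restrictive mode still requires WebPKI validation to succeed.&lt;/p&gt;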
&lt;h2 id=&quot;dane-and-dnssec&quot;&gt;DANE and DNSSEC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#dane-and-dnssec&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DANE requires DNSSEC (see &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc6698.html#section-4.1&quot;&gt;RFC 6698;
Section
4.1&lt;/a&gt;).  It
should be obvious why the additive modes require it: otherwise an
attacker who controlled the DNS could take over your TLS connections,
thus undercutting the design goal of being secure against active
attackers. The situation with the restrictive modes is somewhat less obvious;
an attacker who controls the DNS can already cause failures by
providing a bogus IP address. This was a topic of some debate
in the DANE WG, but at the end of the day it&#39;s easiest to just
require DNSSEC.&lt;/p&gt;
&lt;p&gt;So what happens if you try to retrieve a TLSA record for &lt;code&gt;example.com&lt;/code&gt;
but that fails (for instance if you can&#39;t validate the DNSSEC signatures,
or the TLSA record never arrives even though the NSEC record says it ought
to)? The only safe thing to do is to refuse to create the TLS connection. The reason for this is that
the valid TLSA record(s)—assuming there is one and there hasn&#39;t
been a misconfiguration—might specify a different certificate from
the one presented by the server; if you can&#39;t retrieve the record,
you need to assume the worst and fail the connection.&lt;/p&gt;
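&lt;p&gt;The fail-closed rule can be written down as a small decision function. This is only a sketch: the function name &lt;code&gt;tls_policy&lt;/code&gt;, the status strings, and the returned action labels are all hypothetical, and the assumption that a DNSSEC-validated proof of absence lets the client fall back to ordinary WebPKI validation is mine rather than something stated above.&lt;/p&gt;
&lt;pre&gt;
```python
def tls_policy(lookup_status, records):
    """Map the outcome of a TLSA lookup to what the client should do.

    lookup_status is one of (hypothetical labels):
      "secure"  : a DNSSEC-validated answer containing TLSA records
      "no_tlsa" : a DNSSEC-validated proof that no TLSA records exist
      "failed"  : anything else (bad signatures, timeout, missing answer)
    """
    if lookup_status == "secure" and records:
        return "enforce-tlsa"
    if lookup_status == "no_tlsa":
        # Assumption: with proven absence the client behaves as if DANE did not exist.
        return "webpki-only"
    # We cannot tell what the real records say, so assume the worst.
    return "refuse-connection"


print(tls_policy("failed", []))  # refuse-connection
```
&lt;/pre&gt;
&lt;p&gt;Note that there is no branch in which a failed lookup quietly degrades to WebPKI validation; that is exactly the hard-failure behavior that makes unreliable DNS a problem.&lt;/p&gt;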
&lt;p&gt;This creates a problem for endpoints which might have unreliable
DNS service, such as browsers.
As I &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#validation-at-the-endpoint-versus-the-recursive&quot;&gt;mentioned&lt;/a&gt;
in Part II, browser and OS vendors have been reluctant to turn on
DNSSEC validation by default because of concerns about spurious
DNSSEC validation failures leading to hard connection failures.
The same concerns apply here and no major browser has added
support for DANE/TLSA. (See Adam Langley&#39;s 2015 &lt;a href=&quot;https://www.imperialviolet.org/2015/01/17/notdane.html&quot;&gt;Why not DANE in browsers&lt;/a&gt;
for his explanation of why Chrome doesn&#39;t do DANE.)
The situation is somewhat better for endpoints such as mail
servers, which tend to have a clearer path to the Internet,
and DANE is seeing some usage there, as discussed below.&lt;/p&gt;
&lt;h2 id=&quot;dane-tls-extension&quot;&gt;DANE TLS Extension &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#dane-tls-extension&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One way to address the problem of DNSSEC network interference
is to bypass the DNS service entirely. It&#39;s true you need
DNS in order to resolve the IP address of the server, but once
you&#39;ve got that, the server can just give you the
TLSA records, along with their supporting DNSSEC signatures, directly.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Because DNSSEC signs objects, those records are self-contained
and can be verified no matter how they are delivered.
The obvious thing to do here is to just have the
server provide its DNSSEC-authenticated TLSA records in the TLS
handshake along with the server certificate.&lt;/p&gt;
&lt;p&gt;The IETF spent some time developing just such an extension
but was unable to reach consensus on
the precise semantics&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; and at the end of
the day the whole thing kind of just died out due
to lack of energy, in part because
browsers are where this makes the most difference
and no browser was really interested in the extension.
Eventually, the extension got published as an &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc9102.html&quot;&gt;RFC&lt;/a&gt;
in what&#39;s called the &lt;a href=&quot;https://www.rfc-editor.org/about/independent/&quot;&gt;Independent Stream&lt;/a&gt;
which roughly means that it was published and
has a TLS code point assignment but isn&#39;t any kind of standard.
To the best of my knowledge, few TLS stacks and no browsers
support this extension.&lt;/p&gt;
&lt;h2 id=&quot;dane-deployment-status&quot;&gt;DANE Deployment Status &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#dane-deployment-status&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;When looking at DANE deployment, we should distinguish the situation on the Web
from that for e-mail. As noted above, DANE has essentially no deployment on the Web: no browser
supports it in either the main DNSSEC or the TLS extension mode, and I&#39;m not aware
of any real interest from browsers. I explore the reasons for this below.&lt;/p&gt;
&lt;p&gt;It seems like there is somewhat more interest in DANE on the e-mail
side. Data collected by Viktor Dukhovni and Wes Hardaker at
&lt;a href=&quot;https://stats.dnssec-tools.org/about.html&quot;&gt;DNSSEC-Tools&lt;/a&gt; shows
about 17 million DS records and about 3 million DANE-protected
domains, so DANE deployment is about 1/6 as high as
DNSSEC deployment, which is already pretty low. Viktor Dukhovni has also posted some more
&lt;a href=&quot;https://mail.sys4.de/pipermail/dane-users/2021-December/000614.html&quot;&gt;details&lt;/a&gt;
of DANE deployment. As you&#39;d expect, most of the deployment is driven
by big hosting providers (most likely because they can ensure that their
DNSSEC records and TLS configurations are in sync).&lt;/p&gt;
&lt;p&gt;Viktor reports:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The number of DANE domains that at some point were listed in Gmail&#39;s
email transparency report is 557 (this is my ad-hoc criterion for a
domain being a large-enough actively used email domain).  Of these, 331
are in recent (last 90 days of) reports (see [2] below my signature).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Google&#39;s most recent e-mail &lt;a href=&quot;https://storage.googleapis.com/transparencyreport/google-safer-email.zip&quot;&gt;transparency
report&lt;/a&gt;
has over 92000 domains. They don&#39;t publish the fraction of email to
and from each domain so it&#39;s a bit hard to be sure, but a few hundred
DANE domains out of more than 92000 is well under 1%. In terms of whether the record is
&lt;em&gt;consumed&lt;/em&gt;, the situation is mixed. Microsoft has &lt;a href=&quot;https://techcommunity.microsoft.com/t5/exchange-team-blog/support-of-dane-and-dnssec-in-office-365-exchange-online/ba-p/1275494&quot;&gt;announced&lt;/a&gt;
that they intend to support TLSA and according to Viktor Dukhovni, they will
start &lt;a href=&quot;https://twitter.com/VDukhovni/status/1474904623559286785&quot;&gt;processing it for outbound in 2022&lt;/a&gt;.
Gmail does not and instead uses something called MTA-STS (see below), which just indicates
that the recipient wants you to use TLS. This is not to say that there isn&#39;t
a lot of TLS-encrypted email: Gmail currently
&lt;a href=&quot;https://transparencyreport.google.com/safer-email/overview&quot;&gt;reports&lt;/a&gt; that 81%
of their outgoing email and 89% of their incoming email is encrypted.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/gmail-encrypted.png&quot; alt=&quot;Gmail encryption fraction over time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As an aside, notice the strong seasonality effects
in the &amp;quot;Outbound&amp;quot; direction but not the &amp;quot;Inbound&amp;quot; direction. You can see
this even more strongly if we zoom in to just cover 2020 and 2021.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/gmail-encrypted-2020-22.png&quot; alt=&quot;Gmail encryption fraction 2020-2021&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Obviously, we&#39;d need to test this hypothesis, but I believe what we
are seeing here is a weekday effect based on business addresses
being more likely to use TLS than non-business addresses, and Gmail
doing more sending to business addresses during the week. Layered
on top of that, we have the decreased use of mail for business towards
the end of the year (hence the slump at the end) and then some
COVID effects (perhaps increased use of personal addresses for business
use?) in mid 2020 and mid 2021.&lt;/p&gt;
&lt;h2 id=&quot;why-didn&#39;t-dane-take-off%3F&quot;&gt;Why didn&#39;t DANE take off? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#why-didn&#39;t-dane-take-off%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#the-outlook-for-deployment&quot;&gt;Part II&lt;/a&gt; I said
that DNSSEC deployment was a collective action problem between clients
and servers and the same thing is true here, but between TLS clients
and TLS servers. And as with DNSSEC, the basic problem is that
supporting DANE doesn&#39;t add enough value—and more importantly,
&lt;em&gt;incremental value&lt;/em&gt;—for implementations.
Let&#39;s take the restrictive and additive cases separately.&lt;/p&gt;
&lt;h3 id=&quot;restrictive-modes-2&quot;&gt;Restrictive Modes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#restrictive-modes-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;At first glance, one might think that the restrictive modes
would be a pretty good case for DANE. There were a lot of
concerns—especially at the time DANE was designed—around
certificate misissuance and DANE seemed to offer a solution
to that. In his 2015 post,
&lt;a href=&quot;https://www.imperialviolet.org/2015/01/17/notdane.html&quot;&gt;Adam Langley&lt;/a&gt;
argues that two other technologies are more appropriate here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1061877115&quot;&gt;Certificate Transparency&lt;/a&gt;
which helps detect misissuance (and prevent covert misissuance) by forcing certificates to be published.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7469&quot;&gt;HTTP Public Key Pinning (HPKP)&lt;/a&gt;, which
allows Web servers to publish a list of the keys that are valid for the site and exclude others.
(Langley says he is &amp;quot;lukewarm&amp;quot; on HPKP).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Certificate transparency has been quite successful at detecting CA misbehavior,
but in the 6 years since Langley&#39;s post, HPKP has fallen out of favor, largely
due to concerns about misconfiguration: if you accidentally pin to the wrong
certificate (say your CA changes its certificate) you can make it impossible
for people to reach your site, and because the pins are delivered over the
TLS channel, your site is broken until the pins expire or the browser
makers take pity on you and remotely invalidate your pin. In the past
few years, browser makers have deprecated HPKP.&lt;/p&gt;
&lt;p&gt;In principle, DANE&#39;s restrictive modes do a better job here because
you don&#39;t need TLS to work to fix a misconfiguration, but it
comes at the cost of needing to coordinate your Web server certificates and
your DNS, which can be real overhead, especially in cases where your
DNS is served by one entity and your Web site is served by another (or
maybe several others) who don&#39;t coordinate with it. For instance, a
common configuration where your site is hosted on a CDN is to have the
DNS provider point to the CDN (either by a CNAME or just by IP
address), but it doesn&#39;t need to know what certificate the CDN has
(which it may obtain on its own); with DANE you would need to have a
channel to learn the current certificate configuration.&lt;/p&gt;
&lt;p&gt;In addition to management overhead, my sense is that people have gotten
somewhat less concerned about misissuance, in part due to Certificate
Transparency and in part due to some well-publicized examples of
misbehaving CAs being removed from the ecosystem, and the resulting sense that
the WebPKI is being better operated. This means
that the restrictive modes aren&#39;t as compelling.&lt;/p&gt;
&lt;p&gt;TLSA does do one more useful thing, which is to indicate that
the client should expect to get TLS with a valid certificate and
fail if it doesn&#39;t. However, at the time that DANE was designed,
the Web already had &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6797&quot;&gt;HSTS&lt;/a&gt;
which did this in HTTP (and thus was easier to deploy).
E-mail recently got something similar in the form
of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8461&quot;&gt;MTA-STS&lt;/a&gt;,
and this seems to be what many servers such as Gmail are deploying.
MTA-STS even has a DNS mode, but because it doesn&#39;t contain information
about the key, it requires far less coordination between the TLS server
and the DNS.
It seems like an open question whether we&#39;ll end up with MTA-STS, TLSA,
or a mix of both.&lt;/p&gt;
&lt;h3 id=&quot;additive-modes-2&quot;&gt;Additive Modes &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#additive-modes-2&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;By contrast to the restrictive modes, which solved a real
problem—though perhaps not in the way that some people wanted—the
value proposition of the additive modes has always been quite unclear
to me. The basic story seems to be that it&#39;s inconvenient and expensive
to deal with the WebPKI CAs and DANE/TLSA offered a convenient and free
(and incidentally more secure) alternative. Unfortunately, there are
two problems with this story.&lt;/p&gt;
&lt;p&gt;The first problem is that it&#39;s not actually that inconvenient to get a
WebPKI certificate. It&#39;s true that it &lt;em&gt;was&lt;/em&gt; somewhat inconvenient, but
then in late 2015—a little over three years after DANE was
published—&lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt; launched a free
automatic certificate authority based on the &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8555.html&quot;&gt;ACME&lt;/a&gt;
protocol. This meant that anyone could get a free WebPKI certificate
that would be acceptable to (almost) every client without any mucking
around with the DNS. Moreover, as described above and in
more detail by &lt;a href=&quot;https://taejoong.github.io/pubs/publications/chung-2017-registrar.pdf&quot;&gt;Chung et al.&lt;/a&gt;,
getting DNSSEC added to your domain and populating it with TLSA records
wasn&#39;t that easy in practice, especially compared to Let&#39;s Encrypt,
which didn&#39;t require any changes to DNS at all.&lt;/p&gt;
&lt;p&gt;The second problem is that DANE didn&#39;t offer any &lt;em&gt;incremental&lt;/em&gt; value.
Even if we assume that DANE/TLSA was easier to manage than WebPKI
certificates and so an all-DANE world would be better than
an all-WebPKI world, the all-DANE world was indefinitely far away.
The problem is that a large fraction of clients wouldn&#39;t have supported DANE
at the time of launch and so you would need a WebPKI certificate in any case until essentially
that entire population upgraded. This can take a &lt;em&gt;really&lt;/em&gt; long time
because the tail of clients who don&#39;t update is very long, so as a
practical matter you are looking at having to support both DANE/TLSA
&lt;em&gt;and&lt;/em&gt; WebPKI more or less indefinitely and this is obviously more
effort than supporting WebPKI, even if you think that DANE alone
would be easier than WebPKI alone.&lt;/p&gt;
&lt;p&gt;It&#39;s useful to look at Let&#39;s Encrypt as a contrast: it&#39;s true that
it was easier to deploy with Let&#39;s Encrypt than with certificates from previous WebPKI CAs,
but that wouldn&#39;t have mattered if no client supported Let&#39;s Encrypt&#39;s
certificates. But instead, Let&#39;s Encrypt had a &amp;quot;cross-sign&amp;quot;
from an existing certificate authority that clients already trusted,
which meant that its certificates were immediately valid. This
allowed it to provide incremental value and was &lt;a href=&quot;https://jhalderm.com/pub/papers/letsencrypt-ccs19.pdf&quot;&gt;critical to its success&lt;/a&gt;. In general, it&#39;s extraordinarily hard to deploy new systems which require
every client to change before you get any value and much easier to deploy
systems which give people immediate value from deploying.&lt;/p&gt;
&lt;h2 id=&quot;what-about-dnssec-for-the-webpki%3F&quot;&gt;What about DNSSEC for the WebPKI? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#what-about-dnssec-for-the-webpki%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the most frequent complaints about the WebPKI is that its
method of verifying that a given entity should be issued a certificate
is very weak and in fact depends on the DNS. The CA/BF baseline
requirements require that the CA validate that the applicant has
&amp;quot;ownership or control&amp;quot; of the domain. In practice, what this usually
means is that the applicant is able to do one of:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make specific changes to the Web site,
such as putting a file in &lt;code&gt;/.well-known&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Receive an email to a site administrator, e.g., &lt;code&gt;admin@example.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Make a specific change to the DNS&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, all of these involve the CA using the DNS to look up information
about the site which means that it is vulnerable to attacks on the DNS.
And because the site doesn&#39;t yet have a certificate, you can&#39;t use
HTTPS to protect against those attacks as you normally would. In other
words, the security of the WebPKI depends on trusting the security of the CA&#39;s
DNS resolution, the CA&#39;s network, and
the &lt;a href=&quot;https://www.princeton.edu/~pmittal/publications/bgp-tls-usenix18.pdf&quot;&gt;routing infrastructure&lt;/a&gt;.
If any of these are compromised, then the CA can be caused to misissue.&lt;/p&gt;
&lt;p&gt;More recently, we have seen a number of mechanisms
such as &lt;a href=&quot;https://letsencrypt.org/2020/02/19/multi-perspective-validation.html&quot;&gt;multiple&lt;/a&gt;
&lt;a href=&quot;https://blog.cloudflare.com/secure-certificate-issuance/&quot;&gt;perspective&lt;/a&gt;
validation deployed to make DNS- and BGP-based attacks more difficult,
and such attacks can potentially be detected via Certificate Transparency.
Still, the situation isn&#39;t ideal.&lt;/p&gt;
&lt;p&gt;It seems like DNSSEC could potentially help, but the situation is
somewhat complicated. In particular, it&#39;s not enough to just have the
CA do DNSSEC verification; even in cases where the domain is signed
and so the IP addresses can be trusted, if the attacker controls the
routing system (or, even worse, the link to the server), then they can
intercept the CA&#39;s connection to the server and fake the response,
so it doesn&#39;t actually help that much to just secure the DNS.
What&#39;s needed to make this work is a way to advertise in the DNS
that the CA should &lt;em&gt;only&lt;/em&gt; use the DNS-based mechanisms for validating
control of the domain name; because these will be protected by
DNSSEC, the CA will no longer be subject to routing-based attacks.
Of course, this also requires quite tight control of the DNS by
the server operator, which makes it less attractive for them
(one reason why the HTTP-based challenges are popular).
&lt;strike&gt;In any case, this would be a simple extension to DNS (potentially an
addition to CAA) but I don&#39;t know how much interest there actually
would be.&lt;/strike&gt;
It turns out such an extension to ACME &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8657&quot;&gt;already exists&lt;/a&gt;, but it does
not &lt;a href=&quot;https://twitter.com/DanielMicay/status/1475973392805376000&quot;&gt;seem to be widely deployed&lt;/a&gt;.
&lt;em&gt;[Update 2021-12-28: Corrected to document the existence of the extension.
Thanks to Daniel Micay for pointing this out.]&lt;/em&gt;&lt;/p&gt;
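&lt;p&gt;To make this concrete, here is a minimal sketch of how a CA might
interpret such a record. It assumes the RFC 8657 &lt;code&gt;validationmethods&lt;/code&gt;
parameter on a CAA &lt;code&gt;issue&lt;/code&gt; property. The CA name &lt;code&gt;ca.example&lt;/code&gt;
and both helper functions are hypothetical, and the parsing is far looser than
the real CAA grammar (it ignores multiple records, &lt;code&gt;issuewild&lt;/code&gt;, and so on).&lt;/p&gt;

```python
def parse_caa_issue(value):
    """Split a CAA issue value into (issuer domain, parameter dict).

    Simplified: splits on ';' and '=' without enforcing the full grammar.
    """
    parts = [p.strip() for p in value.split(";")]
    issuer = parts[0]
    params = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        params[name.strip()] = val.strip()
    return issuer, params


def method_allowed(caa_value, ca_domain, method):
    """Return True if this CA may validate the domain using `method`."""
    issuer, params = parse_caa_issue(caa_value)
    if issuer != ca_domain:
        return False  # some other CA is named, so this CA must not issue
    allowed = params.get("validationmethods")
    if allowed is None:
        return True  # no restriction on validation methods
    return method in [m.strip() for m in allowed.split(",")]


# Hypothetical record: only DNS-based validation is acceptable.
value = "ca.example; validationmethods=dns-01"
print(method_allowed(value, "ca.example", "dns-01"))
print(method_allowed(value, "ca.example", "http-01"))
```

&lt;p&gt;With a record like this in a DNSSEC-signed zone, a CA that honors it
would refuse the HTTP-based challenge, which is the piece that is exposed to
routing attacks.&lt;/p&gt;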
&lt;h2 id=&quot;next-up%3A-dns-transport-security&quot;&gt;Next Up: DNS Transport Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#next-up%3A-dns-transport-security&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even if DNSSEC were universally deployed and supported, including validation
by endpoints, it would only be a partial answer to DNS security because it
doesn&#39;t keep people&#39;s queries secret. Your DNS query history
leaks much of your Internet history, so we know this is sensitive
information and there is already evidence of it being
&lt;a href=&quot;https://www.ftc.gov/system/files/documents/reports/look-what-isps-know-about-you-examining-privacy-practices-six-major-internet-service-providers/p195402_isp_6b_staff_report.pdf&quot;&gt;misused by ISPs&lt;/a&gt; and probably others.
The next post will cover transport security mechanisms for DNS
such as &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7858&quot;&gt;DNS over TLS (DoT)&lt;/a&gt;,
&lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc8484.html&quot;&gt;DNS over HTTPS (DoH)&lt;/a&gt;,
and &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-dprive-dnsoquic-07.html&quot;&gt;DNS over QUIC (DoQ)&lt;/a&gt;
that are intended to protect those queries, as well as
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-08&quot;&gt;Oblivious DoH&lt;/a&gt; and
&lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-ohai-ohttp-00.html&quot;&gt;Oblivious HTTP&lt;/a&gt; which
protect the client&#39;s IP address.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is another record called &lt;em&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc6844&quot;&gt;Certificate Authority Authorization (CAA)&lt;/a&gt;&lt;/em&gt;
which carries similar information but is intended for certificate
authorities, telling them that they should not issue
for a given domain name. This is intended to help
prevent misissuance, but is not consumed by the client
and therefore doesn&#39;t do anything once misissuance
has happened. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The CA/Browser Forum &lt;a href=&quot;https://cabforum.org/wp-content/uploads/CA-Browser-Forum-BR-1.8.0.pdf&quot;&gt;Baseline Requirements&lt;/a&gt;
limit certificate lifetimes to 398 days. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;TLSA does allow you to advertise the
public key of the server, and it&#39;s technically possible to
get a new certificate but keep the same key. However, if
you do that, you&#39;re relying on your server software never
to generate a new key, which has its own problems. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Another more subtle failure mode is that the certificate
chain that is constructed for an end-entity certificate
can depend on the browser. If you&#39;re not careful with which
CA certificates you advertise via DANE, you can create
hard to diagnose failure modes. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You could in principle use a &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc7250.html&quot;&gt;raw public key&lt;/a&gt;
but TLS really expects to use certificates, so this is what
DANE specifies and you&#39;re just stuck with some X.509 machinery. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note
that this involves ignoring DNSSEC for the IP address, but as I
pointed out previously, the security impact of this is minimal. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Full disclosure: I was one of the major
participants on one side of the debate, which is really too tedious to
explain. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dane/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part II: DNSSEC</title>
		<link href="https://educatedguesswork.org/posts/dns-security-dnssec/"/>
		<updated>2021-12-24T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security-dnssec/</id>
		<content type="html">&lt;p&gt;This is Part II of my series on DNS Security.
(See &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;part I&lt;/a&gt; for an overview of DNS and its security
issues). In this part, we cover Domain Name System Security
Extensions, popularly known as DNSSEC.&lt;/p&gt;
&lt;p&gt;As documented in &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security&quot;&gt;part I&lt;/a&gt;, baseline DNS is tragically
insecure and the DNS community has been working on fixing it
for pretty much as long as I&#39;ve been working in Internet security
(the original &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=2065&quot;&gt;RFC&lt;/a&gt;
for DNSSEC was published in 1997, but of course people had been working
on it for quite some time before that). The purpose of DNSSEC
is to provide &lt;em&gt;authenticity&lt;/em&gt; and &lt;em&gt;integrity&lt;/em&gt; for DNS records,
so that when you get a result you know it is correct. It does not
do anything to preserve confidentiality of the query or the results.&lt;/p&gt;
&lt;h2 id=&quot;overview&quot;&gt;Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind DNSSEC is straightforward: digitally
sign the entries in the database (&amp;quot;resource records&amp;quot;).
I mentioned before that names exist in a hierarchy:
In DNSSEC, each node in the hierarchy has a key which is used
to digitally sign all the records at that node.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dns.png&quot; alt=&quot;DNS hierarchy&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For instance, in the picture above, there would be a single root key which then signs
keys
for &lt;code&gt;com&lt;/code&gt; and &lt;code&gt;org&lt;/code&gt;. Similarly, &lt;code&gt;com&lt;/code&gt; (the parent)
signs the  key for &lt;code&gt;example.com&lt;/code&gt; (the child), which is then used to sign the
records for &lt;code&gt;example.com&lt;/code&gt;. The logic here is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You know the root key (because it&#39;s been preconfigured in some
way).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You know the key for &lt;code&gt;com&lt;/code&gt; because the root attests to
it by signing it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You know the key for &lt;code&gt;example.com&lt;/code&gt; because &lt;code&gt;com&lt;/code&gt; attests
to it by signing it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can trust the IP address for &lt;code&gt;example.com&lt;/code&gt; because
&lt;code&gt;example.com&lt;/code&gt; signed it with its key.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When a client receives a response for a domain, it verifies it by checking
all the signatures from the root down to the domain and then
checking the signature over the records in the domain.&lt;/p&gt;
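&lt;p&gt;The steps above can be sketched as a short loop. This is a toy model, not
real DNSSEC: the &lt;code&gt;toy_sign&lt;/code&gt; and &lt;code&gt;toy_verify&lt;/code&gt; helpers use an HMAC as a
stand-in for public-key signatures (so the same value signs and verifies),
and the key names and the record are invented for illustration. The point
is just the shape of the walk from the preconfigured root key down to the record.&lt;/p&gt;

```python
import hashlib
import hmac


def toy_sign(key, data):
    # Stand-in for a public-key signature; NOT real DNSSEC cryptography.
    return hmac.new(key, data, hashlib.sha256).digest()


def toy_verify(key, data, sig):
    return hmac.compare_digest(toy_sign(key, data), sig)


def verify_chain(root_key, links, record, record_sig):
    """Walk from the root key to the record.

    links is a list of (child_key, parent_signature_over_child_key),
    ordered from the root downward.
    """
    key = root_key
    for child_key, sig in links:
        if not toy_verify(key, child_key, sig):
            return False  # the parent did not attest to this child key
        key = child_key
    # Finally, check the record itself against the last key in the chain.
    return toy_verify(key, record, record_sig)


# Hypothetical keys and record, for illustration only.
ROOT, COM, EXAMPLE = b"root-key", b"com-key", b"example-com-key"
record = b"example.com A 192.0.2.1"
links = [(COM, toy_sign(ROOT, COM)), (EXAMPLE, toy_sign(COM, EXAMPLE))]
record_sig = toy_sign(EXAMPLE, record)

print(verify_chain(ROOT, links, record, record_sig))
```

&lt;p&gt;Changing the record, any key in the chain, or the root key you started
from causes the check to fail, which is the property we want.&lt;/p&gt;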
&lt;p&gt;Note: the technical terminology here is &amp;quot;zone&amp;quot;, which refers
to a portion of a tree controlled by a single entity.
For instance, &lt;code&gt;com&lt;/code&gt; is one zone and &lt;code&gt;example.com&lt;/code&gt; another,
but &lt;code&gt;www.example.com&lt;/code&gt; might be part of the &lt;code&gt;example.com&lt;/code&gt;
zone if it&#39;s managed by the same people. This is a very important distinction if you&#39;re working
with the DNS, but here I&#39;ll mostly be using &amp;quot;domain&amp;quot; and &amp;quot;zone&amp;quot;
interchangeably.&lt;/p&gt;
&lt;p&gt;There are a number of important properties of this design, as detailed below.&lt;/p&gt;
&lt;h3 id=&quot;dnssec-authenticates-data-not-transactions&quot;&gt;DNSSEC authenticates data not transactions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#dnssec-authenticates-data-not-transactions&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because DNSSEC signs objects, those signed objects are self-contained
and it doesn&#39;t matter how you receive them. This means that
as long as you verify the signatures you don&#39;t need to trust the DNS server
you got them from, at least as far as the correctness of the records
goes. It&#39;s even possible to &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9102&quot;&gt;embed DNSSEC-signed chains&lt;/a&gt;
in other protocols, such as TLS. Indeed, one way to think of DNSSEC
is that it&#39;s just end-to-end authenticated data carried over an
insecure transport.&lt;/p&gt;
&lt;p&gt;This also means that it&#39;s possible to have DNSSEC operate
entirely offline, where the records are signed with some key
that is never on a machine connected to the Internet. This is
by contrast to TLS, which requires the key to be available to
the server all the time. This was considered a very important
design criterion at the time, but in practice I&#39;m not sure
that this has turned out to be that great a decision. In particular,
some of the design choices downstream of that requirement have
turned out to be questionable.&lt;/p&gt;
&lt;p&gt;One consequence of these decisions is that it&#39;s hard to keep the
contents of a given zone private, which a lot of enterprises
don&#39;t like.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
The reason for this is that there needs to
be a way to say that an arbitrary name &lt;em&gt;doesn&#39;t&lt;/em&gt; exist, but
you can&#39;t predict all the names in advance. In an online
system like TLS, you would just say &amp;quot;no&amp;quot; to whatever the
question was, but that doesn&#39;t work in an offline system.
DNSSEC handles
this by having a record called NSEC which says
&amp;quot;the next name in alphabetical sequence after name X is name Y&amp;quot;.
The problem is that this can then be used to enumerate all
the names in a domain, just by repeatedly asking &amp;quot;what&#39;s next?&amp;quot; like
a five-year-old. The DNS community has
spent a lot of effort in trying to design a system that
wouldn&#39;t have this problem, but the strongest current mechanism (NSEC3)
turns out to be
&lt;a href=&quot;https://www.cs.bu.edu/~goldbe/papers/nsec5.html&quot;&gt;not that effective&lt;/a&gt;,
for technical reasons outside the scope of this post.&lt;/p&gt;
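&lt;p&gt;The enumeration attack is easy to sketch. The toy below assumes a
made-up zone and uses plain string sorting rather than the DNS canonical
ordering. It also reads the &amp;quot;next name&amp;quot; pointers directly, whereas a real
walker would have to extract them from NSEC records returned in responses.&lt;/p&gt;

```python
def build_nsec_chain(names):
    """Map each name to the next name in sorted order, wrapping at the end.

    Simplification: real DNSSEC uses canonical DNS name ordering.
    """
    ordered = sorted(names)
    return {
        name: ordered[(i + 1) % len(ordered)]
        for i, name in enumerate(ordered)
    }


def walk_zone(nsec_chain, apex):
    """Enumerate every name by repeatedly asking what comes next."""
    found = [apex]
    current = nsec_chain[apex]
    while current != apex:  # the chain wraps back to where we started
        found.append(current)
        current = nsec_chain[current]
    return found


# Hypothetical zone contents, including a name the operator wanted private.
names = [
    "www.example.com",
    "example.com",
    "secret.example.com",
    "mail.example.com",
]
chain = build_nsec_chain(names)
print(walk_zone(chain, "example.com"))
```

&lt;p&gt;The walker needs one step per name in the zone and never has to guess
a name in advance.&lt;/p&gt;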
&lt;h3 id=&quot;limited-protection-against-censorship&quot;&gt;Limited Protection Against Censorship &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#limited-protection-against-censorship&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;While DNSSEC provides protection against someone inserting
false data into your DNS resolution, it does not protect against
someone who just wants to stop you from accessing a given
site because that attacker can just suppress the DNS
response or alternately inject a bogus response of their
own. The signature won&#39;t verify of course, but it doesn&#39;t matter
because you still don&#39;t have the answer. As a practical
matter, what happens is that the resolver reports an error,
and you know things have failed, but it doesn&#39;t
really matter because you still can&#39;t get where you are trying
to go.&lt;/p&gt;
&lt;h3 id=&quot;cryptographic-algorithms&quot;&gt;Cryptographic Algorithms &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#cryptographic-algorithms&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One property of DNSSEC that has gotten a lot of negative
attention--for instance by Ptacek and Langley a few years ago--is its use of weak
cryptography. When DNSSEC was first designed, it used RSA with 1024
bit keys for signatures. This is no longer considered secure--the
minimum key length for the WebPKI has been &lt;a href=&quot;https://news.netcraft.com/archives/2012/09/10/minimum-rsa-public-key-lengths-guidelines-or-rules.html&quot;&gt;2048 bits since 2012&lt;/a&gt;--
and the system has very gradually been moving towards using longer RSA
key lengths (2048-bit, or at least longer than 1024-bit) or more modern elliptic
curve signatures. It appears that most domains now have
&lt;a href=&quot;https://www.co.tt/dnssec_scan_val.html&quot;&gt;2048-bit keys&lt;/a&gt;
as well
as 1024-bit keys, which I suspect is a backward compatibility
mechanism,
as well as some elliptic curve keys.&lt;/p&gt;
&lt;p&gt;In general, backward compatibility is a big challenge
for an object security protocol like
DNSSEC. The issue is that you need whatever data you produce
to be readable by everyone--in contrast to an interactive protocol
like TLS, where you can negotiate what to do and detect
when things break--so for instance,
if you introduce a new algorithm it has to be done in such
a way that it doesn&#39;t break old verifiers. This means that
you have to (1) have the presence of that algorithm/signature
not be a problem and (2) provide signatures with the old algorithm
until those old verifiers have upgraded to support the new algorithm
or until you no longer care about them--which may be more or
less indefinitely.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This turns out to be especially difficult for the root keys,
which are preconfigured into every resolver and which required
a quite ornate &lt;a href=&quot;https://www.apnic.net/manage-ip/apnic-services/dnssec/keyroll&quot;&gt;procedure&lt;/a&gt;
to roll over back in 2018, including the following dire warnings:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once the new keys have been generated, network operators performing DNSSEC validation will need to update their systems with the new key so that when a user attempts to visit a website, it can validate it against the new KSK.&lt;/p&gt;
&lt;p&gt;Maintaining an up-to-date KSK is essential to ensuring DNSSEC-validating DNS resolvers continue to function following the rollover.&lt;/p&gt;
&lt;p&gt;Failure to have the current root zone KSK will mean that DNSSEC-validating DNS resolvers will be unable to resolve any DNS queries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the event this seems to have gone &lt;a href=&quot;https://taejoong.github.io/pubs/publications/muller-2019-ksk.pdf&quot;&gt;relatively smoothly&lt;/a&gt;,
albeit after quite a bit of effort and a delay of a year.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;limited-trust&quot;&gt;Limited Trust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#limited-trust&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DNSSEC has a much stricter trust hierarchy than the WebPKI certificate
system: in the WebPKI, pretty much any CA&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
can sign any domain name. This means that even if you are &lt;code&gt;example.com&lt;/code&gt; and
have all of  your certificates from &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt;, an attacker
can compromise another certificate authority and get it to issue a certificate
for &lt;code&gt;example.com&lt;/code&gt;. The WebPKI has had several mechanisms bolted on after
the fact to control this kind of attack&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
but this is generally recognized as kind of a misfeature.&lt;/p&gt;
&lt;p&gt;By contrast, in DNSSEC, only the entity responsible
for &lt;code&gt;.com&lt;/code&gt; can sign the keys for &lt;code&gt;example.com&lt;/code&gt;
which means that the attacker needs to compromise either that entity
or one of its data sources (e.g., a domain name registrar),
which is a much smaller set than the 100 or so WebPKI certificate authorities
accepted by a major browser. This also has some issues in terms of deployment,
as we&#39;ll see later, but from a security perspective it&#39;s a win.
&lt;em&gt;[Updated: 2021-12-24. This originally
said 1000. Thanks to Phillip Hallam-Baker and Ryan Hurst for the
correction.]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;At this point it&#39;s natural to think that you might replace the WebPKI
with DNSSEC signed keys, and in fact &lt;a href=&quot;https://datatracker.ietf.org/doc/rfc6698/&quot;&gt;&lt;em&gt;DNS Authentication of Named Entities&lt;/em&gt;
(DANE)&lt;/a&gt; attempts to do
precisely this. I plan to cover this in a future post.&lt;/p&gt;
&lt;h3 id=&quot;dnssec-does-not-provide-privacy&quot;&gt;DNSSEC Does not provide privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#dnssec-does-not-provide-privacy&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DNSSEC does not do anything to provide privacy. It&#39;s the same old DNS
as before, just with signed records, so the privacy properties are just
as bad as before.&lt;/p&gt;
&lt;h2 id=&quot;incremental-deployment&quot;&gt;Incremental Deployment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#incremental-deployment&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because DNSSEC is a retrofit, it had to be added in a backward
compatible way. As a practical matter, that meant an incremental
rollout in which some data was signed and some was not. The problem
here is setting the client&#39;s expectations correctly: suppose that
you try to resolve &lt;code&gt;example.com&lt;/code&gt; and you get an unsigned result.
Does this mean that &lt;code&gt;example.com&lt;/code&gt; is really unsigned or that it
actually &lt;em&gt;is&lt;/em&gt; signed but you&#39;re under attack by someone who wants
you to think it&#39;s unsigned (obviously, they can&#39;t send you a valid
signed record)? On the
Web this is handled by having two different URL schemes, &lt;code&gt;http:&lt;/code&gt;
and &lt;code&gt;https:&lt;/code&gt;, with &lt;code&gt;https:&lt;/code&gt; telling the client to expect
encryption and to fail if it doesn&#39;t get it, but that doesn&#39;t
work with DNS where the names are the same.&lt;/p&gt;
&lt;p&gt;The solution is to have an indication in the parent zone about the
status of the child. Specifically, when you ask a parent which is
using DNSSEC for information about the child, then one of two things
happens:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If the child is using DNSSEC, the parent sends a
&lt;em&gt;delegation signer&lt;/em&gt; (DS) record to provide a hash of the child&#39;s
key.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the child is not using DNSSEC, the parent responds with
a &lt;em&gt;next secure&lt;/em&gt; (NSEC) record which indicates that the child
is not using DNSSEC.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The figure below shows what this looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dnssec-delegation.png&quot; alt=&quot;A partially DNSSEC signed tree&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this figure, the shaded domains are DNSSEC signed, and the
unshaded ones are unsigned. Every time a child node is signed, the
parent has a DS record attached indicating the key.
&lt;em&gt;[Update 2021-12-24: Updated the figure to show &lt;code&gt;isoc.org&lt;/code&gt;, which really does
have a DS record. Thanks to Thomas Ptacek for pointing this out.]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In order for this to work properly, both the NSEC and DS records
have to be signed. The DS record has to be signed because otherwise
you can&#39;t trust the key; the NSEC record has to be signed because
otherwise you can&#39;t trust the claim that the child is unsigned
(this is called &amp;quot;authenticated denial of existence&amp;quot;). This also
means that in order for a child to be secured with DNSSEC, its
parent needs to be secured, which means its parent needs to be secured,
and so on all the way to the root. The result is that DNSSEC mostly has
to be deployed top down, with the root being signed first and then
the top level domains, etc.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
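&lt;p&gt;For the curious, here is roughly how the hash in a DS record is derived
from the child&#39;s key, following my reading of RFC 4034: the digest covers
the owner name in wire format followed by the DNSKEY record data. This sketch
only handles the SHA-256 digest type, and the 64 zero bytes standing in
for the public key are a placeholder, not a real key.&lt;/p&gt;

```python
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name as lowercase, length-prefixed labels."""
    wire = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"  # the root label terminates the name


def ds_digest_sha256(owner, flags, protocol, algorithm, public_key):
    """Hash owner name + DNSKEY RDATA (flags, protocol, algorithm, key)."""
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + public_key
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()


# Placeholder key material: flags=257 (key signing key), protocol=3,
# algorithm=13 (ECDSA P-256), and 64 zero bytes instead of a real key.
placeholder_key = bytes(64)
digest = ds_digest_sha256("example.com", 257, 3, 13, placeholder_key)
print(digest)
```

&lt;p&gt;The parent publishes this digest (signed with its own key), and the
client checks that the key the child presents hashes to the same value.&lt;/p&gt;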
&lt;p&gt;Putting this all together, when a client goes to resolve a name, it
gets one of three results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The result is validly signed all the way back to the root and
therefore is trustworthy (&lt;a href=&quot;https://datatracker.ietf.org/doc/rfc4033/&quot;&gt;RFC 4033&lt;/a&gt;
calls this &amp;quot;Secure&amp;quot;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The result is supposed to be signed but actually can&#39;t be verified
for some reason such as an invalid signature, broken keys,
missing signatures, etc. (&amp;quot;Bogus&amp;quot;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The result is not supposed to be signed (and presumably
isn&#39;t) (&amp;quot;Insecure&amp;quot;)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with any cryptographic system, it&#39;s not really possible to distinguish
between error (misconfiguration, network damage, etc.) and attack, but as
a practical matter any Bogus result has to be treated as if it were an
attack and the client has to generate some kind of error, whether
it&#39;s just failing with an error (&amp;quot;hard fail&amp;quot;) or warning the user
with an option to override (&amp;quot;soft fail&amp;quot;). In either case it&#39;s not OK to just
silently accept the result and move on because then an attacker can just substitute
their own data with an invalid signature and trick you into accepting it.&lt;/p&gt;
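&lt;p&gt;In code, the decision looks something like the following sketch. The
function names and the two boolean inputs are my own simplification: a real
validator derives &amp;quot;the parent says this zone is signed&amp;quot; from validated
DS or NSEC records rather than being handed a flag.&lt;/p&gt;

```python
from enum import Enum


class Status(Enum):
    SECURE = "Secure"
    BOGUS = "Bogus"
    INSECURE = "Insecure"


def classify(parent_says_signed, signature_valid):
    """Map the two simplified inputs onto the three validation states."""
    if not parent_says_signed:
        return Status.INSECURE  # authenticated claim that the zone is unsigned
    return Status.SECURE if signature_valid else Status.BOGUS


def use_answer(answer, status):
    """Hard-fail policy: never hand a Bogus answer to the application."""
    if status is Status.BOGUS:
        raise ValueError("DNSSEC validation failed; refusing to use answer")
    return answer


print(classify(True, True).value)
print(classify(True, False).value)
print(classify(False, False).value)
```

&lt;p&gt;The important design point is that Bogus and Insecure are different
outcomes: an unsigned answer is only acceptable when the parent has
verifiably said the zone is unsigned.&lt;/p&gt;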
&lt;h2 id=&quot;dnssec-deployment&quot;&gt;DNSSEC Deployment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#dnssec-deployment&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So how much DNSSEC deployment is there? There are a number of ways of
looking at this question.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How many top-level domains (&lt;code&gt;.org&lt;/code&gt;, &lt;code&gt;.com&lt;/code&gt;, etc.) are signed?&lt;/li&gt;
&lt;li&gt;How many second-level domains (&lt;code&gt;example.org&lt;/code&gt;, etc.) are signed?&lt;/li&gt;
&lt;li&gt;What fraction of resolutions verify DNSSEC signatures?&lt;/li&gt;
&lt;li&gt;How widely do end-user clients (typically stub resolvers) verify DNSSEC signatures?&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;top-level-domains-(tlds)&quot;&gt;Top-Level Domains (TLDs) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#top-level-domains-(tlds)&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I noted above, as a practical matter having a domain be DNSSEC
signed requires that the parent domains be signed all the way to
the root. Conversely, this requires that the root be signed and that
most or all of the TLDs be signed as well, otherwise nobody can
get their domains signed. It took a while, but this part is
actually going pretty well. The root has been signed for some time
and as shown in ICANN&#39;s most recent
&lt;a href=&quot;http://stats.research.icann.org/dns/tld_report/&quot;&gt;data&lt;/a&gt;, the
vast majority of TLDs (1372 out of 1489) are now signed,
and this includes all the major ones like &lt;code&gt;.org&lt;/code&gt;,
&lt;code&gt;.net&lt;/code&gt;,
&lt;code&gt;.com&lt;/code&gt;, and &lt;code&gt;.io&lt;/code&gt; as well as some perhaps less
exciting TLDs such as &lt;code&gt;.lawyer&lt;/code&gt;, &lt;code&gt;.wtf&lt;/code&gt;, and &lt;code&gt;.ninja&lt;/code&gt; (yes, seriously).
You&#39;re probably not going to be too sad to learn that &lt;code&gt;.np&lt;/code&gt; (Nepal)
is not signed, though I guess they should get on it.&lt;/p&gt;
&lt;h3 id=&quot;registered-domains&quot;&gt;Registered Domains &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#registered-domains&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You will sometimes hear that &amp;quot;TLD X is signed&amp;quot; but that just means that
the records &lt;em&gt;pointing&lt;/em&gt; to its children are signed, not that the
actual records &lt;em&gt;in its children&lt;/em&gt; are signed. As described above,
this just lets you determine whether the children have deployed
DNSSEC and if so what their key is, but it doesn&#39;t automatically
make those domains secure. This is necessary for incremental
deployment but makes the situation a little confusing.&lt;/p&gt;
&lt;p&gt;However, you&#39;re not going to be serving your website out of
&lt;code&gt;.ninja&lt;/code&gt; (even if you&#39;re an actual ninja, though you
can have &lt;code&gt;ninja.wtf&lt;/code&gt;), so what&#39;s more important for
security is how many of the second level and lower domains that you
can actually register sign their contents.
Complete data doesn&#39;t
seem to be available here, in part because the operators
of the TLDs don&#39;t make their contents publicly available;
you can access individual records but not just ask for all of them.
For instance, if you want to have access to the contents&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;
of &lt;code&gt;.com&lt;/code&gt; and &lt;code&gt;.net&lt;/code&gt; you need to contact Verisign.
&lt;a href=&quot;https://taejoong.github.io/&quot;&gt;Taejoong Chung&lt;/a&gt; has a good
&lt;a href=&quot;https://securepki.org/imc17.html&quot;&gt;rundown&lt;/a&gt; of the
sources for their &lt;a href=&quot;https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-chung.pdf&quot;&gt;2017 paper&lt;/a&gt;.
This question is remarkably hard to get a straight answer to, for several
reasons. First, domain name operators are often unwilling to publish
complete data on their domains so we have only partial data. Second,
what often seems to get counted is DS records rather than signed domains, and there can
be more than one DS record per domain. With that said,
we do have a few sources of data
here (mostly gathered from the Internet Society &lt;a href=&quot;https://www.internetsociety.org/deploy360/dnssec/statistics/&quot;&gt;DNSSEC statistics site&lt;/a&gt;),
but they all tell the same basic story, which is of
a fairly low level of deployment.
For example, data collected by
Viktor Dukhovni and Wes Hardaker at &lt;a href=&quot;https://stats.dnssec-tools.org/about.html&quot;&gt;DNSSEC-Tools&lt;/a&gt;
has around 17 million DS records (indicating DNSSEC signed domains), while
Verisign reports there are
around &lt;a href=&quot;https://www.verisign.com/en_US/domain-names/dnib/index.xhtml&quot;&gt;360 million&lt;/a&gt; registrations,
so this gives us below 5% deployment, &lt;strike&gt;though this may be a slight overestimate due to multiple DS records.&lt;/strike&gt; &lt;em&gt;[Update 2022-01-22: Viktor Dukhovni informs me that they are counting RRSets and not records, so these numbers are exact.]&lt;/em&gt;&lt;/p&gt;
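&lt;p&gt;As a quick check of that arithmetic, here is a minimal sketch using only the two
approximate figures quoted above. The variable names are made up for illustration.&lt;/p&gt;

```python
# Back-of-the-envelope estimate using the approximate figures quoted above.
signed_domains = 17_000_000        # DS RRSets reported by DNSSEC-Tools (approximate)
registered_domains = 360_000_000   # registrations reported by Verisign (approximate)

# Fraction of registered domains that have a DS record, as a percentage.
deployment_pct = 100 * signed_domains / registered_domains
print(f"DNSSEC deployment: about {deployment_pct:.1f}%")
```

&lt;p&gt;Both inputs are rounded, so the result is only good to about one significant figure,
but it is consistent with the &amp;quot;below 5%&amp;quot; estimate.&lt;/p&gt;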
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/statdns.png&quot; alt=&quot;DNSSEC deployment by domain size&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Digging deeper, the figure above shows the number of signed domains by TLD size
(data from &lt;a href=&quot;https://www.statdns.com/&quot;&gt;StatDNS&lt;/a&gt;).
As above, the level
of deployment is quite low across the board, with all the big domains under
4% (&lt;code&gt;.com&lt;/code&gt; is at 2.8%). The biggest domain with significant deployment
is &lt;code&gt;.ch&lt;/code&gt; (Switzerland), at just below 1/3 deployment
(with about 2.3 million domains), and the only substantial sized
domain with even over 50% deployment is &lt;code&gt;.se&lt;/code&gt;, at 53%.
I&#39;m not sure precisely why
the fraction is so high for &lt;code&gt;.ch&lt;/code&gt; and &lt;code&gt;.nu&lt;/code&gt; but
the large fraction in &lt;code&gt;.se&lt;/code&gt; is probably due to financial
incentives to enroll people, as reported by &lt;a href=&quot;https://taejoong.github.io/pubs/publications/chung-2017-registrar.pdf&quot;&gt;Chung et al.&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There is some evidence that the situation is improving slightly. Here&#39;s Verisign&#39;s
data for DNSSEC deployment in &lt;code&gt;.com&lt;/code&gt; and &lt;code&gt;.net&lt;/code&gt; (which they
operate):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.verisign.com/en_US/resources/img/percent.png&quot; alt=&quot;.COM and .NET DNSSEC data&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, there was a big bump in 2020 which seems to be flattening in 2022.
However, even that bump corresponds to about 1 percentage point a year, so unless
things really accelerate, we&#39;re looking at quite some time before the
majority of domains are DNSSEC signed.&lt;/p&gt;
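&lt;p&gt;To put a rough number on &amp;quot;quite some time&amp;quot;, here is a sketch of a straight-line
extrapolation. It assumes the 2.8% figure quoted above for &lt;code&gt;.com&lt;/code&gt; as the starting
point and a constant 1 percentage point per year. Both inputs are assumptions, and
&lt;code&gt;years_to_reach&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Linear extrapolation only; real deployment need not follow a straight line.
def years_to_reach(target_pct, current_pct, points_per_year):
    """Years needed to grow from current_pct to target_pct at a constant rate."""
    return (target_pct - current_pct) / points_per_year

# Assumed inputs: the 2.8% figure quoted above for .com, and roughly
# 1 percentage point of growth per year.
years = years_to_reach(target_pct=50.0, current_pct=2.8, points_per_year=1.0)
print(f"About {years:.0f} years to reach a majority of .com domains")
```

&lt;p&gt;This is a crude model, but it shows the scale of the gap: at this rate it is
a matter of decades, not years.&lt;/p&gt;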
&lt;h3 id=&quot;deployment-of-dnssec-validation&quot;&gt;Deployment of DNSSEC Validation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#deployment-of-dnssec-validation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Having a domain DNSSEC-signed doesn&#39;t do any good if nobody
checks, so how often are signatures actually validated? The best data
here comes from APNIC, which reports a validation rate of
a &lt;a href=&quot;https://stats.labs.apnic.net/dnssec/XA?hc=XA&amp;amp;hx=0&amp;amp;hv=1&amp;amp;hp=1&amp;amp;hr=1&amp;amp;w=1&amp;amp;p=0&quot;&gt;little under 30%&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dnssec-validation-apnic2.png&quot; alt=&quot;APNIC DNSSEC Validation Rate&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note: I&#39;m not sure what &amp;quot;partial&amp;quot; validation means, but apparently
about 40% of the survey has some validation. A lot of this seems
to be &lt;a href=&quot;https://blog.apnic.net/2019/03/14/the-state-of-dnssec-validation/&quot;&gt;driven&lt;/a&gt;
by the use of public recursive resolvers like Google Public
DNS which do DNSSEC validation.&lt;/p&gt;
&lt;h3 id=&quot;deployment-of-endpoint-dnssec-validation&quot;&gt;Deployment of Endpoint DNSSEC Validation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#deployment-of-endpoint-dnssec-validation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Although there is a significant amount of DNSSEC validation, to the
best of my knowledge the vast majority of it is in recursive resolvers.
Although most operating systems have some built-in DNSSEC validation
capability, at least Mac and Windows don&#39;t do it by default.
Similarly, even Web browsers which have their own
resolvers--like Chrome--don&#39;t do DNSSEC validation.
I understand that some mail servers do it for DANE keys, but I don&#39;t
have any measurements of the scale.&lt;/p&gt;
&lt;h2 id=&quot;validation-at-the-endpoint-versus-the-recursive&quot;&gt;Validation at the Endpoint versus the Recursive &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#validation-at-the-endpoint-versus-the-recursive&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, nearly all validation happens at the recursive
resolver rather than at the endpoint. This does provide some
security value in that it protects against most attacks that
are &lt;em&gt;upstream&lt;/em&gt; of the recursive resolver, whether they
are on-path or off-path. However, it doesn&#39;t prevent two
very important classes of attack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Attacks between you and the recursive resolver. For instance,
an attacker on the same network can inject their own
responses. There are lots of situations where people are
on untrusted networks--consider that there are probably
a lot of network links between you and your ISP&#39;s resolver--so
this is a real concern, especially if your link to that
resolver is not cryptographically protected, which is true
of classic DNS, though not of some of the newer emerging
protocols such as DNS over HTTPS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attacks &lt;em&gt;by&lt;/em&gt; the recursive resolver. The recursive resolver
can basically do anything it wants. If you&#39;re on a malicious
network then it can simply respond to every query with its
own response. It can also invisibly censor you just
by removing responses. Moreover,
there are plenty of
cases where we know this happens already that people tend
not to think of as malicious, for instance blocking at
schools or redirecting you to a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Captive_portal&amp;amp;oldid=1058617883&quot;&gt;captive portal&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously, if you validated the DNSSEC records yourself neither
of these would work (as noted above, it would still be possible for the
recursive resolver to censor specific domains, but it
couldn&#39;t do so &lt;em&gt;invisibly&lt;/em&gt;), but if you don&#39;t, then you&#39;re just trusting
the recursive resolver to validate things and so you can&#39;t
detect these forms of attack, and it&#39;s not safe to use the DNS
for anything which relies on integrity (see &lt;a href=&quot;https://sockpuppet.org/blog/2015/01/15/against-dnssec/&quot;&gt;Ptacek&lt;/a&gt;
on this as well). This is sort of an odd position to be in
because as a general matter you&#39;re just trusting some element
on the network that you&#39;ve never heard of. Even if you trust
the network provided by your ISP, why would you trust the
one provided by your airport or coffee shop?&lt;/p&gt;
&lt;p&gt;So, why don&#39;t endpoints validate? There are two main reasons. First,
there are concerns about breakage. Right now, endpoints can resolve
any domain regardless of DNSSEC status, but if they turn on DNSSEC
validation, they will inevitably start to experience failures for some
domains. To the extent to which these failures are due to actual
attack, this is a good thing, but if many of them are due to other
&amp;quot;innocuous&amp;quot; issues (e.g., misconfiguration) then that leads to a bad
user experience because users suddenly will be unable to reach sites
which they previously were able to reach. This then gets blamed not on
the actual culprit but on the entity who made the change that resulted
in breakage (what I&#39;ve heard Adam Langley call the &amp;quot;Iron law of the
Internet&amp;quot;, namely that the last person to touch anything gets
blamed), in this case whoever turned on endpoint validation.
For this reason, vendors of end-user software such as
operating systems or browsers are very conservative about making
changes which might break something.&lt;/p&gt;
&lt;p&gt;There are two primary potential sources of breakage for DNSSEC
resolution: (1) misconfiguration of the domain (e.g., a broken
signature) and (2) network interference. The good news is that
there is now enough recursive side validation that we have
probably gotten the level of misconfiguration down to tolerable
levels. The data I have seen suggests small amounts, but probably
of less popular domains. This leaves us with interference.
Obviously some interference is due to actual attack or other deliberate DNS manipulation,
such as spoofing results for captive portal detection, but some of it
is due to various kinds of network problems, such as intermediaries
of various kinds who don&#39;t properly forward or filter out
unknown DNS record types such as the ones needed to make DNSSEC
work.
Note that these issues are not as severe for large recursive
resolvers, which generally have a clear path to the Internet
and so don&#39;t need to worry about intermediaries breaking them.
Data is pretty thin on the ground here, with the most recent published
information being at least as old as 2015, when Adam Langley
&lt;a href=&quot;https://www.imperialviolet.org/2015/01/17/notdane.html&quot;&gt;reported&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Some years ago now, Chrome did an experiment where we would lookup
a TXT record that we knew existed when we knew the Internet
connection was working. At the time, some 4–5% of users couldn&#39;t
lookup that record; we assume because the network wasn&#39;t
transparent to non-standard DNS resource types.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was a long time ago and I and others have actually been trying to gather some
more recent data, but it&#39;s not encouraging and even a very small
increased failure rate (significantly below 1%) is enough to be
problematic, because effectively you&#39;ll be breaking that fraction of
your users.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This is especially true when--as is the case here--the user value is
not particularly high (the second main reason clients don&#39;t
validate). From the perspective of the client--especially
something like a browser or an OS--the most important information it
is getting from the DNS is the IP address corresponding to the domain
it is trying to reach and this information turns out not to be
that security critical. First, if you are using an encrypted protocol like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=HTTPS&amp;amp;oldid=1061087458&quot;&gt;HTTPS&lt;/a&gt;--which
&lt;a href=&quot;https://letsencrypt.org/stats/#percent-pageloads&quot;&gt;something like 80% of Web page loads&lt;/a&gt;
are--then even if an attacker manages to change DNS to point
you to the wrong server, they will not be able to impersonate
the right server.
Of course, even if you are using encryption, an attacker might be able
to interfere with your DNS to redirect your traffic as part of
a DoS attack on some other server (as noted above, they can
also mount a DoS attack on you, DNSSEC or otherwise), but this
isn&#39;t anywhere near as bad as intercepting your traffic.&lt;/p&gt;
&lt;p&gt;Second, even in cases where you aren&#39;t using encryption or you
are using some kind of opportunistic encryption, it&#39;s not clear
how valuable having the right IP is.
As a practical matter, if an attacker is able to interfere
with DNS traffic between you and the resolver--or they are the
resolver--then it is quite likely that they can also attack
your application traffic directly, which means that they can
divert your traffic to their server even if you &lt;em&gt;do&lt;/em&gt; get
the correct IP address through DNS, so DNSSEC doesn&#39;t
help much here either.&lt;/p&gt;
&lt;h2 id=&quot;the-outlook-for-deployment&quot;&gt;The outlook for deployment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#the-outlook-for-deployment&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the end of the day, DNSSEC deployment is a collective action
problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Because a relatively small number of domains are signed and the data
isn&#39;t that important, resolvers--especially clients--have a
relatively low level of incentive to deploy DNSSEC validation,
especially when stacked up against the potential cost of high levels
of breakage for users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because not that many resolutions are validated and so few
clients validate, the incentive for domain operators to sign their
domains is relatively low. Not only does it come at a nontrivial risk
of breakage if things are misconfigured, there are a number of
additional operational costs. (Chung et al. discuss a number of these
in a 2017 &lt;a href=&quot;https://taejoong.github.io/pubs/publications/chung-2017-registrar.pdf&quot;&gt;paper&lt;/a&gt;
which focuses on the low level of support for DNSSEC by
registrars, who are actually responsible for registering domains.)
Moreover, the domain doesn&#39;t get a lot of benefit from being
signed: if it wants real security it has to mandate HTTPS or the
like anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Moreover these two reasons interlock: as long as one side
doesn&#39;t move the other side has a low incentive to move either
and we&#39;re stuck in a low deployment equilibrium.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-dane&quot;&gt;Next Up: DANE &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#next-up%3A-dane&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because DNSSEC deployment is so low on both client and server, it&#39;s also impractical to design new
features which depend on DNSSEC. For example, it would be
nice to have a system which allowed domains to advertise keys
to be used for non-TLS transactions (e.g., to sign
&lt;a href=&quot;https://educatedguesswork.org/tags/vaccine%20passports/&quot;&gt;vaccine passports&lt;/a&gt;). This is something
you could do with the DNS, but obviously it needs to be secure
and asking everyone to install DNSSEC would be impractical,
so instead we get hacks like having the key served in a specific
location on an HTTPS secured site. There are quite a few
applications which would be much easier if we had DNSSEC
but are not individually enough to motivate DNSSEC deployment and instead
get done in less elegant ways that don&#39;t require collective
action.&lt;/p&gt;
&lt;p&gt;Next up, I&#39;ll talk about probably the most serious attempt to
add such an application to the DNS:
&lt;a href=&quot;https://datatracker.ietf.org/doc/rfc6698/&quot;&gt;&lt;em&gt;DNS Authentication of Named Entities&lt;/em&gt;
(DANE)&lt;/a&gt; which
uses the DNS to advertise TLS keys.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Thomas Ptacek goes into this in
some detail in his post &lt;a href=&quot;https://sockpuppet.org/blog/2015/01/15/against-dnssec/&quot;&gt;Against DNSSEC&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s not exactly clear to me what the security properties
of this are. In principle, if you have two keys which should be used to sign
the records then you can make the signature as strong as
the strongest one, not the weakest one, but it&#39;s somewhat
subtle to get this right. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It appears that this is another backward
compatibility issue in that not all of the existing resolvers
supported automatically updating to the new keys, and so you
couldn&#39;t be confident that they would be universally accepted. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s possible to have what&#39;s called a &amp;quot;technically constrained&amp;quot; CA
which can only sign specific domains, but many CAs are not so
constrained, as it makes them much less useful. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://letsencrypt.org/docs/caa/&quot;&gt;CAA Record&lt;/a&gt;
which tells CAs not to issue certificates unless they are listed in the
record and
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1057432834&quot;&gt;Certificate Transparency&lt;/a&gt;,
which publishes all existing certificates so that it&#39;s possible to detect misissuance. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As I understand it, this key is just used to sign the keys
that the child uses to sign the domain, but we can ignore this here. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Technically, it indicates that there
is no DS record for the child, which tells the client that
the child has no key and is not using DNSSEC. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s possible to have little islands that are signed, but then
you need some way to disseminate their keys, which undercuts
the whole system. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The RFCs also include an &amp;quot;indeterminate&amp;quot; state, but this seems
to be basically the same as &amp;quot;insecure&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s actually not quite clear to me why they don&#39;t
just publish this data; I suspect it&#39;s
viewed as somehow proprietary. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Sometimes people will propose that clients try to probe and
see if the local network passes DNSSEC records correctly and
only validate if so, but that just lets the local attacker disable
validation by tampering with the probe. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security-dnssec/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>DNS Security, Part I: Basic DNS</title>
		<link href="https://educatedguesswork.org/posts/dns-security/"/>
		<updated>2021-12-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/dns-security/</id>
		<content type="html">&lt;p&gt;Over the past few years, several Web browsers, including Firefox,
Chrome, and Safari, have been rolling out &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8484&quot;&gt;DNS over HTTPS (DoH)&lt;/a&gt;,
which has brought the question of DNS security to the forefront, but also
resulted in (or just revealed?) a lot of confusion about DNS security.
This post is the first in a series on that topic, covering the basics of DNS and some of
the security properties. Future posts will cover DNSSEC, DoH, etc.&lt;/p&gt;
&lt;h2 id=&quot;what-is-dns%3F&quot;&gt;What is DNS? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#what-is-dns%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic unit of addressing for devices on the Internet is the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=IP_address&amp;amp;oldid=1055212362&quot;&gt;IP (Internet
Protocol) Address&lt;/a&gt;,
which is just a large number (32 bits for IP version 4 and 128 bits
for IP version 6). It&#39;s conventional to write IPv4 addresses like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   192.0.2.1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And IPv6 addresses like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   2001:0db8:0000:0000:0000:8a2e:0370:7334
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For obvious reasons, people don&#39;t want to memorize these addresses and
instead want to use names, such as &lt;code&gt;example.com&lt;/code&gt;. The &lt;em&gt;Domain
Name System (DNS)&lt;/em&gt; is responsible for mapping these names (&lt;em&gt;domain
names&lt;/em&gt;, hence &amp;quot;Domain Name System&amp;quot;) onto addresses. This lets you type
&lt;code&gt;https://www.example.com/&lt;/code&gt; into your browser, with the computer
then figuring out the actual IP address and connecting to it.
The DNS can also serve other kinds of information than IP addresses,
such as &lt;code&gt;MX&lt;/code&gt; records, which say where to find a mail server
for a given domain (this is what allows me to have mail and Web service
for the same domain on different servers) or &lt;code&gt;TXT&lt;/code&gt; records, which contain freeform text.&lt;/p&gt;
&lt;h2 id=&quot;how-does-it-work%3F&quot;&gt;How does it work? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#how-does-it-work%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DNS names consist of a series of names (&amp;quot;labels&amp;quot;) separated by a period
(conventionally called a &amp;quot;dot&amp;quot;). This is arranged in a hierarchy so
that (for example) &lt;code&gt;example.com&lt;/code&gt; is &amp;quot;owned&amp;quot; by &lt;code&gt;.com&lt;/code&gt;.
Conceptually, you organize the names in
a tree, with the name being read right to left and the tree organized
from top to bottom. Thus &lt;code&gt;example.com&lt;/code&gt; is the node at the lower
left of the tree:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dns.png&quot; alt=&quot;DNS tree&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Every node on the tree can have data associated with it, so
for instance, &lt;code&gt;example.org&lt;/code&gt; could have IP address &lt;code&gt;192.0.2.1&lt;/code&gt;
and &lt;code&gt;www.example.org&lt;/code&gt; could have IP address &lt;code&gt;192.0.2.2&lt;/code&gt;.
This is a familiar computer science data structure and
as you might expect if you are used to working with trees,
you look up data in the tree (the jargon here is
&amp;quot;resolving&amp;quot;) by starting at the top of the tree and working
your way downwards.&lt;/p&gt;
&lt;h3 id=&quot;the-resolution-process&quot;&gt;The Resolution Process &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#the-resolution-process&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The figure below shows the process of
resolving &lt;code&gt;example.org&lt;/code&gt;. (I know that
this is complicated, but don&#39;t worry I&#39;ll walk through it.)&lt;/p&gt;
&lt;img style=&quot;width: 80%;&quot; src=&quot;https://educatedguesswork.org/img/dns-resolve.png&quot; alt=&quot;DNS Resolution&quot; /&gt;
&lt;p&gt;The general structure here is what&#39;s called a &amp;quot;request/response&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
protocol: the client sends a request to a server and gets a response.
There are three request/response pairs, each to a different server.
I go through each message below.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The client starts by sending a request to the root server and
asks it who is responsible for the domain name &lt;code&gt;org.&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Root servers are special servers which know about all the
single-label names (&amp;quot;top-level domains&amp;quot;) such as
&lt;code&gt;.org&lt;/code&gt;, &lt;code&gt;.com&lt;/code&gt;, etc. There are actually a number of root
servers, named &lt;code&gt;a.root-servers.net&lt;/code&gt;, &lt;code&gt;b.root-servers.net&lt;/code&gt;,
etc, and the client just picks one.
In order for this to work, the client needs to be preconfigured
with a list of root servers &lt;em&gt;and&lt;/em&gt; of their addresses, so it
can send them messages (obviously it can&#39;t look them up with
DNS because that would require contacting the root servers,
which needs the addresses).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The root server, in this case &lt;code&gt;a.root-servers.net&lt;/code&gt;, replies
that &lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt; (operated by name operator
&lt;a href=&quot;https://afilias.info/&quot;&gt;Afilias&lt;/a&gt;) is responsible for
&lt;code&gt;.org&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.
One interesting thing to note is that the root server
&lt;em&gt;also&lt;/em&gt; provides the address at which
&lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt; can be reached; because
that server also has a name in &lt;code&gt;.org&lt;/code&gt;, the client
can&#39;t use the DNS to resolve it (it would
first need to contact that same server!) and so the
root has to provide the address. The technical
term for this information is &amp;quot;glue&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The client now contacts &lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt; and asks
who is responsible for &lt;code&gt;example.org&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt; responds that &lt;code&gt;b.iana-servers.net&lt;/code&gt;
is responsible. In this case, the server doesn&#39;t need to provide
a glue address because the response is in &lt;code&gt;.net&lt;/code&gt; and so
the client could look it up via the normal process (not shown).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The client now contacts &lt;code&gt;b.iana-servers.net&lt;/code&gt;, but instead
of asking who is responsible for &lt;code&gt;example.org&lt;/code&gt; it asks
for its address (it already knows &lt;code&gt;b.iana-servers.net&lt;/code&gt; is
responsible).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;b.iana-servers.net&lt;/code&gt; responds that &lt;a href=&quot;http://example.org/&quot;&gt;example.org&lt;/a&gt;&#39;s address
is &lt;code&gt;93.184.216.34&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At this point (after three round trips), the client knows the IP
address for &lt;a href=&quot;http://example.org/&quot;&gt;example.org&lt;/a&gt;.&lt;/p&gt;
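&lt;p&gt;The walk described above can be sketched as a toy, in-memory simulation. Nothing here
touches the network: the &lt;code&gt;SERVERS&lt;/code&gt; table and the &lt;code&gt;resolve&lt;/code&gt; function are
hypothetical, the server names and the final address are the ones from the example, and glue
addresses are left out. Each loop iteration stands in for one request/response pair.&lt;/p&gt;

```python
# Toy model of iterative resolution: each "server" knows either the answer
# or which server to ask next. Contents are illustrative, not real zone data.
ROOT_SERVER = "a.root-servers.net"

SERVERS = {
    "a.root-servers.net": {
        "delegations": {"org.": "b2.org.afilias-nst.org"},
        "addresses": {},
    },
    "b2.org.afilias-nst.org": {
        "delegations": {"example.org.": "b.iana-servers.net"},
        "addresses": {},
    },
    "b.iana-servers.net": {
        "delegations": {},
        "addresses": {"example.org.": "93.184.216.34"},
    },
}

def resolve(name, root=ROOT_SERVER, max_round_trips=10):
    """Follow referrals from the root until some server knows the address."""
    server = root
    contacted = []
    for _ in range(max_round_trips):
        contacted.append(server)
        data = SERVERS[server]
        if name in data["addresses"]:
            return data["addresses"][name], contacted
        # Find the delegated zones that cover this name; prefer the most specific.
        covering = [zone for zone in data["delegations"]
                    if name == zone or name.endswith("." + zone)]
        if not covering:
            raise LookupError(f"{server} has no answer or referral for {name}")
        server = data["delegations"][max(covering, key=len)]
    raise LookupError(f"gave up on {name} after {max_round_trips} round trips")

address, contacted = resolve("example.org.")
print(address)
print(contacted)
```

&lt;p&gt;The list of contacted servers has three entries, matching the three round trips
in the figure. A real resolver would also have to track the addresses of the servers
themselves (the glue), handle timeouts, and so on.&lt;/p&gt;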
&lt;h3 id=&quot;recursive-resolvers&quot;&gt;Recursive Resolvers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#recursive-resolvers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the description above, I talked about the &amp;quot;client&amp;quot; resolving
a domain, but as a practical matter, this process is mostly
not done by end-user computers. Instead, those computers
talk to what&#39;s called a &amp;quot;recursive resolver&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
provided by the
network. The way this works is that the user&#39;s computer
sends its query to the recursive resolver, which does the
whole resolution process shown above and then returns the
answer, like so:&lt;/p&gt;
&lt;img style=&quot;width: 60%;&quot; src=&quot;https://educatedguesswork.org/img/dns-recursive.png&quot; alt=&quot;DNS Recursive Resolver&quot; /&gt;
&lt;p&gt;Historically, this approach has been seen as having a number of advantages. First, it allows the
recursive resolver to &lt;em&gt;cache&lt;/em&gt;. If your network has 10 clients
(not unusual for even a small home network), then it&#39;s kind of
silly to have each one separately contacting the server
for &lt;code&gt;google.com&lt;/code&gt; to learn Google&#39;s address (and even
sillier to contact the server for &lt;code&gt;.com&lt;/code&gt; each time someone wants something in &lt;code&gt;.com&lt;/code&gt;). The recursive
resolver can cache the first response it receives&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; and return responses immediately to other clients,
thus reducing the load on servers and also improving
performance for users because you don&#39;t need as many
round trips to resolve a name.&lt;/p&gt;
&lt;p&gt;Second, it allows the recursive resolver to apply local policies.
For instance, suppose that I don&#39;t want users on my network
to go to &lt;code&gt;attacker.invalid&lt;/code&gt;. I can program my recursive
resolver to return an error instead of resolving it, thus effectively
filtering out that name (this is often called &amp;quot;blackholing&amp;quot;).
It&#39;s pretty common to use this kind of DNS filtering technique
in schools, libraries, etc. to filter out sites deemed
inappropriate.
Of course, whether this is an advantage depends on one&#39;s
perspective: if you&#39;re a user who wants to visit a site
that has been filtered in this way, you might think otherwise
(I&#39;ll get into this more in a future post).&lt;/p&gt;
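&lt;p&gt;As a rough sketch of what this can look like, here is a configuration
fragment for the open-source Unbound resolver (just one example; other
resolvers have their own equivalents). The &lt;code&gt;always_nxdomain&lt;/code&gt;
zone type tells the resolver to answer that the name does not exist,
both for the listed domain and for everything underneath it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# unbound.conf fragment; illustrative only, the domain is a placeholder
server:
    local-zone: &amp;quot;attacker.invalid.&amp;quot; always_nxdomain
&lt;/code&gt;&lt;/pre&gt;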
&lt;p&gt;You can also use control of the resolver to create names
that only resolve locally. Suppose you have something (e.g., a printer)
that you only want to be accessible to users on your local network.
You can (partly) achieve this by not making the name publicly
resolvable and instead having the recursive resolver insert responses
for it. This is called &lt;em&gt;split horizon&lt;/em&gt; DNS.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
A similar technique is used by some ISPs to serve ads:
they detect when you try to
resolve a name which does not exist (e.g., because of a typo)
and inject their
own response which points you to a page they control.&lt;/p&gt;
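&lt;p&gt;Using the open-source Unbound resolver purely as an illustration, a
locally resolvable name for the printer case might be set up roughly like this.
The zone name &lt;code&gt;lan.example&lt;/code&gt; and the address are made up
for the example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# unbound.conf fragment; illustrative only, names and address are hypothetical
server:
    # Answer names under lan.example. from local data only
    local-zone: &amp;quot;lan.example.&amp;quot; static
    local-data: &amp;quot;printer.lan.example. IN A 192.168.1.50&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Clients that use this resolver get the printer&#39;s address; anyone
asking the public DNS gets nothing, because the name was never published there.&lt;/p&gt;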
&lt;p&gt;Historically, software on the user&#39;s computer didn&#39;t even
talk to the recursive resolver directly. Rather, it called
an &lt;a href=&quot;https://man7.org/linux/man-pages/man3/resolver.3.html&quot;&gt;operating system API&lt;/a&gt;
that did the work for it. This saved work for the client
programmer as well as providing a consistent experience between
different clients on the same machine. This also gave
the operating system (and the administrator) control of
the resolution process, which is especially important if you
are running other name systems besides DNS, such as
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Windows_Internet_Name_Service&amp;amp;oldid=1057944770&quot;&gt;Windows Internet Name Service&lt;/a&gt;;
the operating system can automatically check all the potential name
services without bothering the client. Now that DNS is so dominant,
this consideration is less important, and as we&#39;ll see later,
DNS in applications is also becoming more popular.&lt;/p&gt;
&lt;h3 id=&quot;finding-the-recursive-resolver&quot;&gt;Finding the Recursive Resolver &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#finding-the-recursive-resolver&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said above, typically the recursive resolver is associated
with the network, but how does your machine learn about it?
Back in the old days (the 90s!), when you attached your computer to the network
someone would tell you the IP address to use and the IP addresses
of the recursive resolver. You&#39;d put them in a file called
&lt;code&gt;/etc/resolv.conf&lt;/code&gt;, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nameserver 192.168.1.1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, this is not exactly convenient and most people have never
done it (though you still can if you want to!).
Instead, when you join a network, the network sends your device
configuration information, including the IP to use and its recursive
resolvers&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;This means that whoever controls your network controls which
DNS server you use. As a practical matter, there are several
main cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you are connected directly to your ISP network, then it
will be the ISP&#39;s server. This is especially true on
mobile devices.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you are connected to some kind of local network, like
a WiFi router, often that will provide its own resolver,
which isn&#39;t a full recursive but instead connects to the ISP&#39;s resolver (this is called
a &amp;quot;proxy&amp;quot;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If it&#39;s a wireless hotspot like at the airport or a coffee
shop, they will often run their own resolver.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you are in an enterprise network, the enterprise will
often run their own resolver and do some kind of filtering
as mentioned above.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s also possible to use a &amp;quot;public recursive resolver&amp;quot;, which
is one that is not associated with a given network but just offers
DNS service to anyone. There are a number of popular public
resolvers, with the best known being:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Operator&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;IP Address&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Cloudflare&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1.1.1.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Google&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;8.8.8.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Quad9&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;9.9.9.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The reason for the simple addresses is that they are easy to
memorize and therefore to manually configure.&lt;/p&gt;
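&lt;p&gt;For example, a hand-edited &lt;code&gt;/etc/resolv.conf&lt;/code&gt; that points at
these three resolvers might look like the following. This is only a sketch:
on many modern systems this file is managed automatically, so manual
edits may be overwritten.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Tried in the order listed
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 9.9.9.9
&lt;/code&gt;&lt;/pre&gt;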
&lt;p&gt;&lt;img src=&quot;https://imgix.bustle.com/mic/daa2c24454d1bb698af67c76f3e93636ffe1c5b331baad97736ffc211e973269.jpg?w=450&amp;amp;h=341&amp;amp;fit=crop&amp;amp;crop=faces&amp;amp;auto=format%2Ccompress&quot; alt=&quot;8.8.8.8 on walls&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There are a number of reasons to use a public resolver, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Predictable good performance (these organizations generally do quite
a good job).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Avoiding filtering. If your network filters DNS, a public resolver
can help avoid that. Famously, back in 2014, when Turkey blocked
Twitter, Turkish protesters were &lt;a href=&quot;https://www.mic.com/articles/85987/turkish-protesters-are-spray-painting-8-8-8-8-and-8-8-4-4-on-walls-here-s-what-it-means&quot;&gt;writing the address of Google Public DNS on walls&lt;/a&gt;
to help others evade the block.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enabling filtering. Several of the public resolvers offer
filtering services, for instance for &lt;a href=&quot;https://blog.cloudflare.com/introducing-1-1-1-1-for-families/&quot;&gt;malware and adult content&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These resolvers are quite popular. As of 2019, about 9% of DNS traffic
went through Google public DNS alone.&lt;/p&gt;
&lt;h2 id=&quot;security-and-privacy&quot;&gt;Security and Privacy &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#security-and-privacy&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DNS security and privacy is, to use a technical term, &amp;quot;bad&amp;quot;. DNS was
designed back in 1987 in an era where there was basically no encryption&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
on the Internet and until recently, not much had changed.&lt;/p&gt;
&lt;p&gt;There are two major attack models to consider:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Attackers who are &amp;quot;off-path&amp;quot;: they can send packets but
can&#39;t see traffic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attackers who are &amp;quot;on-path&amp;quot;: between you and the recursive
resolver or between the recursive resolver and the servers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Historically, DNS security mostly focused on preventing forged
responses by off-path attackers, which it should have been possible to protect
against even without cryptography. In practice, however, due to some misfeatures in the protocol combined
with some implementation errors (&lt;a href=&quot;https://www.cs.cornell.edu/~shmat/shmat_securecomm10.pdf&quot;&gt;Son and
Shmatikov&lt;/a&gt;
do a good job covering this), DNS has not always done a fantastic
job here, although modern resolvers have a number of defenses against
off-path attacks. Without cryptography, it&#39;s essentially not possible
to protect against on-path attackers, as they can impersonate anyone
to anyone else. There are a number of cryptographic approaches
designed to protect against on-path attacks, which I&#39;ll be
covering in a future post.&lt;/p&gt;
&lt;p&gt;The good news, such as it is, is that the correctness of DNS
responses has a steadily shrinking impact on user security,
especially for the Web. The reason for this is that if traffic
is encrypted with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=HTTPS&amp;amp;oldid=1061087458&quot;&gt;HTTPS&lt;/a&gt;--which
&lt;a href=&quot;https://letsencrypt.org/stats/#percent-pageloads&quot;&gt;something like 80% of Web page loads&lt;/a&gt;
are--then even if an attacker manages to change DNS to point
you to the wrong server, they will not be able to impersonate
the right server. That doesn&#39;t mean that they won&#39;t be able
to mount a &amp;quot;denial of service&amp;quot; attack in which they stop you
from connecting at all, but that&#39;s nowhere near as bad
as impersonating your bank.&lt;/p&gt;
&lt;p&gt;It&#39;s important to note that there is a big difference between
ensuring that DNS responses are correct and ensuring that they
are private. Much of the work on DNS security (e.g., DNSSEC)
is focused on ensuring correctness of the response but doesn&#39;t
prevent attackers from learning what domains you are resolving,
which has obvious privacy implications. Specifically, not only
does your resolver get to see where you are going (this
can be a problem in and of itself if your ISP has &lt;a href=&quot;https://www.ftc.gov/system/files/documents/reports/look-what-isps-know-about-you-examining-privacy-practices-six-major-internet-service-providers/p195402_isp_6b_staff_report.pdf&quot;&gt;bad privacy practices&lt;/a&gt;) but anyone on the same network does as well. Again, this is
something where cryptography can help; more on this later too.&lt;/p&gt;
&lt;h2 id=&quot;next-up%3A-dnssec&quot;&gt;Next Up: DNSSEC &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/dns-security/#next-up%3A-dnssec&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;OK, so this was all pretty depressing, but surely now that we
have better cryptography, we can do something about it, right?
The next post covers the first major standardized attempt to
protect DNS, Domain Name System Security Extensions (DNSSEC).&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
And typically, it&#39;s UDP, so one packet out and one packet back. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Historically, the client would actually ask for the
answer to &lt;code&gt;example.org&lt;/code&gt; because it&#39;s possible that
the server you are asking would have it and could
answer right away. However, this
has the property that you leak your entire query to
everyone, and so it&#39;s common now to just resolve
one label at a time, a practice called QNAME Minimization
(QMIN) and specified in &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=7816&quot;&gt;RFC 7816&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are actually 6 servers responsible for &lt;code&gt;.org&lt;/code&gt;:
&lt;code&gt;b2.org.afilias-nst.org&lt;/code&gt;,
&lt;code&gt;b0.org.afilias-nst.org&lt;/code&gt;,
&lt;code&gt;a2.org.afilias-nst.info&lt;/code&gt;,
&lt;code&gt;d0.org.afilias-nst.org&lt;/code&gt;,
&lt;code&gt;c0.org.afilias-nst.info&lt;/code&gt;,
and
&lt;code&gt;a0.org.afilias-nst.info&lt;/code&gt; but I&#39;m simplifying. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Confusingly, the thing on the user&#39;s computer is
called a &amp;quot;stub resolver&amp;quot; and the servers are
&lt;em&gt;also&lt;/em&gt; called &amp;quot;resolvers&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The records
carry a time-to-live (TTL) value indicating how long they may be
cached. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The security provided by this mechanism is limited unless you
also make the device unreachable from the Internet, e.g.,
via a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Firewall_(computing)&amp;amp;oldid=1060666120&quot;&gt;firewall&lt;/a&gt;.
Otherwise, if the attacker can guess the IP address of the device
(probably not hard with IPv4) they can attack it. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With IPv4 this is likely done with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Dynamic_Host_Configuration_Protocol&amp;amp;oldid=1058748096&quot;&gt;DHCP&lt;/a&gt;, with IPv6 either with
a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6106&quot;&gt;Router Advertisement&lt;/a&gt;
or &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8415&quot;&gt;DHCPv6&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Often these will also have some kind of &amp;quot;captive portal&amp;quot;
functionality which forces you to log onto the network
first. These can be implemented with DNS by pointing
any domain to the captive portal server. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I often hear this framed as if the people who designed these
systems didn&#39;t know about security, but that&#39;s not really
true. It&#39;s mostly that due to a combination of missing
technological pieces, patents, and resource constraints
the kind of widespread encryption we&#39;re starting to take
for granted was quite difficult to deploy. Recall
that the &lt;a href=&quot;https://patents.google.com/patent/US4405829&quot;&gt;patent&lt;/a&gt;
on RSA didn&#39;t expire until 2000. &lt;a href=&quot;https://educatedguesswork.org/posts/dns-security/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at the Dutch vaccine passport system</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-nl/"/>
		<updated>2021-12-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-nl/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;script src=&quot;https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js&quot;&gt;&lt;/script&gt;
&lt;script&gt;
            mermaid.initialize({ startOnLoad: true,
                sequence: {
                    mirrorActors: false
                }});
&lt;/script&gt;
&lt;p&gt;Most of the widely deployed vaccine passport systems
(&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/&quot;&gt;New York&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/&quot;&gt;California&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/&quot;&gt;EU&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/new-zealand/&quot;&gt;New Zealand&lt;/a&gt;)
are signed attestations to a person&#39;s name and vaccination/COVID test
status. These have non-ideal privacy properties because it&#39;s possible
for the relying party (the person checking the passport) to use the
credential to track the user. As I discussed &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon&quot;&gt;earlier&lt;/a&gt;,
it seems to be quite difficult to significantly improve privacy here, so
I was very interested to learn about the Dutch
&lt;a href=&quot;https://www.government.nl/topics/coronavirus-covid-19/covid-certificate/proof-of-vaccination&quot;&gt;CoronaCheck system&lt;/a&gt;,
which has privacy as an explicit part of the design.&lt;/p&gt;
&lt;p&gt;Note: I&#39;ve not been able to find a complete specification of the
system. This description is based on the documents found
&lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-app-coordination&quot;&gt;here&lt;/a&gt;,
which provide a broad overview but not enough to implement the system,
and some examination of the &lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-hcert&quot;&gt;issuer code&lt;/a&gt;.
It&#39;s especially hard to tell what is actually deployed. With that
said, here is what &lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-app-coordination/blob/main/architecture/Privacy%20Preserving%20Green%20Card.md&quot;&gt;seems to be going on&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;basic-design&quot;&gt;Basic Design &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#basic-design&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As with all the other systems, the basic unit of the system is a signed credential.
However, this credential has two main differences from what I&#39;ve seen before:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It contains far less identity information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It is signed with a special cryptographic algorithm that provides
unlinkability.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s look at each of these pieces in turn.&lt;/p&gt;
&lt;h3 id=&quot;identity-minimization&quot;&gt;Identity Minimization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#identity-minimization&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first piece is essentially straightforward. A typical vaccine
passport contains full identifying information for the subject,
such as the full name and their birthday, though I believe
that the Israeli ones contain a national ID number. This information
can then be compared with some biometric identification
(e.g., a driver&#39;s license) to physically authenticate the person.
The Dutch version just contains the person&#39;s initials and their
birth month and day. This superficially seems like a privacy
improvement, but I&#39;m not sure how much it really is.&lt;/p&gt;
&lt;p&gt;The basic problem is that the system is only k-anonymous. It&#39;s a bit
difficult to precisely determine the number of bits of information
here, but we can approximate it as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There are 12 birth months: 3.5 bits ($&#92;log_2(12)$)&lt;/li&gt;
&lt;li&gt;There are ~30 birth days: 5 bits ($&#92;log_2(30)$)&lt;/li&gt;
&lt;li&gt;There are 26 letters for each initial, but they&#39;re not evenly
distributed, so let&#39;s say 4 bits each: 8 bits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This gives the relying party 16.5 bits of entropy, dividing
the population into about 100,000 groups. The population
of the Netherlands is about 18 million, so this gives us an anonymity
set of around 200. Moreover, when combined with side information
like apparent age and gender, the anonymity set becomes a lot smaller.
Also, as I noted earlier,
it&#39;s made worse by the fact that people&#39;s behavior isn&#39;t random.
For instance, if we have four authentications for the initials ER
within an hour with two at outdoor stores in Mountain View and
two in bars in Los Angeles, it&#39;s likely that the first two are one
person and the second two are another. This kind of constraint
solving problem is something computers are very good at; you might
not get a complete record of someone&#39;s behavior, but you&#39;ll learn
a lot.&lt;/p&gt;
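&lt;p&gt;Spelling out the anonymity-set arithmetic, using the same rough per-field
estimates and the round population figure of 18 million (these are
approximations, not measured values):&lt;/p&gt;
&lt;p&gt;$$3.5 + 5 + 8 = 16.5 &#92;text{ bits}, &#92;qquad 2^{16.5} &#92;approx 92{,}700 &#92;text{ groups}, &#92;qquad &#92;frac{18{,}000{,}000}{2^{16.5}} &#92;approx 194 &#92;text{ people per group}$$&lt;/p&gt;
&lt;p&gt;So &amp;quot;about 100,000 groups&amp;quot; and &amp;quot;around 200&amp;quot; are both slightly
rounded up, and the real sets are smaller still once the uneven distribution
of initials and birthdays is taken into account.&lt;/p&gt;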
&lt;h3 id=&quot;digital-signatures&quot;&gt;Digital Signatures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#digital-signatures&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of course, minimizing the data in the passport doesn&#39;t prevent
tracking if you show the same passport every time. The
problem here isn&#39;t the data in the passport, which we&#39;ll
assume is k-anonymous as described in the previous section,
but the signature, which is high entropy&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and therefore unique.
In my &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon&quot;&gt;earlier post&lt;/a&gt; I described a
brute force way to address this in which the user gets a big
pile of tokens each with separate signatures. The Dutch system
instead uses a special digital signature scheme
(&lt;a href=&quot;https://link.springer.com/chapter/10.1007/3-540-36413-7_20&quot;&gt;Camenisch-Lysyanskaya Signatures&lt;/a&gt;).
The details are beyond the scope of this post, but the basic idea is
that the credential issuer performs a single signature which the
subject can then use to prove the validity of their credential to a
relying party without revealing the signature itself. Each proof is
based on unique random data and so can&#39;t be linked to a subsequent
proof.&lt;/p&gt;
&lt;p&gt;I know that language was a bit technical, but it&#39;s enough
for our purposes to think of this as a system in which the signer
makes one signature and the subject gets to make as many equivalent
but distinct and unlinkable signatures as it wants. This is equivalent
to, but a lot more efficient than, the &amp;quot;pile of tokens&amp;quot; approach (though
the Dutch system &lt;em&gt;also&lt;/em&gt; uses a &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#concealing-health-status&quot;&gt;pile of tokens&lt;/a&gt; for
a different reason).&lt;/p&gt;
&lt;h3 id=&quot;remember%2C-you-have-to-show-id&quot;&gt;Remember, you have to show ID &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#remember%2C-you-have-to-show-id&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;These are all understandable design choices, but it&#39;s not clear to me
how they help. The problem, as I noted previously, is that the vaccine
passport isn&#39;t a standalone form of proof but rather is embedded in a
system in which you have to show identification to bind the credential
to you. Even though the &lt;em&gt;credential&lt;/em&gt; only has your initials and
partial birthday, the other form of identification contains your full
name, your picture, and (probably) your birthday, which means that the
relying party has those.&lt;/p&gt;
&lt;p&gt;It&#39;s true that the relying party, which has to scan the vaccine passport,
doesn&#39;t have to scan that form of identification--though they might
anyway--but even if they don&#39;t, your privacy now depends essentially
on them not being able to remember and record &lt;em&gt;any&lt;/em&gt; information
from it. For instance, if they just record your birth year,
your gender, and your first name, that&#39;s probably enough to uniquely identify
most people.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
It&#39;s not clear what prevents this form of attack, though
it&#39;s probably more challenging in high throughput areas where
the verifier would have less of an opportunity to record the
data.&lt;/p&gt;
&lt;p&gt;Note that we&#39;re also assuming an incredibly weak threat model here
when we restrict ourselves to people&#39;s memories. Just because
the verifier isn&#39;t obviously scanning your ID doesn&#39;t mean they
aren&#39;t surreptitiously doing so. It&#39;s not at all difficult to
conceal a small camera in whatever location the verification
happens and show the ID to that camera for recording. Of course,
at this point the vaccine passport isn&#39;t needed for
tracking at all, because the identification is enough, but then
why go to all the trouble to make the vaccine passport quasi-anonymous?&lt;/p&gt;
&lt;h2 id=&quot;concealing-health-status&quot;&gt;Concealing Health Status &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#concealing-health-status&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even if we give up on preventing tracking, typical credentials
still leak a fair amount of information. For instance the
California credential &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca&quot;&gt;contains&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The vaccine type (I think)&lt;/li&gt;
&lt;li&gt;The lot number&lt;/li&gt;
&lt;li&gt;Where it was performed&lt;/li&gt;
&lt;li&gt;The date of injection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This can of course be used for tracking (see above) but it also might
be something that the subject doesn&#39;t want people to know. For
instance, the designers of the Dutch system argue that the credential
shouldn&#39;t distinguish between various forms of &amp;quot;safety&amp;quot; (e.g., a
negative test, recovery from COVID, or vaccination).&lt;/p&gt;
&lt;p&gt;You could just remove all this information--as the NZ system does--and have the semantics of
the credential be &amp;quot;this person is OK&amp;quot;, but this presents the problem that different kinds of credentials should
be acceptable for different periods. For instance, in the Dutch
system they want a negative test to be usable for 40 hours, vaccination for
365 days, and recovery for 180 days. But if you just have a credential
with a fixed validity period from the initiating event, this leaks
both the type of the event and the time it happened (see my
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz&quot;&gt;writeup&lt;/a&gt; on the New Zealand system for
some of the problems with that). The Dutch system deals with this
by providing the subject with multiple credential &amp;quot;strips&amp;quot;,
each of which is only good for 24 hours. Strips are issued
for 28 days at a time--obviously fewer in the case of a test--and
the subject just presents the currently valid strip (with a
randomized signature, as described above) when they need to
authenticate.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This design is sort of a compromise in that it doesn&#39;t require
the subject to be online all the time, but they do need to be
online somewhat regularly in order to get a new set of strips.
It&#39;s also not really &lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-app-coordination/blob/main/architecture/Privacy%20Preserving%20Green%20Card.md#paper-proofs&quot;&gt;compatible&lt;/a&gt; with people who print out their
credentials. In that case, the strips are just valid for 4 weeks
for vaccination/recovery and 40 hours for negative tests (which
leaks whether this is a test or not), which reduces the load some. Even so, printing out
a new strip every 28 days sounds like kind of a pain.&lt;/p&gt;
&lt;p&gt;One obvious problem--as with the NZ design--is flexibility.
What happens if you issue a bunch of 28-day strips on day 1
and then on day 5 you discover that it&#39;s necessary to treat
different vaccination statuses differently? This isn&#39;t a hypothetical
scenario, given that it seems that the various vaccines
may provide different levels of protection against Omicron
and even within a vaccine family there is probably a lot of
difference between people who received two doses
and those who have been boosted, as &lt;a href=&quot;https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-provide-update-omicron-variant&quot;&gt;seen with Pfizer&lt;/a&gt;. In this case, you might
want to start treating boosted people differently, but
that&#39;s a problem if the credentials are good for 28 days.
The Dutch system does have a way of dealing with this,
which is effectively to invalidate &lt;em&gt;all&lt;/em&gt; credentials
(by incrementing the minimum version number field),
but obviously this is going to cause a lot of disruption,
especially for those who have printed out credentials
which will suddenly become invalid.&lt;/p&gt;
&lt;p&gt;Another problem is that there are probably settings even
now in which you would want to distinguish between
different credential types. For instance, in case of
a close contact, the Palo Alto schools &lt;a href=&quot;https://www.pausd.org/return-to-campus/quarantine-info&quot;&gt;require&lt;/a&gt;
that students show two negative COVID tests (at day 1 and 5),
but if you just have a credential that indicates
that the subject had either a vaccination &lt;em&gt;or&lt;/em&gt; a negative test without
telling you which kind, there is no way to
use it to fulfill this requirement. This seems like a pretty
common scenario and one that&#39;s difficult to fulfill with any
kind of system in which the &amp;quot;what is acceptable&amp;quot; logic is
central--and uniform--and verifiers just get a yes/no answer.&lt;/p&gt;
&lt;p&gt;Note that it &lt;em&gt;is&lt;/em&gt; possible to do better here: you can build
credential systems in which the subject proves not only
that they have a valid credential but can prove specific
properties attested to in that credential without revealing
the whole thing. For instance, you might imagine a system
in which the credential contained all the information found
in a typical system but where you only disclosed the minimum
amount of information required for a given scenario
(e.g., that you had a booster over two weeks ago). That
would allow you to have the logic for the system in the
verifier but still limit disclosure of information. It&#39;s
true that these systems typically involve some fancy crypto
(zero-knowledge proofs), but it&#39;s reasonably well understood
and this system is already using a lot of crypto. Indeed,
Camenisch-Lysyanskaya signatures are often used in precisely
this kind of application, so it&#39;s not clear to me why this
design doesn&#39;t do that.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I am glad to see an attempt to do something new here rather than
just another trivial variant of the &amp;quot;signed credential&amp;quot; design,
and it does suggest that there might be some room to improve
the privacy of vaccine credentials. With that said,
I&#39;m kind of skeptical of the particular design
choices. In particular, I don&#39;t think it&#39;s that useful
to try to conceal the subject&#39;s identity given that the subject
has to identify themselves in order to use the passport. It&#39;s
possible that it&#39;s useful to conceal the details of what the
credential is attesting to (vaccination, test, etc.) but the strip mechanism seems kind
of clunky and inflexible, so I&#39;m not sure that&#39;s the right design either.
I know I&#39;m repeating myself, but it would be a lot better
if instead of everyone inventing their own thing
we had some kind of multistakeholder effort which would
get to clear requirements and then try to converge on
a single design which did a good job of meeting those.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This might not be immediately obvious, but
if it weren&#39;t high entropy it would be trivial to forge
by just generating candidate signatures and seeing if they verify. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See Latanya Sweeney&#39;s &lt;a href=&quot;http://ggs685.pbworks.com/w/file/fetch/94376315/Latanya.pdf&quot;&gt;Simple Demographics Often Identify People Uniquely&lt;/a&gt; for more on this. For instance, she reports that
&amp;quot;It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}.&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is also some fancy &lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-app-coordination/blob/main/architecture/Privacy%20Preserving%20Green%20Card.md#strip-randomization&quot;&gt;randomization&lt;/a&gt; to
prevent a test credential from revealing the time of
the test, though this kind of seems like overengineering
to me. The 40 hour number seems pretty arbitrary, so
you could just have the last strip expire at the
first midnight that was at least 40 hours after the test. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nl/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Vaccine Credentials</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-anon/"/>
		<updated>2021-12-07T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-anon/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;script src=&quot;https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js&quot;&gt;&lt;/script&gt;
&lt;script&gt;
            mermaid.initialize({ startOnLoad: true,
                sequence: {
                    mirrorActors: false
                }});
&lt;/script&gt;
&lt;p&gt;As I noted &lt;a href=&quot;https://educatedguesswork.org/tags/vaccine%20passports/&quot;&gt;previously&lt;/a&gt;, we&#39;re
seeing each jurisdiction design their own vaccine passport system
(&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/&quot;&gt;New York&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/&quot;&gt;California&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/&quot;&gt;EU&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/new-zealand/&quot;&gt;New Zealand&lt;/a&gt;).
While these systems differ in detail, they&#39;re conceptually
pretty similar: a digital signature over a record consisting
of the user&#39;s identity and some information about the
subject&#39;s vaccine status.&lt;/p&gt;
&lt;p&gt;This has the obvious privacy problem that the verifier can record the credential (or the information in it)
and use it for tracking where someone has proved their vaccination status (and hence visited).
It&#39;s not really possible to do better with a single static credential
printed on a piece of paper. Obviously, the paper isn&#39;t going
to change and so whatever the contents are they can be used
for tracking. Moreover, the credential has to be verified by
some kind of software—unless you can do elliptic curve math
in your head—and that software can just record the
information or transmit it back to some central location.
Typically the official apps are supposed to just discard
the credential after verifying it, but obviously
you&#39;re just trusting them to do that.&lt;/p&gt;
&lt;p&gt;If we relax the assumption that the credential
is a single piece of paper then the design space seems like it opens up
a bit, but—as we see below—probably not enough
to really provide privacy.&lt;/p&gt;
&lt;h3 id=&quot;digression%3A-anonymous-credentials&quot;&gt;Digression: Anonymous Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#digression%3A-anonymous-credentials&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Before looking at the vaccine passport problem, it&#39;s helpful to look
at a somewhat simpler problem: privacy preserving authentication.&lt;/p&gt;
&lt;p&gt;Suppose that we want to build a system which gives people access
to some resource but that doesn&#39;t identify them. As an example,
I might want to let people pay road tolls but not be able to
track them when they do so. Conventional systems just give
each user an account number that they use to authenticate to the
toll plaza, but then whoever operates the toll plazas
can look at what credential was used and thus build a profile
of each user.&lt;/p&gt;
&lt;p&gt;There&#39;s a straightforward solution to this problem, which is to
give each user a large pile of single-use credentials, each of
which is good for one transaction. That prevents the toll
plaza from connecting visits &lt;em&gt;unless&lt;/em&gt; it colludes with whoever
issued the token. However, in the real world, the same state
agency probably both issues the tokens and runs the toll plaza, so
they&#39;re automatically colluding. Fortunately, there is a cryptographic
solution, called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Blind_signature&amp;amp;oldid=1048874330&quot;&gt;blind signatures&lt;/a&gt;.
A blind signature is a construction which allows someone to digitally
sign a value without seeing it, like so:&lt;/p&gt;
&lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
  note over Alice: Generate random r
  Alice -&gt;&gt; Issuer: Blind(r)
  Issuer -&gt;&gt; Alice: Sign(Blind(r))
&lt;/div&gt;
&lt;p&gt;Alice can then compute $Unblind(Sign(Blind(r))) &#92;rightarrow Sign(r)$
to recover a valid signature over $r$, even though the issuer never
saw $r$.&lt;/p&gt;
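&lt;p&gt;To make this concrete, here is a minimal sketch of blinding and unblinding using textbook RSA in Python. This is just one way to instantiate a blind signature, not something any of the systems discussed here is known to use. The key is a toy built from two Mersenne primes, there is no padding, and all of the function names (&lt;code&gt;blind&lt;/code&gt;, &lt;code&gt;sign_blinded&lt;/code&gt;, &lt;code&gt;unblind&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;) are made up for illustration. The token plays the role of r above; the blinding factor b is a separate random value.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import secrets
from math import gcd

# Toy issuer key (hypothetical): two Mersenne primes, far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def token_to_int(token):
    """Hash the token into the RSA group (no padding; illustration only)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(m):
    """Alice: hide m by multiplying with b^E for a random invertible b."""
    b = secrets.randbelow(N - 2) + 2
    while gcd(b, N) != 1:
        b = secrets.randbelow(N - 2) + 2
    return (m * pow(b, E, N)) % N, b


def sign_blinded(blinded):
    """Issuer: sign whatever it is handed; it never sees m itself."""
    return pow(blinded, D, N)


def unblind(blind_sig, b):
    """Alice: divide out b to recover an ordinary signature on m."""
    return (blind_sig * pow(b, -1, N)) % N


def verify(m, sig):
    """Anyone: check the signature with the issuer public key (N, E)."""
    return pow(sig, E, N) == m


def issue_demo():
    """Run one issuance and return (m, blinded value, final signature)."""
    m = token_to_int(secrets.token_bytes(16))
    blinded, b = blind(m)
    sig = unblind(sign_blinded(blinded), b)
    return m, blinded, sig
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The signature that comes out of &lt;code&gt;issue_demo&lt;/code&gt; is the same value the issuer would have produced by signing m directly, which is why the verifier can check it normally even though the issuer only ever saw the blinded value.&lt;/p&gt;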
&lt;p&gt;It&#39;s pretty easy to see how to turn this into an anonymous credential
system: Alice generates a pile of random tokens, gets the issuer
to sign them, and then redeems them one at a time. The toll plaza
just verifies that each one is &lt;em&gt;fresh&lt;/em&gt; (i.e., it hasn&#39;t been
used before) and if so, accepts it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This is what&#39;s called a &amp;quot;bearer token&amp;quot; which means that it&#39;s secret
and just the possession of the token is sufficient to prove your
identity, but you can also have a public key in the token so you
can authenticate with a digital signature. There&#39;s a lot of much fancier stuff you can do here,
including rerandomizable credentials
that don&#39;t require you to get a pile of tokens
and credentials which let you prove specific
properties (e.g., that you&#39;re over 21) but
we don&#39;t need to worry about that for now.&lt;/p&gt;
&lt;h2 id=&quot;anonymous-credentials-for-vaccine-passports&quot;&gt;Anonymous Credentials for Vaccine Passports &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#anonymous-credentials-for-vaccine-passports&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Naively, it seems pretty obvious how to use this kind of anonymous
credential for vaccine passports:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Replace the signed vaccine passport with an anonymous credential
that just says &amp;quot;the holder of this credential is vaccinated&amp;quot;,
potentially with an expiration date.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Everyone&#39;s app is able to get a pile of these credentials.&lt;/li&gt;
&lt;li&gt;When you need to prove your vaccination status, you show the next
credential.&lt;/li&gt;
&lt;li&gt;When your app runs out, it just gets some more.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unfortunately, this has a number of problems, the most important
of which is that the credential isn&#39;t &lt;em&gt;bound&lt;/em&gt; to the user, which
opens up a number of attacks. Perhaps the simplest is that a
relying party can &lt;em&gt;replay&lt;/em&gt; a credential that is provided to
it to another relying party. For instance, suppose that I am
the host at a restaurant charged with checking people&#39;s vaccine
status: I can collect all the credentials people show me and
then use them to &lt;em&gt;prove&lt;/em&gt; that I—or others—are vaccinated.&lt;/p&gt;
&lt;p&gt;This simple version of the attack can be addressed by replacing
the bearer token with one which requires the person to authenticate,
e.g., via a digital signature over a verifier-provided challenge. However, this leaves open what&#39;s
called a &amp;quot;relay attack&amp;quot; in which the cheating verifier simultaneously
authenticates themselves to another verifier, like so:&lt;/p&gt;
&lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
  Alice -&gt;&gt; Verifier 1: Hello
  Verifier 1 -&gt;&gt; Verifier 2: Hello
  Verifier 2 -&gt;&gt; Verifier 1: Challenge
  Verifier 1 -&gt;&gt; Alice: Challenge
  Alice -&gt;&gt; Verifier 1: Sign(Challenge)
  Verifier 1 -&gt;&gt; Verifier 2: Sign(Challenge)
  note over Verifier 2: Accepted
&lt;/div&gt;
&lt;p&gt;This isn&#39;t that great an attack because the cheating verifier
has to be online and authenticating to another verifier at the same time as the
vaccinated person (though not in the same place because the
challenge and response can just be transmitted from place to place).
However, there is a related attack that is worse in which
a malicious vaccinated person with a valid credential helps
someone else pretend to be vaccinated. This is pretty much the same
message flow with different labels:&lt;/p&gt;
&lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
  participant Vaccinated
  Unvaccinated -&gt;&gt; Verifier: Hello
  Verifier -&gt;&gt; Unvaccinated: Challenge
  Unvaccinated -&gt;&gt; Vaccinated: Challenge
  Vaccinated -&gt;&gt; Unvaccinated: Sign(Challenge)
  Unvaccinated -&gt;&gt; Verifier: Sign(Challenge)
  note over Verifier: Accepted  
&lt;/div&gt;
&lt;p&gt;The practical version of this attack is that someone (or someones)
get vaccinated and then get a set of valid credentials. They
stand up a server on the Internet which accepts challenges
and responds with signed responses, thus enabling arbitrary
people to pretend to be vaccinated. And because the system
is anonymous, tracking down the operator of the server
and revoking their credentials is not easy.&lt;/p&gt;
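&lt;p&gt;The relay flow above can be sketched in a few lines of Python. This is a toy model, not any deployed protocol: it assumes a textbook RSA signature over a hashed challenge with a tiny hypothetical key, and the class names (&lt;code&gt;Vaccinated&lt;/code&gt;, &lt;code&gt;Unvaccinated&lt;/code&gt;, &lt;code&gt;Verifier&lt;/code&gt;) are just labels taken from the diagram. The point is that a fresh challenge stops replay of an old signature but does nothing about a live relay.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import secrets

# Toy credential key held by the vaccinated person (hypothetical, insecure size).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest(challenge):
    """Hash a challenge into the RSA group (no padding; illustration only)."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


class Vaccinated:
    """Holds the private key and signs any challenge it is sent."""

    def sign(self, challenge):
        return pow(digest(challenge), D, N)


class Verifier:
    """Issues a fresh random challenge and checks the signed response."""

    def new_challenge(self):
        self.challenge = secrets.token_bytes(16)
        return self.challenge

    def accept(self, response):
        return pow(response, E, N) == digest(self.challenge)


class Unvaccinated:
    """Has no key of its own; forwards challenges to a helper."""

    def __init__(self, helper):
        self.helper = helper

    def respond(self, challenge):
        return self.helper.sign(challenge)  # the relay step


def relay_demo():
    """Return True if the verifier accepts the relayed response."""
    verifier = Verifier()
    cheater = Unvaccinated(Vaccinated())
    return verifier.accept(cheater.respond(verifier.new_challenge()))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing in what the verifier checks distinguishes the person standing in front of it from whoever actually holds the key, which is the whole problem.&lt;/p&gt;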
&lt;h2 id=&quot;less-anonymous-credentials&quot;&gt;Less Anonymous Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#less-anonymous-credentials&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This kind of relay attack is well known in the literature; it&#39;s really
just the interactive version of giving someone one of your anonymous
bearer credentials. The underlying problem is that the verifier
isn&#39;t actually able to identify the person claiming to be vaccinated:
all they have is a message that says &amp;quot;the person transmitting this to
you is vaccinated&amp;quot; but that could be the person holding the phone or
someone across the world.&lt;/p&gt;
&lt;p&gt;The fix, of course, is to have the credential contain some information
that lets you identify the person it&#39;s describing. There are a number
of alternatives here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A biometric such as a picture&lt;/li&gt;
&lt;li&gt;The person&#39;s name, which can then be used in concert with their photo ID
to confirm their identity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The obvious problem here is that this information has to be
consistent enough to identify the person and therefore it can
be used for tracking. In particular, if the credential contains
the person&#39;s name and birthday, then you can just record that
and use it for tracking.&lt;/p&gt;
&lt;p&gt;There are some small things one could imagine doing to improve
the situation. For instance, instead of having one photo of the
person, you could use a different picture every time so that it
wasn&#39;t bitwise identical. This can be done trivially by compressing
with slightly different parameters or you could do something more
complicated like automatically generating lookalike images with
some sort of AI system. The problem, of course, is you can run
the process in reverse to generate a hash of the image that is
resistant to these kinds of manipulation (remember
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Perceptual_hashing&amp;amp;oldid=1058664044&quot;&gt;perceptual hashing&lt;/a&gt;
from my &lt;a href=&quot;https://educatedguesswork.org/tags/apple%20csam%20scanning/&quot;&gt;posts&lt;/a&gt;
on Apple&#39;s child sexual abuse material scanning system). Moreover,
this kind of hashing is a lot easier because you don&#39;t need to conceal the
original image so you can ship quite a rich hash that is very
accurate.&lt;/p&gt;
&lt;p&gt;One approach I&#39;ve seen proposed for dealing with names and
birthdays is to just encode some subset of the letters,
e.g., &amp;quot;E... Re.....a&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; and maybe just the month and day
of birth (the Dutch &lt;a href=&quot;https://github.com/minvws/nl-covid19-coronacheck-app-coordination/blob/main/architecture/Privacy%20Preserving%20Green%20Card.md&quot;&gt;CoronaCheck&lt;/a&gt; system
encodes initials and birth day/month; I hope to write something
about that soon). This doesn&#39;t provide great privacy for two reasons.
The first is that it&#39;s only k-anonymous&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; and k can&#39;t be that big; this has to be the case because
it has to be sufficiently identifying to prevent me from using
your ID to prove my vaccination status. This is already a problem
but it&#39;s made worse by the fact that people&#39;s behavior isn&#39;t random.
For instance, if we have four authentications for the initials ER
within an hour with two at outdoor stores in Mountain View and
two in bars in Los Angeles, it&#39;s likely that the first two are one
person and the second two are another. This kind of constraint
solving problem is something computers are very good at; you might
not get a complete record of someone&#39;s behavior, but you&#39;ll learn
a lot.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
The more serious issue is that the initials/birthday need
to be used with a photo ID, which of course has the person&#39;s
full name. This allows the verifier to record that—even assuming
that they don&#39;t just scan it, which is common in many places—which really
reduces the privacy value of having the vaccine credential
contain limited information.&lt;/p&gt;
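&lt;p&gt;As a rough back-of-the-envelope illustration of the k-anonymity point, here is a small Python sketch. It assumes the credential reveals a first initial, a last initial, and the birth day and month, and it assumes (unrealistically) that people are spread uniformly over those buckets. The function names and the example inputs are made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def bucket(first_name, last_name, birth_month, birth_day):
    """Reduce a person to what the credential reveals: initials plus birth day/month."""
    return (first_name[0].upper(), last_name[0].upper(), birth_month, birth_day)


def expected_k(population, letters=26, birthdays=366):
    """Average anonymity-set size if people were spread uniformly over buckets.

    The uniform spread is an assumption; real initials and birthdays are skewed,
    so people with uncommon initials hide in smaller sets than this suggests.
    """
    return population / (letters * letters * birthdays)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a population of 17 million (a round number chosen purely for illustration), &lt;code&gt;expected_k&lt;/code&gt; comes out to roughly 69 people per bucket, and that is before you start using location and timing to tell those people apart.&lt;/p&gt;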
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I don&#39;t mean to suggest here that anonymous credentials can&#39;t work
at all. There are plenty of settings where what you&#39;re authenticating
is just the messages you&#39;re sending. For instance &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-ietf-privacypass-architecture/&quot;&gt;Privacy Pass&lt;/a&gt;
and &lt;a href=&quot;https://www.ietf.org/archive/id/draft-private-access-tokens-00.html&quot;&gt;Private Access Tokens&lt;/a&gt;
are systems designed to prove that someone is an authorized user (for
some meaning of authorized) without revealing anything else about
them. These systems can work because the only thing you are
trying to authenticate is the person&#39;s messages, not the person
themself.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
The reason that these credentials don&#39;t work well in the vaccine
setting is that you are trying to prove something different,
namely that they apply to a particular human. This requires
identifying that person, which makes the whole thing non-anonymous.
This is a general limitation of anonymity systems: they do well
in settings where the actual interaction you are trying to
perform is easily anonymizable (e.g., over the Internet)
and poorly when it is not (e.g., doing something in the physical world).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
This is of course bad news for privacy because it&#39;s only getting
easier to do surveillance in the physical world.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that deployed systems usually have license plate cameras
which can be used in cases where someone doesn&#39;t pay the
toll, but of course can also be used for &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates&quot;&gt;surveillance&lt;/a&gt;
of every car which goes through, which kind of defeats the whole purpose of this. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Getting the expiration
date encoded is a little tricky because in the simple
system I showed above, the issuer knows nothing about what
it&#39;s signing. There are a few alternatives, with perhaps
the simplest one being to use a separate signing key
for each expiration date. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Kind of like what United does with their
upgrade list. I am &amp;quot;RESE&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Which is to say that
the credential applies to a k-sized set of people and thus each
person is hiding in a set of that size. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You might be able to improve the situation some by revealing
a different set of letters in the name each time. This would
require some more analysis. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And even with these systems you have to defend against
attacks where the person gives others copies of their token. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See also
my previous &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates&quot;&gt;post&lt;/a&gt; on license plates. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-anon/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Highline Trail Adventure Run Report</title>
		<link href="https://educatedguesswork.org/posts/highline/"/>
		<updated>2021-11-30T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/highline/</id>
		<content type="html">&lt;p&gt;TL;DR. Great views but slow going. Had to bail out at mile 38.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://educatedguesswork.org/img/highline-panorama.jpeg&quot;&gt;&lt;img src=&quot;https://educatedguesswork.org/img/highline-panorama.jpeg&quot; alt=&quot;Highline Panorama&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On Monday, November 22, for the last run of the season, my training partner &lt;a href=&quot;https://chris-wood.github.io/&quot;&gt;Chris Wood&lt;/a&gt;
and I decided to do the Arizona &lt;a href=&quot;https://www.trailrunproject.com/trail/7014445/highline-trail-31-nrt&quot;&gt;Highline Trail #31&lt;/a&gt;. We were already planning to
do &lt;a href=&quot;https://zanegrey50.com/&quot;&gt;Zane Grey 100K&lt;/a&gt; which covers
this trail and then some more, so this seemed like a good
opportunity to check it out in non-race conditions.&lt;/p&gt;
&lt;p&gt;In retrospect, this turns out not to have been as good an idea as it
looked in advance. The basic statistics of the Highline Trail
(50.6 miles, +7804/-6490) are actually quite manageable in a day
(for reference I did &lt;a href=&quot;https://www.khraces.com/series/sean-o-brien-50-50&quot;&gt;Sean O&#39;Brien 100K&lt;/a&gt; (62mi, +13130/-13130) in 12:53 (&lt;a href=&quot;https://educatedguesswork.org/posts/sob100k&quot;&gt;race report&lt;/a&gt;)). What really makes the difference here
is that the trail itself is much more difficult, mostly very
rocky and technical. I knew some of
this in advance because I&#39;d done Zane Grey 50 mile back in
2019 (when it was an out-and-back from Rim Top Trailhead) and
fell several times. What I didn&#39;t know was how difficult it
would be to find the trail--and how easy it would be to get lost--without having the course pre-cleared and marked for me.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/highline-course.png&quot; alt=&quot;Highline Overview&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;logistics&quot;&gt;Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Highline Trail runs from the Pine Trailhead (unsurprisingly,
in Pine) in the West to 260 Trailhead in the East. There&#39;s no
official way to shuttle between these two locations and I wasn&#39;t
even sure we would have reliable mobile service to Lyft/Uber
between them, so we opted to just rent two cars. We stayed at
the &lt;a href=&quot;https://www.diamondresortsandhotels.com/Resorts/Kohls-Ranch-Lodge&quot;&gt;Kohl&#39;s Ranch Lodge&lt;/a&gt;,
which is about 10 miles away from 260 Trailhead, dropped
one car off at 260 TH the night before and then drove the other
one to Pine TH the morning of our run. The plan was to run to
the 260 TH car, then drive back to Pine to pick up the other one.&lt;/p&gt;
&lt;p&gt;Sunrise in this area is around 7:00 AM this time of year with sunset
around 5:20, so there&#39;s no realistic way to avoid running in the
dark. We planned to start around 5:30, figuring it would actually
start to get light around 6:15-6:30 and then would have about 12 hours
more of daylight.  We prepped all our stuff the night before and got
up at 3:40ish, figuring we&#39;d leave about 4:20 and get to the trailhead
a little before 5, use the bathroom, etc. and be on the trail before
5:30. This sort of worked: we were out a bit after 4 but then I
realized I&#39;d left all my bottles back in the refrigerator so had
to head back to the hotel.&lt;/p&gt;
&lt;p&gt;At the end of the day, we got to the trailhead around 5, but
it was a lot colder than we had expected (~32 F), so we stalled
for a while and ended up spending
about 10 minutes in the car with the heater on before we
were willing to actually start. Then we immediately took the wrong
trail and had to backtrack, so ended up starting at 5:58.&lt;/p&gt;
&lt;h2 id=&quot;the-run&quot;&gt;The Run &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#the-run&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The trail is overall uphill, but there are really only three significant
climbs: The first 2ish miles are a long climb of about 1000 ft (to 6300 ft)
followed by a long descent down to mile 8, a 5 mile climb (to 6400 ft) and then
rolling uphill out to 30 miles and then one more 500ft descent/climb pair.
There&#39;s also a long climb at the end, but as you&#39;ll see, we didn&#39;t make it that far.&lt;/p&gt;
&lt;h3 id=&quot;start-to-mile-22.5%3A-mostly-smooth-%5B%2B2786%2F-2782%5D&quot;&gt;Start to Mile 22.5: Mostly smooth [+2786/-2782] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#start-to-mile-22.5%3A-mostly-smooth-%5B%2B2786%2F-2782%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We started out on headlamp and everything was fine for the first mile
or so, at which point we realized we had gone offtrail. I had downloaded
the TrailRunProject GPX file of the trail and we got relatively early notification
that we were off. After some backtracking, we found the fork we had
missed and proceeded upward only to make the same mistake about 1/3 of a mile
later. In both cases we had to fight through some fairly thick brush
to stay on trail. This whole first couple miles was kind of overgrown,
so it was a bit hard to figure out.&lt;/p&gt;
&lt;p&gt;In retrospect looking at the map, it appears that what happened
is that the trail was rerouted a while back and the GPX we had
was pre-reroute. You can see this on the map in Runalyze below:&lt;/p&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/highline-off-course1.png&quot;&gt;
&lt;img alt=&quot;Map of off course 1&quot; src=&quot;https://educatedguesswork.org/img/highline-off-course1.png&quot; width=&quot;50%&quot; /&gt;
&lt;/a&gt;
&lt;p&gt;What seems to be going on here is that the GPX track takes the
original straight through route but the newer route switchbacks
more. It&#39;s in better shape which is why we kept taking it,
and we should have just stayed on it, but instead we took
the (mostly) unmaintained original route. This is a mistake
we would make a lot later. No doubt it&#39;s much easier with ribbons
at every turn.&lt;/p&gt;
&lt;p&gt;Once we got through the first couple miles, though, things opened up and
the trail was pretty clear. We also started to see a lot more trail
markers (this section is both the Arizona Trail and the Highline Trail)
and so were pretty confident we were on the right track.
This lasted until about mile 22.5, when we ran right into a fence.&lt;/p&gt;
&lt;p&gt;Time: 6:39&lt;/p&gt;
&lt;h3 id=&quot;22.5-27%3A-things-start-to-go-wrong-%5B%2B764%2F-715%5D&quot;&gt;22.5-27: Things Start to Go Wrong [+764/-715] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#22.5-27%3A-things-start-to-go-wrong-%5B%2B764%2F-715%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As I said, things were going fine until about mile 22.5, when
we ran right into a barbed wire fence with a sign that said
something like &amp;quot;Caution: Burn Area&amp;quot; (sorry, no picture). This
wasn&#39;t entirely a surprise because I knew there had been
a fire, but I also wasn&#39;t expecting a fence. There wasn&#39;t
any obvious way through (though that&#39;s where the GPX track
and the apparent trail wanted to go), so we spent a while backtracking and
looking for alternate trails that would get
us around but didn&#39;t find anything. Ultimately, we just
concluded that this was actually a gate, unhooked the
wire hanging the piece of fence with the sign, and went
on through.&lt;/p&gt;
&lt;p&gt;Unfortunately, this was just the first of a series of sections
where we got lost. Much of the trail was badly overgrown
with knee-length grass, so we were reduced to following what looked
like the most trodden path through the grass and watching the
GPS (with occasional assists from the map on our phones)
to see if we were off track. Whenever that happened, we&#39;d
backtrack back to the point where we left the track and try
to find out what had gone wrong. Usually we could find some
faint track and we&#39;d follow that instead. Obviously, this
was super time consuming both in terms of how much it slowed
us down to be constantly watching the trail and the actual
backtracking. From here on in, there were also a lot of sharp
plants and so we both started to accumulate various scratches
(me more than Chris because he was wearing calf sleeves).&lt;/p&gt;
&lt;p&gt;We ended up generally following a dry creek bed, but kept
getting caught in one side trail or another. The confusing part is
that these were obviously real trails and they were shown on the
map. Eventually we realized that these were probably more reroutes and
that if we had just followed them we would have been fine, but we only
figured that out after we were mostly past this section.
Somewhere in here we almost totally lost the trail. We were in
the dry creek bed and could see that we were off course, but
it seemed like it was actually at the top of a small wall? cliff?
about 20 feet high. We ended up scrambling up it and were able
to find another section of what looked like trail, so we picked
that up.&lt;/p&gt;
&lt;p&gt;At this point I was starting to get pretty worried. We were clearly
making very slow time and it was going to get even harder to find
the trail in the dark (this proved to be true later), and given
how cold it had been in the morning I sure didn&#39;t want to be out
all night. We had packed a few extra layers (arm warmers, gloves,
rain jackets, and buffs for each of us, plus one extra long sleeve
shirt, one pair of tights, and one pair of rain pants, plus a couple
of emergency bivies) but none of that was going to make it fun
to be running in the dark in freezing cold weather. Looking at the
map, we found a small housing development around mile 32, so
we figured if we could make it there we could somehow get a ride
out, so we pushed forward, figuring in the worst case we could
backtrack to the trailhead around mile 17.&lt;/p&gt;
&lt;p&gt;Time: 9:00&lt;/p&gt;
&lt;h3 id=&quot;27-32%3A-relatively-smooth-sailing-%5B%2B709%2F-968%5D&quot;&gt;27-32: Relatively Smooth Sailing [+709/-968] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#27-32%3A-relatively-smooth-sailing-%5B%2B709%2F-968%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This next section was actually quite smooth. The trail was
generally pretty easy to find (there were even markings!)
and even runnable in some places.
Relatively early on we crossed a dirt road which we probably
could have bailed out on, but it would have required us to
run for quite a while on that road to get to somewhere
that we could have gotten a ride, so we decided to push on
to the original point we had identified.&lt;/p&gt;
&lt;p&gt;Of course, by the time we got there, it was also clear that we could
go further. The next obvious bailout point was the &lt;a href=&quot;https://www.stateparks.com/tonto_state_fish_hatchery_in_arizona.html&quot;&gt;Tonto Fish
Hatchery&lt;/a&gt;,
which was actually the turnaround for when I did ZG 50 back in
2019. The hatchery is just about 4 miles up the road from
Kohl&#39;s Ranch, so if we got there we could make it there under our own power
rather than having to get a ride from the middle of nowhere.
After sitting down and having some caffeine we decided
to push on.&lt;/p&gt;
&lt;p&gt;Time: 11:15&lt;/p&gt;
&lt;h3 id=&quot;32-38%3A-to-the-hatchery-%5B%2B804%2F-732%5D&quot;&gt;32-38: To the Hatchery [+804/-732] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#32-38%3A-to-the-hatchery-%5B%2B804%2F-732%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next few miles to the hatchery were actually pretty
smooth. First we had to climb about 400 feet up and then it was generally
downhill, all of which was quite comfortable once the caffeine
kicked in. The trail actually intersects the road twice at the
hatchery and we opted for the second intersection because
it&#39;s more of a straight shot down to Kohl&#39;s Ranch.&lt;/p&gt;
&lt;p&gt;In retrospect this may have been a mistake because by this point we
were on headlamp and the trail suddenly got quite difficult to
find. Instead of being a bunch of overgrown grass it was just
bare rock with a bunch of cairns marking the way, so once
again we were reduced to watching the GPX track and then kind
of trying to find the trail from that and the cairns, not easy
to do in the dark. Anyway, we eventually found the
road (real road, not dirt road) and headed in.&lt;/p&gt;
&lt;p&gt;Time: 13:15&lt;/p&gt;
&lt;h3 id=&quot;38-42.5%3A-on-the-road-%5B%2B39%2F-965%5D&quot;&gt;38-42.5: On The Road [+39/-965] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#38-42.5%3A-on-the-road-%5B%2B39%2F-965%5D&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This last section was on asphalt and mostly downhill, so we
took it pretty fast (~8:30 moving pace, which is tiring
after 38 miles). There was obviously no real concern about
finishing at this point, so we just slogged it out and tried
to keep moving (with occasional breaks to obsessively check
that we were on the right road) until we got to our cabin.&lt;/p&gt;
&lt;p&gt;Time: 13:56&lt;/p&gt;
&lt;h2 id=&quot;now-what%3F&quot;&gt;Now what? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#now-what%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, at this point we were stuck at Kohl&#39;s with one car
at 260 TH and one at Pine TH. It&#39;s not exactly easy to
get a Lyft or an Uber in the middle of nowhere, even from the hotel
(validating our previous decisions), but we
managed to convince one of the hotel staff to give us a
ride to 260 TH and then picked up the car and headed to
Pine, plus dinner, all of which got us back to the hotel
at ~10:00 PM.&lt;/p&gt;
&lt;h2 id=&quot;nutrition&quot;&gt;Nutrition &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#nutrition&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Overall nutrition went pretty well, though we didn&#39;t eat
anywhere near as much as I expected or brought.&lt;/p&gt;
&lt;p&gt;Our hotel room at Kohl&#39;s had a full kitchen so we were able to
make oatmeal in the morning; in the past I&#39;ve just had Tailwind
or an energy bar because I was worried about GI distress,
but I tried steel cut oats at SOB 100K and that went well,
so I went with oatmeal (regular this time) here and that
was also OK. I also drank about 2/3 of a bottle of Tailwind
on the way to the trailhead.&lt;/p&gt;
&lt;p&gt;Overall consumption:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Item&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Brought&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Consumed&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Calories&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Tailwind&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Powerbars&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Gels&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;M&amp;amp;Ms&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1 bag&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;0&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2400&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We were worried about water ahead of time but it actually
turned out to be fine and we were generally able to filter
out of creeks, drinking it directly or filtering into
bottles (Salomon XA Filter Cap FTW).&lt;/p&gt;
&lt;p&gt;That calorie total probably isn&#39;t really enough, but remember
we were going quite slowly, so we didn&#39;t need as much
as if we had been running the whole way.
I never bonked and didn&#39;t have any real GI distress at any point,
so this seems like a success.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/highline/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Obviously, this was much harder than I had expected. I did
ZG 50 in 12:01 a few years ago and after SOB 100K was hoping
to finish ZG 100K 2022 in the high 13:00s, which would have put
us around 11-12 for the shorter Highline trail (the distances
are a bit fuzzy). In the event, we
ran almost 14 for an abbreviated route. Most of that
can be chalked up to the difficulty of the trail, both
in terms of footing and in terms of routefinding. I expected
the footing to be bad, but I didn&#39;t expect it to be so overgrown,
which definitely slowed us down. I also didn&#39;t expect to have
so much trouble finding the route.&lt;/p&gt;
&lt;p&gt;I remember the other part of the Highline Trail (remember, we bailed
out right about where I turned around in 2019) as being much easier to
find, but that may have just been that it was well marked
and cleaned up before the race. A lot of this was our fault
for trying to find the GPX track rather than looking at the
map to see where the trail really was. That would have saved
us some of our worst points of confusion, but we still would
have had to constantly double check every time we hit a junction,
so I&#39;m not sure how much time it would have saved at the end
of the day. It&#39;s definitely a lot easier when someone has
put ribbons out.&lt;/p&gt;
&lt;p&gt;As noted above, we brought too much food. I forgot to keep
records for &lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop&quot;&gt;Tenaya&lt;/a&gt; though I know
we brought way too much then too. Now I have a real benchmark
and I could probably have brought about 20% less and still
been fine. No reason to over-carry.&lt;/p&gt;
&lt;p&gt;The rest of our equipment choices seemed pretty reasonable.
We decided not to bring poles and they would have mostly been in the way. I was right at the limit
of my Salomon Advanced Skin Set 5 (in fact, it&#39;s now tearing
out at a seam) but it did OK. I made a last minute decision
to wear my Salomon Ultra 3s instead of my Sense 4 Pro because
they&#39;re a bit more stable. I think that was a mistake because
I like a slightly more precise shoe for this terrain--on the
other hand, Chris did it in Ultra Glides so probably not a big
deal; also
my socks started to slide down and the collar of the Ultra 3s
tends to rub a bit. I don&#39;t much like calf sleeves, but I
sort of regret not wearing them in this case. That way my
legs wouldn&#39;t have looked like this:&lt;/p&gt;
&lt;a href=&quot;https://educatedguesswork.org/img/highline-legs.jpg&quot;&gt;
&lt;img alt=&quot;My legs, all scratched up&quot; width=&quot;50%&quot; src=&quot;https://educatedguesswork.org/img/highline-legs.jpg&quot; /&gt;
&lt;/a&gt;
&lt;p&gt;Fitness wise, everything was fine. I was never too wiped
out and could easily have done the last 12 miles if I&#39;d
had to. The hardest part was actually the last 4 miles:
pushing the pace on the road was just pretty unpleasant.
Good mental practice, though.&lt;/p&gt;
&lt;p&gt;I&#39;m confident that we made the right decision to bail out at
Tonto. I think we probably could have made it to 260 TH OK,
but it would have been a long 12 miles (probably 3-4 hrs)
in the dark and if anything had gone wrong, we could have
been in real trouble. My general feeling is that the adventure
part of adventure runs is best contained to the risk of being
really miserable rather than the risk of life and limb.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy for license plates</title>
		<link href="https://educatedguesswork.org/posts/license-plates/"/>
		<updated>2021-11-28T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/license-plates/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
};
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;Here at EG we spend a lot of time on privacy and obviously one
of the big concerns is avoiding people tracking you, whether
&lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/&quot;&gt;in&lt;/a&gt;
&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/&quot;&gt;person&lt;/a&gt;
or &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/&quot;&gt;on the Internet&lt;/a&gt;.
From that perspective, I&#39;ve always found license plates
kind of anomalous. If it
was illegal to leave your house without wearing a label
with your social security number printed on it, we&#39;d all
recognize this as privacy-invasive--heck, I get upset when
I have to show my ID at the airport--but for some reason when
the identifier is bolted to your car people think that&#39;s
fine.&lt;/p&gt;
&lt;p&gt;I suspect a lot of what we&#39;re seeing here is just status quo
bias: license plates have been around for a long time and people
are used to them. But it&#39;s also true that there has been
a not-that-well-publicized change in how easy license plate-based
tracking is due to ubiquitous deployment of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Automatic_number-plate_recognition&amp;amp;oldid=1054872877&quot;&gt;Automatic Number-Plate Recognition (ANPR)&lt;/a&gt; technologies--essentially
cameras which record license plate numbers. For instance,
in 2016, London had &lt;a href=&quot;https://www.london.gov.uk/questions/2016/3107&quot;&gt;1666&lt;/a&gt;
ANPR cameras deployed and I expect that there are more now.
The result is that this enables an enormous amount of driver
surveillance.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
A natural question here is whether we can build
something that fulfills the legitimate purposes of license
plates while having better privacy properties.&lt;/p&gt;
&lt;p&gt;Most of the following is by way of a thought experiment: I don&#39;t
actually expect license plates to be replaced, but it&#39;s useful
practice in designing this kind of system to think about
the privacy properties of the system and how to improve them
under constraints this tight.&lt;/p&gt;
&lt;h2 id=&quot;requirements%2Fconstraints&quot;&gt;Requirements/Constraints &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#requirements%2Fconstraints&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It seems that we have two primary functional requirements
for license plates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Identifying&lt;/em&gt; vehicles which are of interest for some reason.
For instance, we might observe a vehicle committing some kind
of violation and use the license plate to track down the owner.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Tracking&lt;/em&gt; vehicles which have been previously identified.
For instance, a vehicle might have been stolen and we want to
find it, or a suspect might be driving a given vehicle and
we want to be notified if it appears.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the privacy side, we want it to be difficult to use them
for mass surveillance. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It should not be surreptitiously possible to determine a person&#39;s identity from
their license plate. In general, the public should not
be able to do this at all, and law enforcement should
require some auditable process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It should not be possible to use the license plate to follow
arbitrary vehicles for an extended period of time.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
However, we &lt;em&gt;do&lt;/em&gt; want it to be possible to track specific
vehicles of interest (e.g.,
driven by a suspect). Here too, the process for tracking
specific vehicles should be auditable.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can&#39;t significantly change the form factor of the license
plate: it has to fit in roughly the same location it does now,
be readable with the naked eye, and have a short enough number
(say &amp;lt;12 digits) that a human can read and remember it. On the
other hand, we aren&#39;t committed to it being some kind of metal
plate. That&#39;s good because any static identifier is going to have bad tracking
properties. Realistically our new license plate is going to need
to change and so will need to be some kind of smart screen,
but I think it does have to work without being online
all the time which eliminates some designs.&lt;/p&gt;
&lt;h2 id=&quot;designs&quot;&gt;Designs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#designs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Instead of presenting a finished design, I&#39;m going to work my
way up to it, starting by designing something that won&#39;t really
work and then refining it into something that might. This helps get at the key ideas
but also is a useful demonstration of how to work through
a problem like this and the tradeoffs you have to make.&lt;/p&gt;
&lt;h3 id=&quot;an-infeasible-design&quot;&gt;An Infeasible Design &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#an-infeasible-design&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s start by relaxing the constraint that the plates have
to be consumable by humans. This gives us a system that&#39;s
a lot easier to design and lets us work out some of the problems;
then we can look at re-adding that constraint. We start by
taking two shortcuts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Identifiers which are infeasibly long.&lt;/li&gt;
&lt;li&gt;Identifiers which can be interpreted by machines rather
than humans.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We start by assigning each vehicle $i$ a unique
identifier $I_i$ which is associated with the vehicle
registration. $I_i$ is never displayed on the vehicle, however.
Instead, every so often (every minute?) a vehicle&#39;s plate changes,
with vehicle $i$&#39;s current license plate at time $t$ denoted as
$P(i, t)$.&lt;/p&gt;
&lt;p&gt;As a first attempt, we can just use public key encryption.
Each jurisdiction $j$ (e.g., California)
has an asymmetric key pair $(K_j^{pub}, K_j^{priv})$
and the current plate number is the encryption of $I_i$.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
I.e.,&lt;/p&gt;
&lt;p&gt;$$P(i, t) = E(K_j^{pub}, I_i || t)$$&lt;/p&gt;
&lt;p&gt;This seems like it meets the rest of our requirements: if you don&#39;t
know $K_j^{priv}$ it&#39;s not possible to determine $I_i$ and you
can&#39;t link up $P(i, t)$ and $P(i, t&#39;)$. On the other hand, law
enforcement can use the private key to decrypt any given plate
and then can determine $I_i$, thus identifying vehicles and tracking
them as necessary. Auditability comes from restricting access
to the private key and we can put as much ceremony around that
access as we want. However, the problem is that this works badly
for tracking. &lt;em&gt;Every time&lt;/em&gt; you want to determine if a new vehicle
is the same as an old one you need to use the private key, which
means that it can&#39;t be that onerous to do the decryption, thus making
auditability more difficult.&lt;/p&gt;
&lt;p&gt;We can, however, improve the situation fairly easily by making
the plate generation algorithm somewhat more complicated. Instead
of just having it be the public key encryption of the identifier,
we add a pseudorandom value unique for each vehicle. To make
this work, each vehicle $i$ creates a random key $L_i$. This value
is not known to the authorities. It then generates its plate
number using the following three values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The encryption of the identifier $I_i$&lt;/li&gt;
&lt;li&gt;The encryption of the linkage key $L_i$&lt;/li&gt;
&lt;li&gt;A pseudorandom value based on $L_i$&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I.e.,&lt;/p&gt;
&lt;p&gt;$$P(i, t) = [E(K_j^{pub}, I_i || t), E(K_j^{pub}, L_i), PRF(L_i, t)]$$&lt;/p&gt;
&lt;p&gt;In this case, the values are all fixed length so you can just
concatenate them and the receiver can sort it out. If
they were variable length, you would need some kind
of separator or length prefixed encoding or something.&lt;/p&gt;
&lt;p&gt;The way this gets used in practice is that when you see a new
plate number you want to identify, you decrypt $P(i, t)$ to
recover $I_i$ as before. This is only sufficient to identify
the vehicle. However, if you also want to &lt;em&gt;track&lt;/em&gt; it, you also
decrypt $L_i$, which you can use to predict the last piece of
$P(i, t)$ (i.e., $PRF(L_i, t)$) just by computing the pseudorandom function.
This wasn&#39;t possible before because $L_i$ was secret to the vehicle,
but once you know $L_i$ it&#39;s straightforward.
This allows you to separate the functions of identification from
tracking, and you only need to use the private key at
most twice per vehicle (once to identify it and once to track it)
which allows for tight audit control.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Once you have recovered $L_i$ you can then track the vehicle
indefinitely by predicting the PRF for time $t$.&lt;/p&gt;
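&lt;p&gt;As a rough illustration of the tracking step, here is a minimal Python sketch.
It assumes HMAC-SHA256 as the PRF and an 8-byte big-endian encoding of $t$, neither
of which is specified above, and it leaves out the two public-key-encrypted components
entirely. The names &lt;code&gt;prf&lt;/code&gt; and &lt;code&gt;sightings_of&lt;/code&gt;, and the
camera log itself, are made up for the example.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def prf(linkage_key: bytes, window: int) -> bytes:
    """PRF(L_i, t), instantiated here (as an assumption) with HMAC-SHA256."""
    return hmac.new(linkage_key, window.to_bytes(8, "big"), hashlib.sha256).digest()


def sightings_of(linkage_key: bytes, observations: list) -> list:
    """Return (window, place) for every observation whose PRF value matches L_i."""
    return [
        (window, place)
        for window, place, value in observations
        if hmac.compare_digest(value, prf(linkage_key, window))
    ]


# Two vehicles, each with a linkage key known only to itself.
l_a = secrets.token_bytes(32)
l_b = secrets.token_bytes(32)

# Hypothetical camera log: (time window, place, third plate component).
observations = [
    (100, "Main St", prf(l_a, 100)),
    (100, "Oak Ave", prf(l_b, 100)),
    (101, "Highway 1", prf(l_a, 101)),
    (102, "Pier 39", prf(l_b, 102)),
]

# Once L_a has been decrypted, vehicle A's sightings fall out of the log.
print(sightings_of(l_a, observations))
```

&lt;p&gt;Without the linkage key the four logged values just look like unrelated random
strings, which is the property the design is relying on.&lt;/p&gt;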
&lt;p&gt;This design is cryptographically straightforward and has the desired
privacy and functional properties. The only problem is that it&#39;s not
usable by humans: it requires identifiers which are too large to
memorize or transcribe and you need a computer to predict the
future identifiers. Maybe when we&#39;re all wearing
Apple&#39;s &lt;a href=&quot;https://www.macrumors.com/2021/11/25/kuo-apple-ar-headset-mac-level-computing/&quot;&gt;AR headset&lt;/a&gt;,
it can automatically process these smart plates, but
let&#39;s see if we can do better in the meantime.&lt;/p&gt;
&lt;h3 id=&quot;a-more-usable-design&quot;&gt;A more usable design &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#a-more-usable-design&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If we&#39;re going to get down to human scale identifiers, we&#39;re going to
need to jettison sending a public key encrypted value because it inherently
requires values that are too long for humans to memorize. This
limits the design space quite a bit because we have to be able to
generate a sequence $P(i, t)$ that has the following properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It&#39;s known to the vehicle (user)&lt;/li&gt;
&lt;li&gt;It can be generated by the authorities once the vehicle is
identified.&lt;/li&gt;
&lt;li&gt;It &lt;em&gt;cannot&lt;/em&gt; be generated by third parties.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One obvious thing to do is to just replace the asymmetric key pair
with a symmetric key and use &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Format-preserving_encryption&amp;amp;oldid=1054799181&quot;&gt;format preserving
encryption&lt;/a&gt;
to keep the ciphertext small. The problem is that the encryption
key then needs to
be known by every user or at least every smart license plate, and
so it&#39;s possible to extract the key and track other users, thus
violating property (3). I don&#39;t know of any design that
involves &lt;em&gt;just&lt;/em&gt; encrypting $I_i$ or any derivative. Instead
the best I can do requires the authorities to compute each candidate
$P(i, t)$. For instance, suppose we say:&lt;/p&gt;
&lt;p&gt;$$P(i, t) = Truncate(PRF(L_i, t))$$&lt;/p&gt;
&lt;p&gt;Assuming that the authorities know all values of $L_i$ (note that this
is a change from the previous design where they do not), they can just
compute all potential values of $P(i, t)$ for any time window and look
up the value of interest. This is the kind of computation that would ordinarily be
impractical on a cryptographic scale, but any jurisdiction will probably have at most
a few tens of millions of vehicles (California has about &lt;a href=&quot;https://www.statista.com/statistics/196010/total-number-of-registered-automobiles-in-the-us-by-state/&quot;&gt;30 million registered
cars&lt;/a&gt;),
and doing $10^8$ PRF executions is quite cheap. This is especially
true because the time windows need to be much longer than a minute in order for
this to be useful. For instance, if we want to tell people &amp;quot;be on the
lookout for plate number 12345&amp;quot; that number probably has to be valid
for at least a day or two.&lt;/p&gt;
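&lt;p&gt;The exhaustive search is easy to sketch. The version below is only an illustration:
it assumes HMAC-SHA256 as the PRF, truncation to a 12-digit decimal string, and a toy
registry of 10,000 vehicles whose keys are derived from their index (real keys would
be random). &lt;code&gt;plate&lt;/code&gt; and &lt;code&gt;build_lookup&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import hashlib
import hmac

PLATE_DIGITS = 12  # assumed plate length for this sketch


def plate(linkage_key: bytes, window: int) -> str:
    """Truncate(PRF(L_i, t)), rendered as a fixed-length decimal string."""
    digest = hmac.new(linkage_key, window.to_bytes(8, "big"), hashlib.sha256).digest()
    return str(int.from_bytes(digest, "big") % 10**PLATE_DIGITS).zfill(PLATE_DIGITS)


def build_lookup(registry: dict, window: int) -> dict:
    """Compute every registered vehicle's plate for one window, keyed by plate."""
    table = {}
    for vehicle_id, linkage_key in registry.items():
        # A list per plate, because truncated values can collide.
        table.setdefault(plate(linkage_key, window), []).append(vehicle_id)
    return table


# Toy registry (illustration only): vehicle id mapped to its linkage key L_i.
registry = {
    f"vehicle-{n}": hashlib.sha256(f"key-{n}".encode()).digest()
    for n in range(10_000)
}

window = 18_960  # e.g. a day number, so each plate is valid for a day
table = build_lookup(registry, window)

seen = plate(registry["vehicle-1234"], window)  # what a witness would read
print(seen, table[seen])
```

&lt;p&gt;Building the table costs one PRF call per registered vehicle per window, which is
why this stays cheap even at California scale.&lt;/p&gt;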
&lt;p&gt;It&#39;s a little unfortunate to have the authorities exhaustively
search all $P(i, t)$ values, both because it&#39;s expensive and because
it requires the $L_i$ database to be online. However, we can do better
by precomputation. The way this works is that for each time window
$t$ you precompute the encryption table from our previous design. I.e.,&lt;/p&gt;
&lt;p&gt;$$[E(K_j^{pub}, I_i || t), E(K_j^{pub}, L_i), P(i, t)]$$&lt;/p&gt;
&lt;p&gt;for each vehicle and then you store the table (this is on the order
of a few gigabytes of data a day). You then use the transmitted
$P(i, t)$ value to look up the right database entry and then $K_j^{priv}$ to decrypt
$I_i$. As before, you can now identify the vehicle.
If you want to track the vehicle, you also decrypt $L_i$.
From then you can run the PRF forward to compute the sequence
of values.&lt;/p&gt;
&lt;p&gt;It&#39;s important that after the authorities build the table they
discard the $L_i$ values and then shuffle the table so&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
it&#39;s not possible to know which entries correspond to each
other across time windows.
If you do this, then the table itself isn&#39;t enough to link up multiple plate values&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.
Instead, you need the private key, which gives us the same properties
as before.&lt;/p&gt;
&lt;p&gt;Taking a step back, this is almost the same as our pure public
key system, except that we&#39;ve replaced public key encryption on the vehicle with precomputed
public key encryption by the authorities and now use the
(truncated) pseudorandom sequence as a lookup key.&lt;/p&gt;
&lt;h3 id=&quot;collisions&quot;&gt;Collisions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#collisions&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One somewhat unfortunate property of this design is that it is
susceptible to &lt;em&gt;collisions&lt;/em&gt;. If we generate a large number of
random values out of a smallish space, we will get repeats
(see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Birthday_problem&amp;amp;oldid=1055046729&quot;&gt;birthday paradox&lt;/a&gt;).
If you have $V$ vehicles, you need more than $V^2$ possible
plates to keep the chance of collisions low.
In a state like California, this means something like
$2^{60}$ possible values. Each digit of the plate encodes about 5 bits
so that means 12 digits, which is far more than any
current plate scheme.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
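&lt;p&gt;To make the arithmetic concrete, here is a small Python sketch using the standard
birthday approximation $1 - e^{-V(V-1)/2N}$ for $V$ vehicles and $N$ possible plates.
The 30 million figure is the California number from above; the function name is
just for illustration.&lt;/p&gt;

```python
import math


def collision_probability(vehicles: int, plate_bits: int) -> float:
    """Birthday approximation: 1 - exp(-V(V-1) / (2N)) with N = 2**plate_bits."""
    space = 2 ** plate_bits
    return -math.expm1(-vehicles * (vehicles - 1) / (2 * space))


V = 30_000_000  # roughly California's registered cars

# Smallest whole number of bits that gives at least V^2 possible plates.
bits_for_v_squared = math.ceil(2 * math.log2(V))

print(bits_for_v_squared, collision_probability(V, bits_for_v_squared))
print(60, collision_probability(V, 60))
print(math.ceil(60 / 5), "digits at about 5 bits per digit")
```

&lt;p&gt;With only about $V^2$ plates the chance of at least one collision in a given time
window is still around one in three, which is why you need the extra margin: at
$2^{60}$ it drops to roughly 4 in 10,000.&lt;/p&gt;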
&lt;p&gt;It&#39;s possible to do quite a bit better by permuting rather
than computing a pseudorandom value. For instance, we could
assign each vehicle a short identifier $S_i$ and then compute&lt;/p&gt;
&lt;p&gt;$$P(i, t) = Encrypt(K_{permute}, S_i)$$&lt;/p&gt;
&lt;p&gt;where Encrypt is some sort of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Format-preserving_encryption&amp;amp;oldid=1054799181&quot;&gt;format preserving
encryption&lt;/a&gt;
scheme. This will provide unique values with a much smaller number
space, but at a cost. For the reasons mentioned above, the vehicle
can&#39;t know $K_{permute}$, so the authorities need to precompute all
$P(i, t)$ values and distribute them to each vehicle. This has a bunch
of logistical difficulties (e.g., $K_{permute}$ needs to be online in
order to issue the plates).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
Moreover, it seems likely that the system
needs to work with only partial plates, in which case you&#39;ll get
collisions in the plate values anyway.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;
Anyway, this is not clearly better, but instead reflects the kind
of design tradeoff that you have to make when building something
under these constrained circumstances.&lt;/p&gt;
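&lt;p&gt;Here is a toy Python sketch of the permutation idea. It is not format-preserving
encryption: Python&#39;s &lt;code&gt;random.Random&lt;/code&gt; is not a cryptographic primitive and
is only standing in for a real cipher keyed with $K_{permute}$. Folding the time window
into the seed is also my assumption, since the formula above doesn&#39;t say how $t$
enters. The point is just that the authorities can hand out values that never collide,
even in a small number space.&lt;/p&gt;

```python
import random


def issue_plates(short_ids: list, permute_key: str, window: int, space: int) -> dict:
    """Assign each S_i a distinct 7-digit value in range(space) for one window.

    random.Random is NOT cryptographically secure; it only stands in for a
    format-preserving cipher keyed with K_permute.
    """
    rng = random.Random(f"{permute_key}:{window}")
    # sample() draws without replacement, so no two vehicles share a value.
    values = rng.sample(range(space), len(short_ids))
    return {sid: f"{value:07d}" for sid, value in zip(short_ids, values)}


short_ids = list(range(50_000))  # S_i for a toy fleet
plates = issue_plates(short_ids, "demo-key", window=1, space=10**7)

# Every vehicle gets its own plate despite the small 7-digit space.
print(len(plates), len(set(plates.values())))
```

&lt;p&gt;For comparison, 50,000 independent random draws from a space of $10^7$ would be
expected to produce on the order of a hundred colliding pairs.&lt;/p&gt;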
&lt;h2 id=&quot;attacks&quot;&gt;Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s worth noting that this system is obviously subject to a variety
of attacks. In particular, nothing stops me from making a fake smart
plate that shows random identifiers. However, not much stops me from
doing that &lt;em&gt;now&lt;/em&gt;, either with the low tech way of having multiple
real plates that I swap (ever see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=The_Transporter&amp;amp;oldid=1044192009&quot;&gt;The Transporter&lt;/a&gt;?) or a smart plate that looks like a real plate;
I&#39;m pretty sure I could make one of these that looks pretty real,
especially behind a license plate holder. The basic premise here is
that people aren&#39;t working too hard to cheat the system.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/license-plates/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said at the beginning, I&#39;m under no illusions that we&#39;re going
to replace our current license plate system with a system of
smart plates. However, it&#39;s still important to look at the systems
we&#39;ve built and see how/whether they can be used and how they can be improved.
In this case, we have a system which originally had so-so privacy
properties and due to modern technology in the form of ANPR now has quite bad ones.
When designing new systems we need to be careful not to reproduce
this situation in the future.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update: 2021-11-29&lt;/strong&gt;: Added some clarifying material around the initial
construction and also fixed some holdover text where I assumed that
the plate had only two components, not three.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This pattern, in which technology turns a theoretical
privacy violation into a practical one, has become
quite common. See also DNA-based forensics and
facial recognition. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We would typically formalize this as saying that it
is not possible to do significantly better than guessing
in distinguishing seeing the same vehicle twice
from seeing two similar vehicles. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s best if you have some kind of randomized
encryption like &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-cfrg-hpke-07.html&quot;&gt;HPKE&lt;/a&gt;
so that an attacker who somehow learns $I_i$ can&#39;t trial encrypt. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You actually probably want to have two key pairs so
that you can separately audit their use and to prevent
attacks where decrypting $L_i$ is
confused with decrypting $I_i$. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For instance by sorting them by the encrypted $I_i$ values. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;More nerdsniping: there&#39;s probably some multiparty
computation way of constructing the table so that it&#39;s never
linkable. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, I know that plate numbers are structured, which
reduces the space further. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We can still precompute the table I
mentioned in the previous section, as that provides better privacy
on the lookup side. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though we could probably use some of the digits we&#39;ve saved
to add some sort of error correcting code. &lt;a href=&quot;https://educatedguesswork.org/posts/license-plates/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A quick look at the New Zealand Vaccine Pass</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-nz/"/>
		<updated>2021-11-23T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-nz/</id>
		<content type="html">&lt;p&gt;A reader alerted me to New Zealand&#39;s &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/%5Bhttps://covid19.govt.nz/covid-19-vaccines/covid-19-vaccination-certificates/my-vaccine-pass/#how-to-get-my-vaccine-pass&quot;&gt;vaccine pass&lt;/a&gt;
system (&lt;a href=&quot;https://nzcp.covid19.health.nz/&quot;&gt;spec here&lt;/a&gt;). Like the other vaccine passport systems I&#39;ve seen
(&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/&quot;&gt;New York&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/&quot;&gt;California&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/&quot;&gt;EU&lt;/a&gt;),
it&#39;s a digitally signed credential, but (of course) it&#39;s also
slightly different and so incompatible.
In this case, it&#39;s a
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8392&quot;&gt;CBOR Web Token (CWT)&lt;/a&gt;.
The NZ system is straight CBOR and encodes data in Base32
without any compression. They &lt;a href=&quot;https://nzcp.covid19.health.nz/#2d-barcode-encoding-options-rational&quot;&gt;argue&lt;/a&gt; that this is better than
the alternatives for some implementation and interoperability reasons.&lt;/p&gt;
&lt;p&gt;Here&#39;s a look at their example credential converted to JSON:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;iss&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:web:example.nz&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;nbf&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1516239022&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;exp&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1516239922&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;jti&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;urn:uuid:cc599d04-0d51-4f7e-8ef5-d7b5f8461c5f&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;vc&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;@context&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://www.w3.org/2018/credentials/v1&quot;&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://nzcp.covid19.health.nz/contexts/v1&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;version&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1.0.0&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;VerifiableCredential&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;PublicCovidPass&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;credentialSubject&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;givenName&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;John Andrew&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;familyName&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Doe&quot;&lt;/span&gt;&lt;span 
class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;dob&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1979-04-14&quot;&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;information-in-the-pass&quot;&gt;Information In the Pass &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#information-in-the-pass&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first thing to note here is that the only real user-specific
information here is in the &lt;code&gt;credentialSubject&lt;/code&gt; field, which
just contains the vaccinated person&#39;s identity. This is different
from the other systems I&#39;ve looked at which contain information
about the person&#39;s medical status, such as when they were vaccinated,
recovered from COVID, or had a negative test. I can understand
why you might want a more limited credential for privacy reasons,
but this seems like an unfortunate piece of inflexibility.&lt;/p&gt;
&lt;p&gt;This is especially true now that we know that vaccine effectiveness
&lt;a href=&quot;https://twitter.com/PaulMainwood/status/1461374201474998275/photo/1&quot;&gt;wanes quite a bit over time&lt;/a&gt;.
Imagine you want to require that someone either have been recently
vaccinated or have had a recent booster; unlike other systems,
this credential doesn&#39;t allow the verifier to determine that directly.
There &lt;em&gt;is&lt;/em&gt; an expiration date (more on this &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#expiration&quot;&gt;below&lt;/a&gt;),
which you could sort of use for this purpose by having any
pass expire in six months (the site says that&#39;s how long
they are good for), but there are several problems with using
it that way.&lt;/p&gt;
&lt;p&gt;First, we don&#39;t yet know how long vaccination will be effective or
what future requirements will be. For instance, there has
been some speculation that boosters will confer persistent
immunity past 6 months, in which case you might say that
&amp;quot;fully vaccinated&amp;quot; meant &amp;quot;within 6 months of the initial
dose(s) or after the booster&amp;quot;. There&#39;s no way to represent
this with a single &amp;quot;expiration&amp;quot; number, and so you could
easily create a situation where you issue passes to people
today that are either too short or too long.&lt;/p&gt;
&lt;p&gt;Second, it&#39;s not clear that people will get their passes
immediately after being vaccinated. If they don&#39;t, the issuer then
has to decide between having the pass last for six
months from the time of issuance or six months after they were vaccinated.
The first option doesn&#39;t work well if you want the pass
to reflect the duration of immunity and the second is
going to result in people having passes with varying
durations, as well as being kind of a record-keeping hassle
if you ever have to revoke passes (one of the purposes
of expiration is to allow you not to &lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak&quot;&gt;revoke&lt;/a&gt;
credentials which have already expired). Moreover, if you
have the pass expire six months from being vaccinated,
you&#39;ve just leaked the vaccination date, which is presumably
what the system designers were trying to avoid by omitting that information
from the pass.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Effectively, this design just encodes all the policy decisions
in the decision to issue the pass, at which point they are
fixed for any individual pass and can&#39;t be addressed
except by issuing new passes. Obviously, you &lt;em&gt;can&lt;/em&gt; issue new
credentials, but for the reasons I&#39;ve talked about &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport&quot;&gt;before&lt;/a&gt;,
this can be quite inconvenient.
It&#39;s not clear to me if New Zealand has built
a system to let people automatically update their vaccine
pass, but based on their site, it looks like it&#39;s just a
QR code that you print out or add to Apple Wallet, so presumably
not. Moreover, one of the big advantages of these signed
QR codes is that you can just print them out, and you can&#39;t
automatically update paper. And of course, if the policy gets
&lt;em&gt;more&lt;/em&gt; restrictive, then you might be faced with mass revoking
a lot of passes.
It seems like it would be better
to just put the right information in the pass upfront to
allow verifiers to enforce policy (and to be updated with
new policies), rather than requiring people to get new passes
when policy changes.&lt;/p&gt;
&lt;h2 id=&quot;expiration&quot;&gt;Expiration &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#expiration&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The actual intended value of the expiration date (&lt;code&gt;exp&lt;/code&gt; field) is
quite confusing. Here&#39;s what the spec says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Expiry, this claim represents the datetime at which the pass is
considered expired by the party who issued it, this claim MUST be
present and its value MUST be a timestamp encoded as an integer in
the NumericDate format (as specified in [RFC8392] section
2). Verifying parties MUST validate that the current datetime is
before the value of this claim and if not they MUST reject the pass
as being expired. This claim is mapped to the Credential Expiration
Date property in the W3C VC standard. The claim key for exp of 4
MUST be used.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given that the passes are supposed to expire after six months,
we might expect that this will be six months from the issuance
date, but the example credential above appears to
expire in 15 minutes.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
On the other hand, the &lt;a href=&quot;https://nzcp.covid19.health.nz/#valid-worked-example&quot;&gt;&amp;quot;valid worked example&amp;quot;&lt;/a&gt;
in the specification has the following dates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;not before: 1635883530 (2021-11-02T20:05:30.000Z)&lt;/li&gt;
&lt;li&gt;not after: 1951416330 (2031-11-02T20:05:30.000Z)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, this pass is valid for 10 years.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This leaves me with no idea about what is going to appear in
the expiry field of a real pass. In a design which had
the vaccination date in the pass, the purpose of the expiry
field would basically be to limit the period during which you have
to care about a given credential. For instance, if you issue
a bunch of credentials with a key and lifetime of one year
and then two years later that key is compromised, you don&#39;t
need to do anything because the credentials are already invalid.
Having credentials expire also makes updating easier because
it gives you a hard limit on how long you have to support
old credentials, thus making it somewhat easier to update
to a new format: you know that after a certain time
all credentials will be new. With that said, 10 years is a very long time;
by contrast WebPKI certificates must have a lifetime of no longer
than 398 days. Given the maturity of these systems, 1-2 years
seems more appropriate.&lt;/p&gt;
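&lt;p&gt;The lifetimes quoted above are easy to check by hand. Here is a minimal sketch that decodes the &lt;code&gt;nbf&lt;/code&gt; and &lt;code&gt;exp&lt;/code&gt; values from the two examples; the helper name &lt;code&gt;describe_validity&lt;/code&gt; is made up for illustration and is not part of the NZ specification.&lt;/p&gt;

```python
from datetime import datetime, timezone

def describe_validity(nbf, exp):
    """Return (start, end, lifetime) for NumericDate-style nbf/exp values."""
    start = datetime.fromtimestamp(nbf, tz=timezone.utc)
    end = datetime.fromtimestamp(exp, tz=timezone.utc)
    return start, end, end - start

# Values copied from the example credential and the "valid worked example".
toy_start, toy_end, toy_lifetime = describe_validity(1516239022, 1516239922)
worked_start, worked_end, worked_lifetime = describe_validity(1635883530, 1951416330)

print(toy_start.isoformat(), "lifetime:", toy_lifetime)
print(worked_start.isoformat(), worked_end.isoformat(),
      "lifetime:", worked_lifetime.days, "days")
```

&lt;p&gt;The first pair differs by 900 seconds and the second by 3652 days, which is where the 15 minute and 10 year figures come from.&lt;/p&gt;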
&lt;p&gt;If anyone has a copy of a valid NZ pass, I&#39;d be interested to
see when it expires.&lt;/p&gt;
&lt;h2 id=&quot;uuid&quot;&gt;UUID &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#uuid&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Second, the pass contains a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Universally_unique_identifier&amp;amp;oldid=1055904098&quot;&gt;universally unique identifier (UUID)&lt;/a&gt;. They
recommend that it be a &amp;quot;version 4&amp;quot; UUID, which just means
that it&#39;s randomly generated. It&#39;s not clear to me why this
is needed: it&#39;s not required by the CWT specification and
you can make a unique id from the pass just by hashing it,
which (statistically) guarantees uniqueness as long as the contents are unique.
I don&#39;t think this is harmful, but it&#39;s also not clear to
me what it&#39;s for.&lt;/p&gt;
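&lt;p&gt;To make the hashing alternative concrete, here is a rough sketch. It assumes you hash the encoded pass contents with SHA-256; the helper name and the sample byte strings are invented for illustration, and nothing like this appears in the specification.&lt;/p&gt;

```python
import hashlib

def derive_pass_id(pass_bytes):
    """Hypothetical helper: hex SHA-256 digest of the encoded pass contents."""
    return hashlib.sha256(pass_bytes).hexdigest()

# Stand-in byte strings; a real pass would be the CBOR-encoded payload.
pass_a = b"example-pass-contents-for-alice"
pass_b = b"example-pass-contents-for-bob"

print(derive_pass_id(pass_a))
print(derive_pass_id(pass_b))
```

&lt;p&gt;Two passes only get the same identifier if their contents are identical (or you find a hash collision), so this only works if something in the contents, such as the issuance time, differs between passes issued to the same person.&lt;/p&gt;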
&lt;h2 id=&quot;keys&quot;&gt;Keys &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#keys&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As with the California system, the pass indicates which signing
keys were used but that must be checked against a list which
is statically configured into the application. The pass
carries this information with a &lt;a href=&quot;https://w3c-ccg.github.io/did-method-web/&quot;&gt;did:web&lt;/a&gt;
URI, which must be on the following list:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string&quot;&gt;&quot;did:web:nzcp.identity.health.nz&quot;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;did:web is a specific binding (&amp;quot;method&amp;quot;) of the W3C &lt;a href=&quot;https://www.w3.org/TR/did-core/&quot;&gt;Decentralized
Identifier (DID)&lt;/a&gt; specification.
The way to read this is that the key file (formatted as a DID)
is located at &lt;a href=&quot;https://nzcp.identity.health.nz/.well-known/did.json&quot;&gt;https://nzcp.identity.health.nz/.well-known/did.json&lt;/a&gt;, the current contents of which are:&lt;/p&gt;
&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:web:nzcp.identity.health.nz&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;@context&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string&quot;&gt;&quot;https://w3.org/ns/did/v1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string&quot;&gt;&quot;https://w3id.org/security/suites/jws-2020/v1&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;verificationMethod&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;did:web:nzcp.identity.health.nz#z12Kf7UQ&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;controller&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;did:web:nzcp.identity.health.nz&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;JsonWebKey2020&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token property&quot;&gt;&quot;publicKeyJwk&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token property&quot;&gt;&quot;kty&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;EC&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token property&quot;&gt;&quot;crv&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;P-256&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token property&quot;&gt;&quot;x&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;DQCKJusqMsT0u7CjpmhjVGkHln3A3fS-ayeH4Nu52tc&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token property&quot;&gt;&quot;y&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;lxgWzsLtVI8fqZmTPPo9nZ-kzGs7w7XO8-rUU68OxmI&quot;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token property&quot;&gt;&quot;assertionMethod&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string&quot;&gt;&quot;did:web:nzcp.identity.health.nz#z12Kf7UQ&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is pulling in a whole lot of specification machinery,
but at the end of the day what it means is that there is one valid
signing key (&lt;code&gt;z12Kf7UQ&lt;/code&gt;), for ECDSA with the P-256 curve. Importantly, this structure &lt;em&gt;does&lt;/em&gt; support multiple keys, for
instance for key rollover. The DID document can contain more than one
key, each with a different key id, and the signed pass contains a key id
(&lt;code&gt;kid&lt;/code&gt;) which identifies the key used to sign it.  This lets you
introduce a new key or a new algorithm, as with the
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki&quot;&gt;VCI&lt;/a&gt; and the &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu&quot;&gt;EU Green Card&lt;/a&gt;,
which is an important piece of future flexibility.&lt;/p&gt;
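&lt;p&gt;Here is a minimal sketch of the lookup a verifier might do. The function names are hypothetical, the URL mapping follows my reading of the did:web method, and I&#39;m assuming that &lt;code&gt;assertionMethod&lt;/code&gt; entries are plain string references, as in the document above. It doesn&#39;t fetch anything or verify a signature; it just locates the key.&lt;/p&gt;

```python
from urllib.parse import unquote

def did_web_to_url(did):
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    parts = [unquote(p) for p in did[len(prefix):].split(":")]
    if len(parts) == 1:
        # Bare domain: the document lives under /.well-known/
        return "https://" + parts[0] + "/.well-known/did.json"
    return "https://" + "/".join(parts) + "/did.json"

def find_signing_key(did_document, kid):
    """Return the publicKeyJwk for `kid` if it is listed as an assertion method."""
    key_id = did_document["id"] + "#" + kid
    if key_id not in did_document.get("assertionMethod", []):
        return None
    for method in did_document.get("verificationMethod", []):
        if method["id"] == key_id:
            return method["publicKeyJwk"]
    return None

# Abbreviated copy of the DID document shown above (x and y omitted).
did_document = {
    "id": "did:web:nzcp.identity.health.nz",
    "verificationMethod": [{
        "id": "did:web:nzcp.identity.health.nz#z12Kf7UQ",
        "type": "JsonWebKey2020",
        "publicKeyJwk": {"kty": "EC", "crv": "P-256"},
    }],
    "assertionMethod": ["did:web:nzcp.identity.health.nz#z12Kf7UQ"],
}
TRUSTED_ISSUERS = ["did:web:nzcp.identity.health.nz"]

issuer, kid = "did:web:nzcp.identity.health.nz", "z12Kf7UQ"
if issuer in TRUSTED_ISSUERS:
    print(did_web_to_url(issuer))
    print(find_signing_key(did_document, kid))
```

&lt;p&gt;Rolling over to a new key would then just mean adding a second entry with a new key id to the DID document; passes signed with the old &lt;code&gt;kid&lt;/code&gt; keep verifying until that entry is removed.&lt;/p&gt;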
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I mentioned earlier, this design is conceptually really similar to
other vaccine passport systems. Most of the choices they have made
(Base32 encoding, no compression, all-CBOR, using DIDs for the keys) seem reasonable,
even if they differ from other systems or you or I might have made different
ones. The one decision that seems clearly worse is to include just the
user&#39;s identity and omit their vaccination details.
The specification doesn&#39;t provide any rationale for this, but as described
above, it seems clearly less flexible.&lt;/p&gt;
&lt;p&gt;More generally, it&#39;s kind of disappointing to see all these different
vaccine passport systems be subtly different instead of everyone
converging on a common specification. I&#39;m not saying that the encoding
doesn&#39;t matter at all, but it seems like it would be better if we instead
had a single system (although potentially with disjoint
keys). This would let us focus engineering effort and analysis on
that one system, as well as providing the opportunity for interoperability
between credentials issued by different jurisdictions.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you could also backdate the &amp;quot;not before&amp;quot;
to when someone was vaccinated, but that has the same
privacy issue. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &amp;quot;not before&amp;quot; (&lt;code&gt;nbf&lt;/code&gt;)
value shown above is actually back in 2018 (2018-01-18T01:30:22.000Z),
which suggests this example was constructed by hand. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Incidentally, it&#39;s quite bad to have the human
readable and machine readable expiration dates
mismatch, so if the printed expiration is six months
and the pass says 10 years, that&#39;s not good. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nz/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Measurement 5: Randomization</title>
		<link href="https://educatedguesswork.org/posts/ppm-randomness/"/>
		<updated>2021-11-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ppm-randomness/</id>
		<content type="html">&lt;script&gt;
window.MathJax = {
  tex: {
    inlineMath: [[&#39;$&#39;, &#39;$&#39;], [&#39;&#92;&#92;(&#39;, &#39;&#92;&#92;)&#39;]]
  }
  }
&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
};
&lt;/script&gt;
&lt;p&gt;This is part V of my series on Privacy Preserving Measurement (see
parts &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro&quot;&gt;I&lt;/a&gt;, &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;III&lt;/a&gt;, and &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters&quot;&gt;IV&lt;/a&gt;).
Today we&#39;ll be addressing techniques that use randomization
to provide privacy.&lt;/p&gt;
&lt;p&gt;The aggregate measurement techniques I have described so far provide
exact answers (which is good) but require multiple servers in
which you have to trust at least one to behave properly (which
is less good). What if you want to collect a measurement but your
subjects are unwilling to trust you--or anyone else--at all? It&#39;s
still possible to collect some aggregate measurements using
what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Randomized_response&amp;amp;oldid=1024956231&quot;&gt;randomized response&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;basic-randomized-response&quot;&gt;Basic Randomized Response &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#basic-randomized-response&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Imagine you want to collect the rate at which people engage in some
behavior that they don&#39;t want people to know about or is illegal, such
as using heroin. For obvious reasons, people might not be excited
about that kind of admission, no matter what security precautions are
used for data collection. Randomized Response offers a solution
to this problem without any fancy cryptography.&lt;/p&gt;
&lt;p&gt;The basic idea is simple and goes back to 1965. Instead of just answering the question,
you generate a random number (e.g., by flipping a coin). Then you
respond as follows:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;th colspan=&quot;2&quot;&gt;&lt;b&gt;Coin&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;Real Answer&lt;/b&gt;&lt;/td&gt;&lt;td&gt;Heads&lt;/td&gt;&lt;td&gt;Tails&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;If we assume that the fraction of the population who would have
answered &amp;quot;Yes&amp;quot; to the basic question is $X$ then the fraction of
people who answer &amp;quot;Yes&amp;quot; will be $(1 + X)/2$. So if the fraction of
&amp;quot;Yes&amp;quot; answers is $Y$ then it&#39;s easy to estimate $X$ by computing
$X = 2Y - 1$.
Note that this answer is approximate, not exact, because
the coin won&#39;t come up Heads exactly half the time. If
it comes up Heads more often than half the time, this will
cause you to overestimate $X$ and if it comes up less than half
this will lead you to underestimate $X$.
The
rate at which it does is given by the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Binomial_distribution&amp;amp;oldid=1050191349&quot;&gt;binomial distribution&lt;/a&gt;, but
the gist is that, as with most sampling techniques,
you get increased accuracy with more samples.&lt;/p&gt;
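&lt;p&gt;Here is a small simulation of the scheme in the table, as a sketch rather than anything rigorous. The population size, the true rate of 10%, and the seed are all made up; the estimator is just $X = 2Y - 1$ from above.&lt;/p&gt;

```python
import random

def randomized_response(true_answer, rng):
    """Answer Yes (True) on heads; otherwise give the real answer."""
    heads = rng.getrandbits(1) == 1
    return True if heads else true_answer

def estimate_true_rate(responses):
    """Invert Y = (1 + X) / 2 to recover X from the observed Yes fraction Y."""
    y = sum(responses) / len(responses)
    return 2 * y - 1

rng = random.Random(2021)      # fixed seed so the run is repeatable
n, true_rate = 100_000, 0.10   # made-up population size and true Yes rate
n_yes = round(n * true_rate)
truths = [True] * n_yes + [False] * (n - n_yes)

responses = [randomized_response(t, rng) for t in truths]
estimate = estimate_true_rate(responses)
print("observed Yes fraction:", sum(responses) / n)
print("estimated true rate:  ", estimate)
```

&lt;p&gt;With this many samples the estimate should land close to the true rate; rerunning with a much smaller &lt;code&gt;n&lt;/code&gt; shows how noisy the estimate gets.&lt;/p&gt;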
&lt;p&gt;Randomized Response provides plausible deniability because a lot of
the &amp;quot;Yes&amp;quot; answers are from people whose coin came up &amp;quot;Heads&amp;quot;.  If only a
small fraction of people would have answered Yes, then the vast
majority of people who say &amp;quot;Yes&amp;quot; actually just had a coin which came
up heads. Note that this doesn&#39;t give zero information:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Anyone who answers &amp;quot;No&amp;quot; really would have answered &amp;quot;No&amp;quot;.&lt;/li&gt;
&lt;li&gt;You can estimate the probability that a subject who answered
&amp;quot;Yes&amp;quot; really would have answered &amp;quot;Yes&amp;quot;: a fraction $X$ of subjects
have a true &amp;quot;Yes&amp;quot; and a fraction $(1 + X)/2$ respond &amp;quot;Yes&amp;quot;, so it is $2X/(1 + X)$.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In addition, if you take multiple independent measurements of the
same value from the same user, randomized response starts to leak information.
For instance, consider what happens if you ask someone to use
randomized response to answer some question every month for a year. The chance
of a random coin coming up heads 12 times in a row is $1/4096$, so
if you get 12 &amp;quot;Yes&amp;quot; responses in a row, then it is much more likely
that the true answer is &amp;quot;Yes&amp;quot;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
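&lt;p&gt;A rough Bayesian sketch makes this concrete. It assumes (my simplification, not something stated above) that the true answer doesn&#39;t change over the year, that the coin flips are independent, and that a fraction $X$ of the population has a true answer of &amp;quot;Yes&amp;quot;. Someone whose true answer is &amp;quot;Yes&amp;quot; always says &amp;quot;Yes&amp;quot;, while someone whose true answer is &amp;quot;No&amp;quot; says &amp;quot;Yes&amp;quot; $n$ times in a row with probability $2^{-n}$, so after $n$ consecutive &amp;quot;Yes&amp;quot; responses&lt;/p&gt;
&lt;p&gt;$Pr[&#92;text{true Yes} &#92;mid n &#92;text{ Yes responses}] = &#92;frac{X}{X + (1 - X) 2^{-n}}$&lt;/p&gt;
&lt;p&gt;With an illustrative $X = 0.1$ and $n = 12$ this is $0.1 / (0.1 + 0.9/4096) &#92;approx 0.998$, so an observer who started out 10% confident ends up almost certain.&lt;/p&gt;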
&lt;h3 id=&quot;background%2Fdigression%3A-bloom-filters&quot;&gt;Background/Digression: Bloom Filters &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#background%2Fdigression%3A-bloom-filters&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A common problem in computer science is representing a list of &amp;quot;known&amp;quot; values
so that (1) the stored list is small and (2) it&#39;s fast to look up whether
a given value is in the list.
For instance, suppose that I want a Web browser to filter out malicious
domains as in &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Google Safe Browsing&lt;/a&gt;
or revoked Web site certificates.
The natural solutions (lists, hash tables, etc.)
require actually storing the strings in the data structure, but this
is wasteful because I don&#39;t actually want to retrieve the strings, I
just want to check for presence or absence,
so why should I have to store all of that stuff? This suggests a simpler
solution: instead of storing the &lt;em&gt;value&lt;/em&gt; of the string in the hash
table, just store a single bit representing whether the string with a
given hash is present or not. This data structure is called
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bloom_filter&amp;amp;oldid=1048208233&quot;&gt;Bloom
filter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Bloom filters are obviously quite a bit smaller, but they come with a disadvantage:
false positives. Suppose we have two domains--one innocuous and one
illicit and blocked--which hash to the same value. In this case, the innocuous
domain will &lt;em&gt;also&lt;/em&gt; be blocked. In general, the chance that a given
string collides with any one stored entry in a single hash data structure
is &lt;em&gt;2&lt;sup&gt;-b&lt;/sup&gt;&lt;/em&gt; where &lt;em&gt;b&lt;/em&gt; is the number of bits in the hash,
so the false positive rate grows with the number of entries. You can
improve the situation somewhat by
using multiple hash functions (this is how Bloom filters are typically
used); in this case, the string is present
in the filter if &lt;em&gt;all&lt;/em&gt; of the corresponding bits for each hash are
set to 1. However, there will still be false positives and any
use of Bloom filters needs to deal with this somehow.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
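&lt;p&gt;For a sense of scale, here is the standard back-of-the-envelope estimate for the multiple-hash case. The symbols are mine rather than from the text above: $m$ is the number of bits in the filter, $k$ the number of hash functions, and $n$ the number of strings inserted, and the hashes are assumed to behave like independent uniform random functions. After $n$ insertions a given bit is still 0 with probability $(1 - 1/m)^{kn} &#92;approx e^{-kn/m}$, and a string that was never inserted is a false positive only if all $k$ of its bits are 1:&lt;/p&gt;
&lt;p&gt;$Pr[&#92;text{false positive}] &#92;approx &#92;left(1 - e^{-kn/m}&#92;right)^k$&lt;/p&gt;
&lt;p&gt;For example, with 10 bits per stored string ($m = 10n$) and $k = 7$, this gives $(1 - e^{-0.7})^7 &#92;approx 0.008$, i.e., a false positive rate a bit under 1%.&lt;/p&gt;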
&lt;h2 id=&quot;rappor&quot;&gt;RAPPOR &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#rappor&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Just as with &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;Prio&lt;/a&gt;, basic randomized
response is not good for reporting arbitrary strings:
each string must be separately encoded, which gets out of hand
quickly, and it doesn&#39;t work at all for unknown strings without
consuming impractical amounts of space.  Erlingsson et al. described
&lt;a href=&quot;https://www.chromium.org/developers/design-documents/rappor&quot;&gt;Randomized Aggregatable Privacy Preserving Ordinal Responses
(RAPPOR)&lt;/a&gt;
which attempts to address this problem, as well as the problem of
repeated measurements.&lt;/p&gt;
&lt;p&gt;As you have probably guessed from the previous section, RAPPOR uses
Bloom filters to store strings. The basic idea is straightforward: you
take the string you want to report and insert it into a Bloom filter.
You then randomly flip some bits to add noise and send the
filter to the server. Because you are &lt;em&gt;already&lt;/em&gt; randomly
adding values, false positives from the Bloom filter aren&#39;t as big an
issue. This lets you send string values in finite space because
the Bloom filter has a fixed size no matter how big the strings are.&lt;/p&gt;
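&lt;p&gt;To illustrate why the noise can be averaged out in aggregate, here is a deliberately simplified model, not RAPPOR&#39;s actual parameterization: assume every bit of every report is flipped independently with probability $f$ (a symbol I&#39;m introducing for illustration). If a fraction $t$ of users have a given bit set in their true Bloom filter, the fraction $c$ of received reports with that bit set will be approximately&lt;/p&gt;
&lt;p&gt;$c = t(1 - f) + (1 - t)f$, so $t = &#92;frac{c - f}{1 - 2f}$&lt;/p&gt;
&lt;p&gt;For instance, with $f = 0.25$ and an observed $c = 0.30$, the estimate is $t = (0.30 - 0.25)/0.5 = 0.10$. No individual report can be trusted, but the per-bit totals can still be corrected for the known flip rate.&lt;/p&gt;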
&lt;p&gt;Reading the data out of the Bloom filter isn&#39;t totally straightforward.
There are two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A given set of Bloom filters is consistent with more than one set of
candidate input strings&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You have to already know the input strings. This requirement is
slightly subtle because you don&#39;t need to know the strings &lt;em&gt;in
advance&lt;/em&gt; but only when you want to query the results. For instance,
suppose that you deploy RAPPOR to measure the most popular home
pages. If you later learn via some other mechanism about a home page
you didn&#39;t know about, you can still use RAPPOR to find out
whether it&#39;s popular. But unlike with the techniques described in
part &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters&quot;&gt;IV&lt;/a&gt; you can&#39;t &lt;em&gt;learn&lt;/em&gt; the home page
from RAPPOR because the Bloom filter doesn&#39;t store the values
(remember, that&#39;s how you get it small).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anyway, once you have a set of candidate strings, you can use some
fancy statistics (including &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Lasso_(statistics)&amp;amp;oldid=1045995938&quot;&gt;LASSO&lt;/a&gt;) to estimate the most likely values.
RAPPOR also tries to address the problem of repeated querying by
using two layers of randomization. The first layer is stable,
so that multiple reports provide the same answer. The second layer
changes with every report so that you can&#39;t use the precise reports
as a tracking vector. Together they provide privacy for the user&#39;s
data.&lt;/p&gt;
&lt;p&gt;The big problem with RAPPOR is that it&#39;s very inefficient. As some
of the same authors write in their paper introducing &lt;a href=&quot;https://research.google/pubs/pub46411/&quot;&gt;PROCHLO&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Regrettably, there are strict limits to the utility of locally
differentially-private analyses. Because each reporting individual
performs independent coin flips, any analysis results are perturbed by
noise induced by the properties of the binomial distribution. The
magnitude of this random Gaussian noise can be very large: even in the
theoretical best case, its standard deviation grows in proportion to
the square root of the report count, and the noise is in practice
higher by an order of magnitude [7, 28–30, 74]. Thus, if a billion
individuals’ reports are analyzed, then a common signal from even up
to a million reports may be missed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, this is a generic problem with any technique where
users randomize their data before collection: there&#39;s a tradeoff between accuracy and privacy and
none of the points on the curve are particularly great. For
this reason, interest in these techniques has waned in favor of
non-randomized techniques like those described in parts
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;III&lt;/a&gt;, and &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters&quot;&gt;IV&lt;/a&gt;
of this series.&lt;/p&gt;
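&lt;p&gt;As a rough illustration of the scale involved, take the simple coin-flip scheme from earlier in this post (a simplification: RAPPOR itself is more elaborate) with $N$ reports, of which a fraction $X$ come from users whose true answer is &amp;quot;Yes&amp;quot;. The estimated number of true &amp;quot;Yes&amp;quot; users is $2 &#92;cdot (&#92;text{number of Yes reports}) - N$. The only randomness is in the coins of the $(1 - X)N$ users whose true answer is &amp;quot;No&amp;quot;; the number of Heads among them is binomially distributed with variance $(1 - X)N/4$, so the standard deviation of the estimate is&lt;/p&gt;
&lt;p&gt;$2&#92;sqrt{(1 - X)N/4} = &#92;sqrt{(1 - X)N} &#92;le &#92;sqrt{N}$&lt;/p&gt;
&lt;p&gt;With $N = 10^9$ that is up to about 32,000 users&#39; worth of noise even in this idealized case, which is the &amp;quot;square root of the report count&amp;quot; behavior described in the quote above.&lt;/p&gt;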
&lt;h2 id=&quot;publishing-aggregate-data&quot;&gt;Publishing Aggregate Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#publishing-aggregate-data&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There is, however, a situation in which randomization is very useful:
when used in combination with some exact aggregate data collection
mechanism, whether conventional or privacy preserving. As discussed at
the very start of this series, what you want out of these systems is
usually to measure aggregate data rather than individual, but even
aggregate data can reveal information about individuals.&lt;/p&gt;
&lt;p&gt;For instance, suppose that we are collecting household income from
everyone in a given neighborhood and publishing the number of people
in each 10,000/year bracket. This seems like it&#39;s fine, but what
happens if there&#39;s one person with income of 1,000,000/year and everyone else
has average income? In that case, the aggregate will immediately give
you a fairly close approximation of the rich person&#39;s income because
it&#39;s the only one in the 1,000,000 bracket. As you probably expect,
the fix here is to add random noise to the aggregate values before
you publish them. The precise details of how to do this are somewhat
complicated and depend on the structure of the data, how many different
slices you are going to publish, etc. Similar techniques have been
proposed to address the multiple querying problem described in
&lt;a href=&quot;https://educatedguesswork.org/ppm-prio/#repeated-queries&quot;&gt;part III&lt;/a&gt;, though the details are
presently somewhat fuzzy.&lt;/p&gt;
&lt;h2 id=&quot;what-about-differential-privacy%3F&quot;&gt;What about Differential Privacy? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#what-about-differential-privacy%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You may have heard the term &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Differential_privacy&amp;amp;oldid=1052600771&quot;&gt;differential privacy
(DP)&lt;/a&gt;.
Technically speaking, DP is a &lt;em&gt;definition&lt;/em&gt; for privacy. The idea here
is that you have some database which you let people query and you want
to limit the amount of information that the querier can learn about
individuals. The idea is sort of a generalization of randomized
response: Instead of providing an exact answer, you provide a
randomized answer structured so that the distribution of
responses is similar regardless of whether a given individual&#39;s
information is included in the database or not.&lt;/p&gt;
&lt;p&gt;Formally, this is &lt;a href=&quot;https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/dwork.pdf&quot;&gt;defined&lt;/a&gt; by Cynthia Dwork as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Definition 2. A randomized function $&#92;mathcal{K}$ gives $&#92;epsilon$-differential privacy if for all data sets $D_1$ and $D_2$ differing on at most one element, and all $S &#92;subseteq Range(&#92;mathcal{K})$,&lt;/p&gt;
&lt;p&gt;$Pr[&#92;mathcal{K}(D_1) &#92;in S] ≤ exp(&#92;epsilon) × Pr[&#92;mathcal{K}(D_2) &#92;in S]$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What this math translates to is that any query produces an answer
that is based on the database but also randomized over a given
distribution. The chance of any given answer is roughly the same
(to within a factor of $e^{&#92;epsilon}$) whether a given person&#39;s
information is in the database or not. This is the same
intuition as randomized response: the aggregate result
is broadly accurate, but any individual response doesn&#39;t
have much impact on the result.&lt;/p&gt;
&lt;p&gt;In order to implement DP in practice you choose a privacy value $&#92;epsilon$
and then tune the amount of randomness you add in order to provide that
value. Each query consumes some of the total privacy budget that $&#92;epsilon$ represents,
and once it is used up you refuse to allow any more queries (the simple
case here is when you publish the database once, in which case you
just model this as one query). Actually determining how much randomness
to add and how is non-trivial, but the original theory comes from
a &lt;a href=&quot;https://doi.org/10.29012%2Fjpc.v7i3.405&quot;&gt;paper&lt;/a&gt; by Dwork, McSherry, Nissim,
and Smith.&lt;/p&gt;
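&lt;p&gt;As a minimal sketch of what this tuning looks like, consider the textbook Laplace mechanism applied to a simple counting query (this is the standard construction, not necessarily what any particular deployed system uses). Adding or removing one person changes a count by at most 1, so the query has sensitivity $&#92;Delta = 1$, and publishing&lt;/p&gt;
&lt;p&gt;$&#92;text{count} + &#92;text{Laplace}(&#92;Delta / &#92;epsilon)$&lt;/p&gt;
&lt;p&gt;provides $&#92;epsilon$-differential privacy. The noise has standard deviation $&#92;sqrt{2} &#92;Delta / &#92;epsilon$, so $&#92;epsilon = 0.1$ means noise of about $&#92;pm 14$ on each published count and $&#92;epsilon = 1$ means about $&#92;pm 1.4$; a smaller $&#92;epsilon$ buys more privacy at the cost of accuracy.&lt;/p&gt;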
&lt;p&gt;The terminology is a bit confusing here because people often talk
about &amp;quot;implementing&amp;quot; or &amp;quot;using&amp;quot; differential privacy to mean that
they are adding randomness in order to provide $&#92;epsilon$-differential
privacy. Moreover, there are two kinds of differential privacy
depending on where the noise is added:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Local differential privacy (LDP)&lt;/em&gt;, where the noise is added at the
endpoints, so that even the data collector doesn&#39;t learn much
about the user&#39;s information. Randomized response techniques such
as we have been discussing throughout this post provide LDP.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Central differential privacy (CDP)&lt;/em&gt;, where the collector gets
accurate information but then adds randomness before disclosing
it to people, as discussed in the previous section.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One important real-world example of central differential privacy
is the US census, which is &lt;a href=&quot;https://www2.census.gov/library/publications/decennial/2020/2020-census-disclosure-avoidance-handbook.pdf&quot;&gt;adding randomness&lt;/a&gt; before publishing in order to provide differential privacy.
Unsurprisingly, this has resulted in complaints that the accuracy
of the data will be &lt;a href=&quot;https://apnews.com/article/business-census-2020-technology-e701e313e841674be6396321343b7e49&quot;&gt;degraded in important ways&lt;/a&gt;. I haven&#39;t learned enough about this to have an informed
opinion on whether DP will have an important impact on census
results, but it will obviously have &lt;em&gt;some&lt;/em&gt; impact
and in general, any use of DP has some sort of
tradeoff between accuracy and privacy.&lt;/p&gt;
&lt;p&gt;Importantly, it&#39;s not enough to say that a system is
differentially private: you need to specify
the $&#92;epsilon$ value, which embodies that privacy/accuracy
tradeoff by dictating how much randomness to add.
Unfortunately, especially for LDP systems,
it&#39;s hard to find a set of parameters which allow for
good data collection and also have good privacy properties. For instance, Apple
implemented an &lt;a href=&quot;https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf&quot;&gt;LDP system&lt;/a&gt;
for collecting user telemetry but concerns have been &lt;a href=&quot;http://theory.stanford.edu/~korolova/Privacy_Loss_in_Apple%27s_Implementation_of_Differential_Privacy.pdf&quot;&gt;raised&lt;/a&gt; about the level of actual leakage
in practice. The authors of RAPPOR report similar problems, where
their choice of $&#92;epsilon$ led to relatively low measurement
power, and it&#39;s not really clear that it&#39;s possible to build
a general purpose LDP system that has both good privacy
and acceptable accuracy.
CDP systems also have this problem to some extent but less
so because you only need to add enough &lt;em&gt;total&lt;/em&gt; randomness to
protect users, rather than enough randomness to each submission.
With that said, selecting the right $&#92;epsilon$ value is
still an &lt;a href=&quot;https://www.petsymposium.org/2019/files/papers/issue1/popets-2019-0011.pdf&quot;&gt;open problem&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The bottom line here is that while randomization is an important technique,
local randomization is pretty hard to use except for a fairly narrow
category of questions because the level of randomization required in
order to provide privacy has such a large negative impact on
accuracy. By contrast, central/global randomization techniques
seem much more promising as a way to safely query data which has
been gathered with exact techniques.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is a general statistical phenomenon: if you have some
response variable that is a function of some independent
variables plus some random effects, then the more measurements
you take the more the random effects will tend to wash out,
leaving the predictable effects. This is why it is helpful
to have a large data set. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As a concrete example, Mozilla is working on a technique for
certificate revocation for Firefox called &lt;a href=&quot;https://blog.mozilla.org/security/2020/01/21/crlite-part-3-speeding-up-secure-browsing/&quot;&gt;CRLite&lt;/a&gt; in which you use multiple &amp;quot;cascading&amp;quot; Bloom
filters with the second Bloom filter allowlisting certificates
which are spuriously blocked in the first filter, the third blocklisting
those spuriously allowed in the second, etc. Google Safe Browsing
is effectively a Bloom filter with only one hash function. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-randomness/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Modelling grade&#39;s impact on running pace</title>
		<link href="https://educatedguesswork.org/posts/grade-vs-pace/"/>
		<updated>2021-11-01T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/grade-vs-pace/</id>
		<content type="html">&lt;script type=&quot;text/javascript&quot; id=&quot;MathJax-script&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js&quot;&gt;
&lt;/script&gt;
&lt;p&gt;I’ve been doing some more thinking about my pacing at &lt;a href=&quot;https://educatedguesswork.org/posts/sob100k/&quot;&gt;Sean
O’Brien 100K&lt;/a&gt;. As I said, my general sense is that I’m comparatively
slower on the downhill than the uphill.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This is based on two main pieces of evidence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Having people pass me on the way down but catching them on the way
up.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Comparing &lt;a href=&quot;https://ultrapacer.com/&quot;&gt;Ultrapacer&lt;/a&gt;’s predictions to my
actual splits, I generally seem to get ahead on the climbs and fall
behind on the descents.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It’s one thing to have a general impression, though, and another to
actually have data. Hence, this post. I
want to note upfront that there’s some prior art here, and I’ll be
talking about it later in this post. However, I’m coming at this from a
slightly different angle, and I think it’s useful to see how we get to a
solution.&lt;/p&gt;
&lt;h2 id=&quot;modelling-activities&quot;&gt;Modelling Activities &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#modelling-activities&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let’s start by looking at a single activity. We can start with my data
from SOB 100K. Conveniently, my &lt;a href=&quot;https://www.garmin.com/en-US/p/641435&quot;&gt;Garmin Fenix
6X&lt;/a&gt; spits out a recording that
has readings every 1s, so we can use that data. For convenience, I
pulled the data down from &lt;a href=&quot;https://runalyze.com/dashboard&quot;&gt;Runalyze&lt;/a&gt;
which I use for tracking my workouts.&lt;/p&gt;
&lt;h3 id=&quot;data-extraction&quot;&gt;Data Extraction &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#data-extraction&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Garmin (and Runalyze) supports both conventional
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=GPS_Exchange_Format&amp;amp;oldid=1049279073&quot;&gt;GPX&lt;/a&gt;
and
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Training_Center_XML&amp;amp;oldid=965873825&quot;&gt;TCX&lt;/a&gt;
files, but we’ll be using the TCX. The GPX file just has points with
lat/long, but the TCX file also contains elevation and distance
traversed, like so:&lt;/p&gt;
&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Trackpoint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Time&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;2021-10-23T04:59:55+00:00&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;Time&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Position&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;LatitudeDegrees&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;34.09598&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;LatitudeDegrees&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;LongitudeDegrees&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;-118.71654&lt;span class=&quot;token tag&quot;&gt;&lt;span 
class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;LongitudeDegrees&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;Position&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;AltitudeMeters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;167&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;AltitudeMeters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Cadence&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;0&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;Cadence&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;DistanceMeters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;0.02&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;DistanceMeters&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Extensions&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;TPX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;Speed&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;0.01&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;Speed&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;Watts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;237&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;Watts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token 
tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;token namespace&quot;&gt;ns3:&lt;/span&gt;TPX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;Extensions&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;Trackpoint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could of course use GPX, but then I’d need to compute distance
traversed and there’s no particular reason to think I’d do better than
Garmin.&lt;/p&gt;
&lt;p&gt;The only things we need here are &lt;code&gt;AltitudeMeters&lt;/code&gt; and &lt;code&gt;DistanceMeters&lt;/code&gt;,
though one could imagine making some use of &lt;code&gt;Speed&lt;/code&gt; and &lt;code&gt;Watts&lt;/code&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; I’m only moving at about 2 m/s and even
with the barometric altimeter Garmin elevation isn’t that accurate, so
we don’t really want to use second by second readings. Instead, what I
did is break up the course into segments of approximately 100m
(technically, slightly over, because I accumulated data for a single
segment until the total distance was &amp;gt;=100m) and then saved the
segment. This is pretty easy to do in Python, and the output is a table
of segments like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Total   Lap     Distance        Up      Down
33      33      100.900000      0       -1
67      34      101.890000      2       -1
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A note on programming languages here: I’m using
&lt;a href=&quot;https://www.r-project.org/&quot;&gt;R&lt;/a&gt; for the statistics, but I’m more
comfortable parsing XML with Python, so I decided to use Python for the
bare minimum of pulling the raw data out of the TCX file, but R for
further manipulation. Not only is R better for this kind of thing, but
it also has the benefit of giving us a more reproducible analysis as
well as &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#source-code&quot;&gt;showing our work&lt;/a&gt; so you can see what I actually did. Plus, it’s
a good demo of the power of R and
&lt;a href=&quot;https://ggplot2.tidyverse.org/&quot;&gt;ggplot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We don’t really want distance and up/down but rather pace and grade.
That’s easy to compute given this raw data with a few lines of R:&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;load.data &lt;span class=&quot;token operator&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; name&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token comment&quot;&gt;# Read the data in&lt;/span&gt;&lt;br /&gt;   df &lt;span class=&quot;token operator&quot;&gt;&amp;lt;-&lt;/span&gt; fread&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;   &lt;span class=&quot;token comment&quot;&gt;# Compute values&lt;/span&gt;&lt;br /&gt;   df &lt;span class=&quot;token operator&quot;&gt;&amp;lt;-&lt;/span&gt; mutate&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;df&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Vert&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;Up&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;Down&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Grade&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Vert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;Distance&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Pace&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;Distance&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;Lap&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;         Course&lt;span class=&quot;token 
operator&quot;&gt;=&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Hour&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;ceiling&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Total&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3600&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token comment&quot;&gt;# Remove outliers&lt;/span&gt;&lt;br /&gt;   df &lt;span class=&quot;token operator&quot;&gt;&amp;lt;-&lt;/span&gt; df&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;Up&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;400&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;   df &lt;span class=&quot;token operator&quot;&gt;&amp;lt;-&lt;/span&gt; df&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;abs&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Grade&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;25&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we need the last clause because there are some outlier data
points which otherwise look terrible on our graph and cram the stuff
we’re interested in into a small portion of the surface area.&lt;/p&gt;
&lt;h3 id=&quot;first-look&quot;&gt;First Look &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#first-look&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let’s start by just doing a simple scatter plot of Pace to Grade.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-3-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I’ve added two extra pieces of decoration here. First, the blue line is
a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Local_regression&amp;amp;oldid=1047545568&quot;&gt;loess&lt;/a&gt;
smoother applied to the points. This is just ggplot’s default smoother
and gives us kind of an eyeball fit that helps us see the pattern that’s
obvious from the points anyway: generally, climbing is slower and
descending is faster, but once the descent gets really steep (steeper than about -10%)
then descending starts to get slower again. The reason for this is that
gravity wants to take you down faster than you (or at least I) can
(safely) run, so you’re actually trying to slow down. This is a common
pattern, though of course some people are better descenders than
others.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I’ve also colored the segments by how far into the race I was (which
hour). As you can see, I’m slowing down slightly as I get further into
the race, especially on the uphills. This is probably due to my decision
to run the climbs at the beginning and hike later. There’s no obvious
equivalent pattern at grades &amp;lt;0%, which suggests that I’m not slowing
down much when I choose to run, a sign of good, even, pacing. There are
a number of real outliers here with very slow pace. This is probably due
to three things: (1) time spent in aid stations, which I was too lazy to remove;
(2) times when I had to go through some really technical section; and (3)
GPS error.&lt;/p&gt;
&lt;p&gt;This isn’t a surprising pattern, and you can see the same thing in a
recent workout, though the pace is a little more even throughout the
workout. This is probably due to the absence of aid stations as well
as to running the whole thing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-4-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;modelling-the-data&quot;&gt;Modelling the Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#modelling-the-data&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It’s useful to know about this pattern, but what we’d really like is
some consistent formula that can be used to predict race paces. In
particular, what we want is to have a model that predicts paces at
different grades. Here’s my first attempt, fitting a quadratic equation
to the SOB data (the black line is the quadratic).&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Call:&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## lm(formula = Pace ~ poly(Grade, 2), data = df.sob)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Residuals:&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;##      Min       1Q   Median       3Q      Max &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## -2.34019 -0.20411  0.04757  0.24871  0.79749 &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Coefficients:&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;##                  Estimate Std. Error t value Pr(&gt;|t|)    &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## (Intercept)       2.35005    0.01171  200.71   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 2)1 -14.00792    0.36486  -38.39   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 2)2  -4.89296    0.36486  -13.41   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## ---&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Signif. 
codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Residual standard error: 0.3649 on 968 degrees of freedom&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## Multiple R-squared:  0.6308, Adjusted R-squared:   0.63 &lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token comment&quot;&gt;## F-statistic: 826.9 on 2 and 968 DF,  p-value: &amp;lt; 2.2e-16&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-7-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There’s no principled reason to fit a quadratic here; it’s not like I
have a good physical model for running performance by grade (as we’ll
see, nobody else seems to, either). A quadratic is just approximately
the right shape and has a small number of coefficients so we don’t need to
worry about overfitting. It’s not terrible but just eyeballing things,
it’s not doing a good job of capturing the rapid decline at grades
steeper than -10%. A third degree polynomial does a little better, as
well as doing a better job of matching the loess smoother’s maximum
pace. Here’s a graph with all three fits.&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Call:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## lm(formula = Pace ~ poly(Grade, 3), data = df.sob)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Residuals:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;##     Min      1Q  Median      3Q     Max &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## -2.4075 -0.1870  0.0285  0.2399  0.7899 &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Coefficients:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;##                  Estimate Std. Error t value Pr(&gt;|t|)    &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## (Intercept)       2.35005    0.01139 206.246  &amp;lt; 2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)1 -14.00792    0.35506 -39.452  &amp;lt; 2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)2  -4.89296    0.35506 -13.781  &amp;lt; 2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)3   2.63739    0.35506   7.428 2.42e-13 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## ---&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Signif. 
codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Residual standard error: 0.3551 on 967 degrees of freedom&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Multiple R-squared:  0.6507, Adjusted R-squared:  0.6496 &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## F-statistic: 600.5 on 3 and 967 DF,  p-value: &amp;lt; 2.2e-16&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-9-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Going to a fourth degree polynomial doesn’t improve the situation: you
get about the same R-squared and the fourth degree term isn’t
significant. So, this is about as well as we’re going to do with
polynomial fits.&lt;/p&gt;
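&lt;p&gt;Here’s a minimal sketch of that kind of comparison, run on synthetic data because the race data isn’t reproduced in this post. The data frame &lt;code&gt;df.fake&lt;/code&gt; and the coefficients used to generate it are made up for illustration; only &lt;code&gt;lm&lt;/code&gt;, &lt;code&gt;poly&lt;/code&gt;, and &lt;code&gt;anova&lt;/code&gt; are standard R.&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;set.seed(42)

# Synthetic stand-in for the race data (made-up coefficients, not the SOB fit)
df.fake &amp;lt;- data.frame(Grade = runif(500, -25, 25))
df.fake$Pace &amp;lt;- 2.4 - 0.03 * df.fake$Grade - 0.001 * df.fake$Grade^2 +
  0.00003 * df.fake$Grade^3 + rnorm(500, sd = 0.3)

# Fit polynomials of increasing degree to the same data
fit2 &amp;lt;- lm(Pace ~ poly(Grade, 2), data = df.fake)
fit3 &amp;lt;- lm(Pace ~ poly(Grade, 3), data = df.fake)
fit4 &amp;lt;- lm(Pace ~ poly(Grade, 4), data = df.fake)

# Adjusted R-squared for each degree
sapply(list(quadratic = fit2, cubic = fit3, quartic = fit4),
       function(m) summary(m)$adj.r.squared)

# Nested F-tests: does each extra term significantly reduce the residual error?
anova(fit2, fit3, fit4)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the adjusted R-squared barely moves and the F-test for the extra term isn’t significant, the higher degree isn’t buying anything.&lt;/p&gt;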
&lt;p&gt;My initial reaction here was to be sad because a third-degree polynomial
is clearly aphysical: we know that pace is slower at very steep uphill
and downhill grades, and any odd-degree polynomial has to point in
opposite directions at positive and negative infinity (you can see the
start of this in the flattening of the third-degree curve around +20%).
However, if you take a step back, &lt;em&gt;any&lt;/em&gt; polynomial fit is clearly
aphysical because grades with absolute values over 100% don’t make any
sense: they’re just steep in the other direction. Moreover, once you get
close to 100% in either direction, you’re not really talking about
running any more, but rock climbing, and the dominant factor starts to
be the quality of the surface, not the grade. As a practical matter
then, we’re looking at a function that’s only defined in a relatively
narrow domain of grades. Finally, what we’re trying to do is really just
summarize the data for the purpose of comparison and prediction, and for
that it doesn’t matter that much whether we have a good physical model,
so long as it does a good job of matching the data and has a small
number of coefficients to minimize the risk of overfitting.&lt;/p&gt;
&lt;h3 id=&quot;multiple-activitys&quot;&gt;Multiple Activities &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#multiple-activitys&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Modelling multiple activities is actually slightly complicated. The
basic problem here is that each course is different. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;More technical (rocky, rooty, …) courses are slower than smoother
courses, and trail is slower than road.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Longer courses are inherently slower, because you can’t sustain as fast a pace over a longer distance.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means that just attempting to jointly fit multiple workouts by
putting them all in the same fit won’t work properly. My current
approach to this is to not try to individually account for these
factors but just to have a per-course adjustment.  I.e., we fit the
equation:&lt;/p&gt;
&lt;p&gt;$$Pace = &#92;beta_1 * g^3 + &#92;beta_2 * g^2 + &#92;beta_3 * g + &#92;beta_4(Course) + &#92;beta_5$$&lt;/p&gt;
&lt;p&gt;People with a statistics background may be noticing that this is an
additive correction for the course rather than a multiplicative
correction. I’m honestly not sure which would be better, but this is
easier to set up so I’m using it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; Here’s the result with the two courses
we’ve seen already plus another long run of 20 miles or so. This gives
us about the result we’d expect: the two workouts, Priest Rock and
Rancho, are about the same length and so the curves nearly overlap, with
no significant difference in the coefficient for Rancho; the only real
difference in pacing is that Priest was somewhat hillier than Rancho. By
contrast, because SOB is a much longer event, it’s notably slower even
at the same grades. This result should give us some confidence that this
modelling strategy isn’t too terribly wrong.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Call:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## lm(formula = Pace ~ poly(Grade, 3) + Course, data = df.all)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Residuals:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;##      Min       1Q   Median       3Q      Max &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## -2.42649 -0.17331  0.03976  0.22435  1.27079 &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Coefficients:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;##                  Estimate Std. 
Error t value Pr(&gt;|t|)    &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## (Intercept)       2.59349    0.01980 130.992   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)1 -17.23793    0.34993 -49.262   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)2  -8.95348    0.35214 -25.426   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## poly(Grade, 3)3   3.80741    0.35009  10.875   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## CourseRancho     -0.01784    0.02773  -0.643     0.52    &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## CourseSOB        -0.24748    0.02277 -10.868   &amp;lt;2e-16 ***&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## ---&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Residual standard error: 0.3499 on 1611 degrees of freedom&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## Multiple R-squared:  0.676,  Adjusted R-squared:  0.675 &lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token comment&quot;&gt;## F-statistic: 672.2 on 5 and 1611 DF,  p-value: &amp;lt; 2.2e-16&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-11-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We actually don’t care about the coefficient for the courses, because
that will be different for each course. Instead, what we’re interested
in is the adjustment for grade; the purpose of the course coefficient is
just to wash out the differences between courses, leaving us with the
grade factor. We can get approximately there by rescaling the data
against the pace at level grade. I.e.,&lt;/p&gt;
&lt;p&gt;$$ PaceRatio(g) = Pace(g) / Pace(0) $$&lt;/p&gt;
&lt;p&gt;This gives us the correction factor we need to predict pace at any grade
for any course. Here’s the same graph with Pace Ratio on the y axis
rather than Pace. As you can see, this looks pretty good, with both the
data points and the fits nicely overlaid. You’ll also note that the fits
aren’t precisely identical. This is because the correction factor for
course is additive rather than multiplicative, and so when mapped onto a
ratio you don’t get identical ratios for each curve. However, it’s quite
close, and given the inherent uncertainty in this data, it’s probably
close enough.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-12-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
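&lt;p&gt;To spell out why the ratio curves don’t coincide exactly, write the fitted model as a shared grade curve plus a per-course offset. The symbols $f$ and $&#92;delta_c$ are shorthand introduced just for this note (with $f$ absorbing the intercept), not names from the fit output:&lt;/p&gt;
&lt;p&gt;$$Pace_c(g) = f(g) + &#92;delta_c$$&lt;/p&gt;
&lt;p&gt;$$PaceRatio_c(g) = &#92;frac{Pace_c(g)}{Pace_c(0)} = &#92;frac{f(g) + &#92;delta_c}{f(0) + &#92;delta_c}$$&lt;/p&gt;
&lt;p&gt;The offset $&#92;delta_c$ appears in both the numerator and the denominator and doesn’t cancel, so two courses with different offsets get slightly different ratio curves. With a multiplicative correction, $Pace_c(g) = &#92;delta_c * f(g)$, the offset would cancel and every course would have the same ratio, $f(g) / f(0)$.&lt;/p&gt;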
&lt;h2 id=&quot;other-work&quot;&gt;Other Work &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#other-work&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I noted at the beginning, there has been other work in this area,
though perhaps not as much as you’d think. Many endurance training sites
such as Strava, Garmin, and Runalyze have what’s called &lt;a href=&quot;https://support.strava.com/hc/en-us/articles/216917067-Grade-Adjusted-Pace-GAP-&quot;&gt;Grade Adjusted
Pace&lt;/a&gt;,
which attempts to map actual pace onto the notional pace that the same
effort would have produced on level ground. I don’t know what Garmin’s
algorithm is, but the Runalyze algorithm and the original Strava algorithm seem to
trace back to a
&lt;a href=&quot;https://journals.physiology.org/doi/full/10.1152/japplphysiol.01177.2001&quot;&gt;paper&lt;/a&gt;
by Minetti et al. called “Energy cost of walking and running at extreme
uphill and downhill slopes”.&lt;/p&gt;
&lt;p&gt;Minetti et al. gathered their data by putting subjects on a treadmill at
various grades and measuring oxygen consumption to estimate energy
consumption.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; In 2017, Strava updated their algorithm based on their
extensive data of user workouts using heart rate instead of measuring oxygen
consumption as a measure of effort (see this
&lt;a href=&quot;https://medium.com/strava-engineering/an-improved-gap-model-8b07ae8886c3&quot;&gt;post&lt;/a&gt;
by Drew Robb). The main reason they give is that a constant-effort
approach overestimates downhill speed, most likely because people
are unwilling or unable to run downhill at the paces that would give them
constant effort. Here’s their figure comparing their Minetti-based
algorithm with the new HR-based algorithm:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/2000/1*_TwofsNS872wbUS12ykKPQ.png&quot; alt=&quot;StravaGAP&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Like our model, Strava’s predicts maximum pace at about -10% grade, as
opposed to the Minetti model which is at about -20%. This is consistent
with Minetti’s general overestimation of pace at steeper descents.
It&#39;s actually not quite clear to me why a constant heart rate model
works better here, as HR is a common proxy for effort. My best
guess is that people&#39;s HR goes up due to the need to navigate
steep downhills even if their energy consumption isn&#39;t as high.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ultrapacer.com/&quot;&gt;Ultrapacer&lt;/a&gt; is a race pacing tool which
attempts to project segment times given a desired finish time. It
accounts for terrain using a &lt;a href=&quot;https://github.com/amokrunner/ultrapacer/blob/master/core/normFactor.js#L12&quot;&gt;quadratic
model&lt;/a&gt;
between -22% and 16% grades (and linear outside them). I don&#39;t know
the source of this model.&lt;/p&gt;
&lt;p&gt;$$Factor = .0021*g^2 + .034g + 1$$&lt;/p&gt;
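&lt;p&gt;As a quick worked example of that formula, here’s a sketch in R. It assumes $g$ is the grade in percent (so 10 means a 10% climb), which is my reading of the formula rather than something stated above. The function name &lt;code&gt;grade_factor&lt;/code&gt; is made up, and the sketch only covers the quadratic range; the linear extension outside -22% to 16% isn’t reproduced because its coefficients aren’t given here.&lt;/p&gt;
&lt;pre class=&quot;language-r&quot;&gt;&lt;code class=&quot;language-r&quot;&gt;# Quadratic grade factor as quoted above; g is assumed to be grade in percent.
grade_factor &amp;lt;- function(g) {
  # Only the quadratic range from the text is handled in this sketch
  stopifnot(all(g &amp;gt;= -22 &amp;amp; g &amp;lt;= 16))
  0.0021 * g^2 + 0.034 * g + 1
}

grade_factor(c(-10, 0, 10))   # 0.87 1.00 1.55

# Grade at which the factor is smallest (vertex of the parabola)
-0.034 / (2 * 0.0021)         # about -8.1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So under this formula the factor is 1 on level ground, bottoms out at a grade of roughly -8%, and climbs again on steeper descents.&lt;/p&gt;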
&lt;p&gt;Below I’ve plotted all of these models against each other. (I had to
hand-transcribe the Strava and Minetti values off Robb’s diagram with a
ruler so they’re a bit approximate, but the smoother helps clean that up
a bit.) Because GAP is using a correction factor to map from actual pace
to level pace rather than the other way around, I have to take the
reciprocal of PaceRatio to line my data up.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/grade-vs-pace_files/figure-markdown_strict/unnamed-chunk-13-1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Except for Minetti, which we all agree is wrong, these don’t line up
too badly. With that said, my data is noticeably slower on the downhills
and noticeably faster on the uphills than any of the other models (i.e.,
it’s just generally flatter). This is consistent with my observation
at the beginning of this post that Ultrapacer seemed to overestimate how
fast I would be on the descents and underestimate how fast I would be on
the climbs.&lt;/p&gt;
&lt;h2 id=&quot;source-code&quot;&gt;Source Code &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#source-code&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Although I&#39;ve used &lt;a href=&quot;https://rmarkdown.rstudio.com/&quot;&gt;Rmarkdown&lt;/a&gt; to
generate this post (minus some pre-post editing of the text&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;),
I&#39;ve set it to omit most of the R source code to avoid cluttering
everything up. You can find a copy of the code &lt;a href=&quot;https://github.com/ekr/runfit&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; Obviously, if you’re going
to be comparatively faster on one section, you need to be comparatively
slower on another section in order to match the same overall pace. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; This
is just Garmin’s estimate of how much power I would need to run at this
speed; you can get &lt;a href=&quot;https://www.stryd.com/en/&quot;&gt;running power meters&lt;/a&gt; but
I don’t have one, and it’s kind of
&lt;a href=&quot;https://www.dcrainmaker.com/2019/06/testing-in-the-wind-tunnel-with-stryds-new-running-power-meter.html&quot;&gt;unclear&lt;/a&gt;
how accurate they are anyway. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As an aside, bicycles can descend much faster than runners.
It’s not uncommon for me to pass mountain bikes going up some climb only
to have them tear past me on the descent. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The eventual correction is about .25 m/s between a 20 mile
workout and a 100K, as compared to an overall pace of about
2.5 m/s, so it&#39;s not clear that additive versus multiplicative
will matter that much. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that the form of the fit requires all the curves to be
the same shape, just vertically displaced, so that&#39;s not something
to get too excited about. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt; They fit this data to a 5th order polynomial, which
seems like a recipe for overfitting, but we can just look at the
empirical data. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I can&#39;t plug the Rmd file or its output right into &lt;a href=&quot;https://www.11ty.dev/&quot;&gt;Eleventy&lt;/a&gt;
and I&#39;m too lazy to backport the text changes. &lt;a href=&quot;https://educatedguesswork.org/posts/grade-vs-pace/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>The EU vaccine passport compromise and how to (maybe) fix it</title>
		<link href="https://educatedguesswork.org/posts/eu-vaccine-passport-leak/"/>
		<updated>2021-10-29T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/eu-vaccine-passport-leak/</id>
		<content type="html">&lt;p&gt;Bleeping Computer
&lt;a href=&quot;https://www.bleepingcomputer.com/news/security/eu-investigating-leak-of-private-key-used-to-forge-covid-passes/&quot;&gt;reports&lt;/a&gt;
that there has been some compromise of the EU COVID-19 vaccination
certificate system. As I &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu&quot;&gt;wrote&lt;/a&gt;, the EU
system depends on digital signatures, with each jurisdiction having
its own set of private keys.&lt;/p&gt;
&lt;h2 id=&quot;what-happened%3F&quot;&gt;What Happened? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#what-happened%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s currently a bit unclear what has happened here, but the situation appears
to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;There are multiple bogus-looking certificates floating around
for names such as Adolf Hitler, Spongebob Squarepants, and the
always popular Joe Mama.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;These certificates are signed with several different private keys
(mostly Macedonia, but also France and Poland).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;These certificates also have country indications (i.e., they
claim to be from countries) that are different than the
jurisdiction associated with the signing key.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are seeing online offers to generate bogus passports
for people for 300 euros.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;[The above is largely based on this &lt;a href=&quot;https://github.com/ehn-dcc-development/hcert-spec/issues/103#issuecomment-953382640&quot;&gt;analysis&lt;/a&gt;
by &lt;a href=&quot;https://github.com/denysvitali&quot;&gt;denysvitali&lt;/a&gt;.]&lt;/p&gt;
&lt;p&gt;It&#39;s clear from this that something is wrong, but the question is
what? The first major possibility is that one or more private keys have
actually leaked. This is consistent with what we&#39;re seeing,
but so far nobody has published one. The most I&#39;ve seen is
this screenshot (from Bleeping Computer) that purports to show a partial key.
However, I do not believe that a partial key like this is sufficient to
verify that the key is valid.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.bleepstatic.com/images/news/u/1164866/2021/Oct-2021/eu-covid-pass-private-key-leak/forum-covid-eu-passs.jpg&quot; alt=&quot;Private Key Screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The other possibility is that someone has compromised one of
the systems used to issue certificates rather than the key
itself. This would also allow them to issue new certificates with
fake information but would be far more recoverable, for reasons
I&#39;ll discuss below. At present, we don&#39;t have enough information
to distinguish these cases, though as denysvitali points
out, if we saw a validly signed credential that was clearly
semantically invalid (e.g., had a date far in the past)
then that would be suggestive of key compromise because
the signing systems probably have some mechanisms to
prevent (mostly accidental) signing of such credentials.&lt;/p&gt;
&lt;h2 id=&quot;recovering-from-compromise&quot;&gt;Recovering from Compromise &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#recovering-from-compromise&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Whatever the cause, it seems likely that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The attacker(s) have the ongoing capability to generate
new bogus certificates. This capability needs to be
disabled.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are existing certificates for non-obviously bogus
names. These certificates should be invalidated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In general, this kind of PKI system isn&#39;t designed to smoothly
recover from this kind of system compromise. People typically
just assume that you&#39;ll revoke the signing key and reissue
certificates. This will obviously work
but is a large burden on existing users; for people
who are using one of the official apps, the EU can just
issue an update that makes it automatically retrieve
a new certificate, but this won&#39;t work for people who
have printed the certificate or stored it in something
like Apple Wallet. Even for app users, you have to worry
about people who are offline for a while or about bugs
which cause automatic issuance to fail. For that reason,
it&#39;s worth asking if we can do better, though exactly
what we can do depends a lot on the system design and the nature of the compromise.&lt;/p&gt;
&lt;h3 id=&quot;system-compromise&quot;&gt;System Compromise &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#system-compromise&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If the signing system has been compromised but the key has not
been, then it&#39;s possible to remove the attacker&#39;s ability to
generate new certificates by fixing the compromise. In the
short term, whatever system actually does the issuance can
be taken completely offline.
This will prevent issuance
of both bogus and valid certificates, and so it&#39;s also
necessary to close whatever avenues were used to compromise
the system (and whatever new avenues the attackers have
created). It may simply be easier to deploy a new
uncompromised system.&lt;/p&gt;
&lt;p&gt;This leaves us with the problem of invalidating the bogus
certificates that already exist. As noted above,
one traditional approach
here would be to just revoke the signing key and force
everyone to get new certificates, but that&#39;s not ideal.&lt;/p&gt;
&lt;p&gt;If you can identify the invalid issued certificates, then it is better
to somehow individually revoke them, thus avoiding disturbing valid
users. Whether this is possible depends on what kinds of records you
have kept. In an ideal world, a system like this would keep a copy of
every certificate it issued (possibly publishing them to something
like a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1049898445&quot;&gt;Certificate
Transparency&lt;/a&gt;
log). You should be able to combine this with the records you used
for the original issuance (you have those, right?) to identify which
certificates were actually valid. This is the best case scenario
and then you just publish a list of the invalid certs (or their
hashes) to the app, which can reject them.&lt;/p&gt;
&lt;p&gt;One complication here is that the EU system does not appear
to contain &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu#revocation&quot;&gt;a revocation mechanism&lt;/a&gt;
for individual credentials, so you&#39;ll probably need an app
update to ship that. As long as everyone uses the EU&#39;s
verification app, this is probably not a big deal, but if
they don&#39;t, then things get complicated fast.&lt;/p&gt;
&lt;p&gt;It&#39;s also possible you only have partial records (e.g., just
a list of the valid certificates). Your options here depend
on the information you have and the structure of the certificates.
For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you have a list of valid certificates and all certificates
have sequential sequence numbers then you can discover
the invalid ones by elimination and revoke those.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you just have a list of valid certificates, you can
have apps check against that list (see below for more
details on this).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you just know when the period of compromise occurred,
you can have the app reject certificates in that date
range; this will inconvenience some valid users but
not most.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that the latter two options require changing the
verification app, but as noted above, in this case it looks
like any revocation would require changing the app.&lt;/p&gt;
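&lt;p&gt;The three partial-record options above might look something like the following. This is only a sketch: the function names, the use of integer serial numbers, and the use of an issuance date field are all assumptions for illustration, not details of the actual certificate format.&lt;/p&gt;

```python
from datetime import date


def bogus_by_elimination(valid_serials, first_serial, last_serial):
    """Option 1: serials are sequential, so any serial in the issued
    range that is not on the valid list must be bogus."""
    valid = set(valid_serials)
    return [s for s in range(first_serial, last_serial + 1) if s not in valid]


def accept_by_allowlist(cert_id, valid_ids):
    """Option 2: the app accepts only certificates on the valid list."""
    return cert_id in valid_ids


def accept_by_date(issued_on, window_start, window_end):
    """Option 3: reject anything issued during the compromise window.
    This also rejects valid certificates issued in that window."""
    return issued_on > window_end or window_start > issued_on


print(bogus_by_elimination([1, 2, 4, 5], 1, 6))
print(accept_by_date(date(2021, 10, 20), date(2021, 10, 15), date(2021, 10, 25)))
```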
&lt;p&gt;Depending on how many bogus certificates were issued, this may
all be more trouble than it&#39;s worth. A system like this can
survive a modest amount of fraud -- especially because
vaccination doesn&#39;t confer perfect immunity anyway --  so if it&#39;s just a few
certificates, it may be easier to just ignore the
problem, especially if the alternative is inconveniencing a lot of legitimate
users. On the other hand, if compromised certificates are
widespread you probably need to do something.&lt;/p&gt;
&lt;p&gt;Note that in this case, you may actually not even need to revoke and
reissue the signing key, as long as you&#39;re sure it wasn&#39;t
compromised. On the other hand, it might be logistically
easier if, for instance, you need to set up a parallel system.&lt;/p&gt;
&lt;h3 id=&quot;key-compromise&quot;&gt;Key Compromise &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#key-compromise&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The situation with key compromise is quite a bit worse because the
attacker can make as many certificates as they want with any contents
they want. This makes it impossible to revoke all the invalid
certificates. The only real option here to contain the compromise is
to revoke and reissue the signing key.&lt;/p&gt;
&lt;p&gt;Unfortunately, this invalidates all existing certificates.  Naively,
you would need to reissue all of them with the new signing key, but if
you kept copies of all the valid certificates, then it might be
possible to do better. One obvious approach would be to stand up a service (a la
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Online_Certificate_Status_Protocol&amp;amp;oldid=1045640694&quot;&gt;OCSP&lt;/a&gt;) which tells whether a given certificate
is valid or not. The obvious problem here is that this
creates a tracking vector: the server now knows where
each user is because of which apps ask about them.&lt;/p&gt;
&lt;p&gt;An alternative approach is just to publish hashes of all of the valid certificates.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This is practical if the list is small, but Poland has a population
of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Poland&amp;amp;oldid=1052326339&quot;&gt;nearly 40 million&lt;/a&gt;. If we use 16 byte hashes&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;,
then this is hundreds of megabytes which have to be sent to the application.
Naively you might think you could trim this down to just certificates
issued inside the window of compromise, but remember that this
attacker can issue certificates with any date. The same problem applies
to listing valid serial numbers. Thus, given the size of this
database, this probably isn&#39;t workable either.&lt;/p&gt;
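&lt;p&gt;To make the size estimate concrete, here is the back-of-the-envelope arithmetic, assuming (as an upper bound) one certificate for each of roughly 40 million people. The real number of certificates could be lower or higher depending on vaccination rates and how many certificates each person holds.&lt;/p&gt;

```python
def allowlist_size_mb(num_certs, hash_bytes):
    """Size in megabytes of a flat list of truncated certificate hashes."""
    return num_certs * hash_bytes / 1_000_000


NUM_CERTS = 40_000_000  # assumption: about one certificate per resident

print(allowlist_size_mb(NUM_CERTS, 16))  # 16-byte (128-bit) hashes
print(allowlist_size_mb(NUM_CERTS, 10))  # 10-byte (80-bit) hashes
```

&lt;p&gt;Under these assumptions, 16-byte hashes come to 640 MB and even 80-bit hashes still come to 400 MB.&lt;/p&gt;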
&lt;p&gt;While there&#39;s no perfect solution, there are a number of ways
to improve these basic designs. One is to use the &amp;quot;hash prefix&amp;quot;
approach used by &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Safe Browsing&lt;/a&gt;.
When the app sees a certificate &lt;em&gt;C&lt;/em&gt; it computes the hash &lt;em&gt;H(C)&lt;/em&gt;
and sends the first 10 or so bits of &lt;em&gt;H(C)&lt;/em&gt; to the server:
the server then sends all hashes with that hash prefix and
the app can then check the hash against that list. This
is a privacy/bandwidth tradeoff: it improves privacy because the server only knows that one of a
thousand or so people presented their credential, though
it&#39;s still possible to make some inferences about behavior.
It improves bandwidth because the app only needs
to download a fraction of the database for each user
(and only once). Of course, if the app has to verify
a lot of users it will quickly end up downloading the
whole database anyway.&lt;/p&gt;
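&lt;p&gt;A minimal sketch of the hash prefix idea, under some assumptions: SHA-256 truncated to 16 bytes stands in for &lt;em&gt;H&lt;/em&gt;, the prefix is 10 bits, and the server is just an in-memory class with hypothetical names. This is not how Safe Browsing itself is implemented, only an illustration of the lookup pattern.&lt;/p&gt;

```python
import hashlib

PREFIX_BITS = 10


def cert_hash(cert_bytes):
    """H(C): SHA-256 truncated to 16 bytes (an assumption)."""
    return hashlib.sha256(cert_bytes).digest()[:16]


def prefix_of(h, bits=PREFIX_BITS):
    """The first `bits` bits of the hash, as an integer."""
    return int.from_bytes(h, "big") >> (len(h) * 8 - bits)


class PrefixServer:
    """Holds the valid-certificate hashes, bucketed by prefix."""

    def __init__(self, valid_certs):
        self.buckets = {}
        for cert in valid_certs:
            h = cert_hash(cert)
            self.buckets.setdefault(prefix_of(h), set()).add(h)

    def lookup(self, prefix):
        # The server sees only the prefix, never the full hash.
        return self.buckets.get(prefix, set())


def verify(cert_bytes, server):
    """App side: send the prefix, then check the full hash locally."""
    h = cert_hash(cert_bytes)
    return h in server.lookup(prefix_of(h))


server = PrefixServer([b"cert-alice", b"cert-bob"])
print(verify(b"cert-alice", server), verify(b"cert-forged", server))
```

&lt;p&gt;A real app would presumably cache each downloaded bucket so that repeat checks for the same prefix do not hit the server again.&lt;/p&gt;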
&lt;p&gt;Another potential design is to just proxy the requests:
this would tell the server every time a user presented
their credential so it would know how often you
were validated, but in theory not where. This is really
placing a lot of trust in the proxy though, and you
would also need to be very sure that the app itself
wasn&#39;t leaking its identity on repeated queries (see
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/&quot;&gt;here&lt;/a&gt; for more on using
proxies safely).&lt;/p&gt;
&lt;p&gt;An alternative design is to use a real
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Private_information_retrieval&amp;amp;oldid=1051795014&quot;&gt;Private Information Retrieval (PIR)&lt;/a&gt; system.
These allow people to retrieve information from servers without
the server learning what information is being retrieved.
&lt;a href=&quot;https://eprint.iacr.org/2021/345&quot;&gt;Checklist&lt;/a&gt; by Kogan and
Corrigan-Gibbs is a PIR system designed for Safe Browsing
type applications which might be possible to repurpose for
this kind of application.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s arguable that I&#39;m overthinking this. After all, we could
just reissue everyone&#39;s certificates. But that&#39;s obviously
very disruptive and so it&#39;s worth thinking about how we could
do better. Also, it&#39;s not that uncommon to run into situations
where something goes really wrong and the recovery mechanisms
built into your system aren&#39;t really adequate and you have
to &lt;a href=&quot;https://hacks.mozilla.org/2019/05/technical-details-on-the-recent-firefox-add-on-outage/&quot;&gt;get clever&lt;/a&gt; to fix things, so it&#39;s worth getting some
practice in that. Moreover, as you can see from the above, there&#39;s a bunch of
overlap with other problems, so a solution to one might give
you some useful traction on others. Even better, of course,
would be to have a system which didn&#39;t get compromised in this
way.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Whether it&#39;s possible to verify/reconstruct
a private key from partial information depends on the algorithm
and how much/which information you have. This key is in PKCS#8 format and based on the leading
bytes appears to be RSA. While it &lt;em&gt;is&lt;/em&gt; possible to
&lt;a href=&quot;https://hovav.net/ucsd/papers/hs09.html&quot;&gt;reconstruct RSA keys from partial information&lt;/a&gt;, RSA keys in PKCS#8 format are represented using
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8017#appendix-A.1.2&quot;&gt;RSAPrivateKey&lt;/a&gt;,
which has the public key (specifically, the modulus),
first, and the modulus should take up more than the
2+ lines shown here. I&#39;m sure a real cryptographer
will correct me if I have something wrong here. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This assumes that the signing system
has not been compromised to the extent that one can
generate invalid dates. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that even though the names, etc. in the certificates
can be dictionary searched, because the certificate
signatures are high entropy, a dictionary search of the
hashes is not practical. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We might be able to get away with smaller hashes, but because
the attacker has the list, the hash has to be preimage
resistant, so it probably needs to be at least 80 bits. &lt;a href=&quot;https://educatedguesswork.org/posts/eu-vaccine-passport-leak/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Sean O&#39;Brien 100K Race Report</title>
		<link href="https://educatedguesswork.org/posts/sob100k/"/>
		<updated>2021-10-25T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/sob100k/</id>
		<content type="html">&lt;p&gt;Last weekend I ran the &lt;a href=&quot;https://www.khraces.com/series/sean-o-brien-50-50&quot;&gt;Sean O&#39;Brien (SOB) 100K&lt;/a&gt;
in Southern California. This was a somewhat last minute backup race after Pine
to Palm 100 miles was cancelled. There weren&#39;t too many 50M/100Ks in
October&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/sob100k/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and my coach &lt;a href=&quot;https://sundogrunning.com/coaches-ian-torrence-emily-harrison-eric-senseman-ron-hammett-will-baldwin/&quot;&gt;Emily Torrence&lt;/a&gt; won SOB back in 2017, so I was able
to take advantage of her expert knowledge.&lt;/p&gt;
&lt;p&gt;Overall this went well. I came in at 12:53, beating my 100K PR
from &lt;a href=&quot;https://insidetrail.com/calendar/ordnance-100k/&quot;&gt;Ordnance 100K 2017&lt;/a&gt;
-- a much easier race -- by almost 9 minutes
and my &lt;a href=&quot;https://www.tahoe200.com/tahoe-100k/&quot;&gt;Tahoe 100K 2018&lt;/a&gt; -- probably a more comparable event -- time by over
2.5 hrs. Generally, I stuck to my pre-race pace and fueling plan and
hit my pre-race target of 12-13 hours. The basic plan was to run the
first half at &amp;quot;long run&amp;quot; effort and then try to hold on the second
half. I didn&#39;t quite succeed in this,
but was reasonably close.&lt;/p&gt;
&lt;p&gt;To orient yourself, here is the course and the hill profile. The
circles on the course are mile markers, so you start at the far
right, go all the way to the left, around the loop counter-clockwise,
then backtrack. There&#39;s that out-and-back down to Bulldog
and then you backtrack to the finish. The circles on the profile
are &amp;quot;climb score&amp;quot;, Runalyze&#39;s estimate of how hard the climb was.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/sob-map.png&quot; alt=&quot;Map&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/sob100k-profile.png&quot; alt=&quot;Profile&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Screenshots from &lt;a href=&quot;https://runalyze.com/&quot;&gt;Runalyze&lt;/a&gt;]&lt;/p&gt;
&lt;h2 id=&quot;start-to-corral-canyon-%5B7.3-mi%2C-%2B2270%2F-846-ft%5D&quot;&gt;Start to Corral Canyon [7.3 mi, +2270/-846 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#start-to-corral-canyon-%5B7.3-mi%2C-%2B2270%2F-846-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Race start was at 5 AM and sunrise is around 7:00, so we spent the
first two hours or so in the dark. It was a bit warmer than expected,
at mid-50s and rainy.&lt;/p&gt;
&lt;p&gt;Pre-race I went back and forth on what to use for my headlamp: Most
days I use a Petzl &lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/ACTIK-CORE&quot;&gt;Actik
Core&lt;/a&gt;
[450 lm, 75g], which is good enough most of the time, but I also have
a &lt;a href=&quot;https://www.lupinenorthamerica.com/Piko_X4_1900lm_LED_Headlamp.asp&quot;&gt;Lupine
Piko&lt;/a&gt;
[up to 1500lm, ~150g] which is substantially brighter and hence better
if the footing is dodgy. A second consideration is that I wasn&#39;t sure
whether I would need a headlamp at the end. Sunset is around 7:00 PM,
so if I was on target, I wouldn&#39;t need one at all, but if I was way
behind, then I might need it.  My last drop bag was Bulldog at mile
50ish, and I didn&#39;t want to have to lug a heavy headlamp up a big
hill, so I ended up using the Lupine from the start and then leaving
the Actik Core in my drop bag.&lt;/p&gt;
&lt;p&gt;This leg is a big climb, which went quite well. I ran nearly all of
this, walking stuff that was technical or extremely steep.  I
deliberately let the lead pack go so I wasn&#39;t tempted to run with
them, but felt like I was hitting my effort targets.  At this point I
mostly settled into the place I was going to be for the rest of the
race, getting passed by maybe 3 people the rest of the day and passing
one or two.&lt;/p&gt;
&lt;p&gt;I hit the first aid station well ahead of schedule&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/sob100k/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
at 1:23 (11:25/mi)
and feeling strong. I mostly just ran through this, grabbing some
fluid and moving on (0:24).&lt;/p&gt;
&lt;h2 id=&quot;kanan-road-%5B6.3-mi%2C-%2B1010%2F-1444-ft%5D&quot;&gt;Kanan Road [6.3 mi, +1010/-1444 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#kanan-road-%5B6.3-mi%2C-%2B1010%2F-1444-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section is mostly rolling single track and fire road. It was
still dark at this point and I was definitely glad that I had brought
the brighter lamp because footing was a bit dodgy in the low
light. Other than that, this section was pretty straightforward and
still quite runnable. I was still feeling good when I got to the aid
station. I hit the bathroom, filled my bottles, and headed out quickly
(3:57).&lt;/p&gt;
&lt;h2 id=&quot;zuma-edison-ridge-1-%5B5.4-mi%2C-%2B1260%2F-997-ft%5D&quot;&gt;Zuma Edison Ridge 1 [5.4 mi, +1260/-997 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#zuma-edison-ridge-1-%5B5.4-mi%2C-%2B1260%2F-997-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This next section is a rolling descent on single track followed by a
moderate climb on fire road to the top of the ridge line.  The fire
road was pretty smooth and I was still keeping good pace here and ran
pretty much this whole section. Unfortunately, it&#39;s about here that I
started having pain in the inside of my left knee, especially on the
climbs. This felt kind of familiar from a previous injury which turned
out to be bursitis. Previously, it was bad enough that I couldn&#39;t run,
so I was naturally kind of concerned about this, but it wasn&#39;t yet bad
enough that I couldn&#39;t run.&lt;/p&gt;
&lt;p&gt;There&#39;s a moderate descent followed by a short climb into the aid
station, followed by a 3ish mile descent down into a lollipop with
Bonsall at the bottom, so I just quickly grabbed some more fluid and
headed out.&lt;/p&gt;
&lt;h2 id=&quot;bonsall-%5B3.4-mi%2C-%2B0%2F-1706-ft%5D&quot;&gt;Bonsall [3.4 mi, +0/-1706 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#bonsall-%5B3.4-mi%2C-%2B0%2F-1706-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As noted above, this next section is a 3.4 mile descent down to the
Bonsall aid station. Pretty much this whole thing is on fire road so I
was able to take it pretty fast (~8:35/mi). That part was good, but
there were two problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Every time it flattened out my knee started to hurt again.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I started to feel pretty unstable in the shoes I was wearing
(Salomon Pulsars). These are ultralight race shoes but they&#39;re
designed more for speed and forefoot striking and the narrow heel
is a bit unstable, at least for me.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I hit the bottom pretty worried about the knee and worried I might
have to drop out. Fortunately, I was able to borrow a vibrating foam roller and
work on the inside of the knee enough to be able to get moving again
(4:26).&lt;/p&gt;
&lt;h2 id=&quot;zuma-edison-ridge-2-%5B7.76%2C-%2B2910%2C-1184-ft%5D&quot;&gt;Zuma Edison Ridge 2 [7.76 mi, +2910/-1184 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#zuma-edison-ridge-2-%5B7.76%2C-%2B2910%2C-1184-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s a long climb out of Bonsall back to Zuma Edison Ridge. This is
actually two climbs, ~1600 ft, followed by a descent of around ~1000
ft and then another climb of ~1300 ft. It&#39;s not well shaded and
generally quite sandy and rocky and I was regretting being in
the Pulsars, which have a pretty shallow tread and felt twitchy
on the rock. I ran a fair bit of
this but also power-hiked a lot. At this point I passed
someone who had torn by me on a downhill at the beginning. Amazingly this was
his first ultra and only his second race (the first was Pike&#39;s Peak
Marathon), so I was fairly impressed that he was moving so well.&lt;/p&gt;
&lt;p&gt;This whole section took almost two hours, but eventually I slogged my
way to the aid station. This is the halfway point with more than half
the climbing done, and I was at 6 hrs, so was feeling reasonably good
about things and my knee wasn&#39;t much worse than before. At this point
I figured it was time to start with Coke so I got some caffeine
onboard as well as some electrolyte tablets. I drank some Coke at
every aid station from here on in. This aid station was a bit long, in
part because I had to fish some lube out of my bag to help a guy named
Teague (sp?) who was getting some chafing (5:27).&lt;/p&gt;
&lt;h2 id=&quot;kanan-road-%5B5.4-mi%2C-%2B1037%2F-1283-ft%5D&quot;&gt;Kanan Road [5.4 mi, +1037/-1283 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#kanan-road-%5B5.4-mi%2C-%2B1037%2F-1283-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point we&#39;re just backtracking down the backbone trail to a
previous aid station. This means a ~600ft climb followed by a steep
descent and some rolling terrain. At this point I noticed that Teague
was more or less keeping pace with me even though I was running about
half the climb and he was just hiking the whole thing. This made me
think that I had slowed down enough (or the terrain was harder) so
maybe I should be hiking more and I adjusted my strategy to hike
more of the uphills.&lt;/p&gt;
&lt;p&gt;Teague and another runner passed me on the downhill and I lost contact
with them. At this point things were starting to heat up and I was
definitely noticing some fatigue, which, combined with the shoes, was
making me especially tentative on the descents. I had
left a pair of Salomon S/LAB Ultra 3s (their long distance racing
shoe) in the Kanan drop bag specifically against this eventuality, so
I was able to change them out. Was still able to get out pretty fast
(3:03).&lt;/p&gt;
&lt;h2 id=&quot;corral-canyon-%5B6.4-mi%2C-%2B1453%2F-974-ft%5D&quot;&gt;Corral Canyon [6.4 mi, +1453/-974 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#corral-canyon-%5B6.4-mi%2C-%2B1453%2F-974-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We&#39;re still retracing our steps back to the first aid station, so this
is mostly on single track and generally uphill. It was still pretty
warm at this point and I was doing a lot of hiking.  The new shoes
were a lot more stable which was a definite improvement, and my knee
started to feel quite a bit better.&lt;/p&gt;
&lt;p&gt;I made my way to the aid station, which was a slog, but I also knew
that I had a long downhill ahead of me, so I mostly just needed to
make it that far. I grabbed some more electrolytes, etc. and headed
out (2:13).&lt;/p&gt;
&lt;h2 id=&quot;bulldog-%5B5.9-mi%2C-%2B486%2F-1946-ft%5D&quot;&gt;Bulldog [5.9 mi, +486/-1946 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#bulldog-%5B5.9-mi%2C-%2B486%2F-1946-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point I was really regretting not paying more attention to the
course: I had remembered there being a long out-and-back with a
descent to the Bulldog aid station, and then just turning around and
coming back up. It&#39;s actually more complicated: first you go up a mile
and ~400 ft, then down about 3 miles, and 2 miles flat, which is just
psychologically harder than a long descent. The only good
part here is that the aid station workers had told me it was 6.5 to
Bulldog but actually it was less than 6, as I was informed by a runner
coming up.&lt;/p&gt;
&lt;p&gt;I was pretty tentative on the descent. It&#39;s quite steep (about the
same as Kennedy Road in Sierra Azul) and a bit rocky and after
breaking my rib a few weeks ago I really didn&#39;t want to fall again.  I
did in fact catch my toe a few times, but fortunately stayed upright,
so at least that part was working. I definitely could have gone faster
on the downhill if I hadn&#39;t been trying to be careful.&lt;/p&gt;
&lt;p&gt;A nice part about an out and back is that you get to see everyone
coming the opposite way. I counted 11 people in front of me, though
based on talking to people at the finish, I may actually have ended
up in 11th (still waiting for the results).&lt;/p&gt;
&lt;p&gt;It&#39;s never a good idea to spend too much time in an aid station at the
bottom of a hill, so I tried to make it quick. However, I did need a
bunch of refills and I also took some time to work on my knee with
their impact massager just in case (3:51). Moved out with 2 bottles (1l) of
sports drink and 1 bottle of water, figuring I would use the climb up
to hydrate. I also grabbed my light though I didn&#39;t really need it,
because I finished well before dark.&lt;/p&gt;
&lt;h2 id=&quot;corral-canyon-%5B5.8%2C-%2B1906%2F-495-ft%5D&quot;&gt;Corral Canyon [5.8 mi, +1906/-495 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#corral-canyon-%5B5.8%2C-%2B1906%2F-495-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I ran the flat mile or two modestly hard and then just settled in for
the long hike up to the top. Pace was actually pretty good here
(14:00/mile for this leg) and this was just long. I did manage to get
almost all the fluid in, which is good because I had been getting
dehydrated (dark urine, etc.). Nothing much to report here other than
finally made it to the top and took the mile long downhill back to the
aid station.&lt;/p&gt;
&lt;p&gt;I saw a lot of people coming the other direction as I was coming up
and was able to tell them how far they had to go, which I had
certainly appreciated coming down. The closest person behind me was
about 2 miles back (and the person right in front of me was maybe .75
in front) so my position seemed pretty stable at this point. I saw the
guy doing his first ultra at maybe 2 miles down the hill, so it looked
like he had faded pretty badly, which wasn&#39;t too surprising as he had
looked tired earlier.&lt;/p&gt;
&lt;p&gt;Finally hit the aid station. Was definitely feeling a bit tired at
this point and sat for a minute to grab some more electrolyte pills,
adjust my shoes, fuel up, etc. (4:02).&lt;/p&gt;
&lt;p&gt;Was feeling pretty good about my time at this point because I left the
aid station at 11:26 and I figured 6ish miles would take 1:10-1:15.&lt;/p&gt;
&lt;h2 id=&quot;finish-%5B7.3-mi%2C-%2B833%2F-2277-ft%5D&quot;&gt;Finish [7.3 mi, +833/-2277 ft] &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#finish-%5B7.3-mi%2C-%2B833%2F-2277-ft%5D&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The way to the finish is some rolling single track followed by a
really long descent, mostly on fire roads (remember, we&#39;re
backtracking again, though I&#39;d done this section entirely in the dark
on the way out). Again, I was taking it really carefully to avoid
falling -- I did actually misstep once and have to put a hand down but
it wasn&#39;t a real fall -- so I wasn&#39;t moving that fast in this section.&lt;/p&gt;
&lt;p&gt;After the long downhill, there&#39;s a small rise (maybe a mile and 250
ft) followed by another descent into the finish. This is another case
where I should have paid more attention to the actual GPS track
instead of what the aid station workers or the course description
said. My GPS was measuring 7.3 out and I thought it might have been
error because they assured me it was only 6.5, but sure enough it&#39;s
actually 7.3. I had planned to push the last bit once I got off the
steep downhill but I didn&#39;t really anticipate it being more like 1.5-2
miles, so that was kind of an unpleasant surprise.  I was able to keep
pushing right to the end, though, so I still had some gas. It&#39;s a good
thing I decided to push, though, because with the extra distance if I
hadn&#39;t I might have missed out on 13:00.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Overall I think this went quite well. I was targeting 12-13 and got
significantly under 13. My real stretch goal was a little closer to
12:30, but given that this is a big improvement on my previous PR and
the uncertainties of the day plus my caution on the downhill this
seems like a good result.&lt;/p&gt;
&lt;p&gt;I might have started out a tiny bit hard, but given the cool
temperatures at the start and how I was feeling I think it was
generally reasonable. It&#39;s clear that the biggest place I lost time
was on the downhills, where I was just really being super careful. If
I hadn&#39;t been worried about my rib, I might have gone faster, but I
also think I need to spend more time learning to descend fast.
You&#39;re not just losing time; it&#39;s actually tiring to go slow.
I also am not sure I made quite the right tradeoffs between running
and hiking. I think I started hiking too late and then should
have run some stuff I ended up hiking. Given how I felt at the
end, I think I could have pushed some of the middle slightly harder,
especially if I hadn&#39;t had to worry about being too fatigued to
run downhill effectively.&lt;/p&gt;
&lt;p&gt;I probably would have been better off going with the Ultra 3s from the
beginning. The Pulsars are nice and light but I felt uncertain in them
and I think it may have also contributed to my knee pain. I could also
have worn the Sense Pro/4s that I used in Bigfoot and Yosemite. They
are slightly lighter than the Ultra 3s. I decided not to put them in
my bag because they don&#39;t have that much support and I was worried about
my ankles and so wanted to be sure that if I needed to change I had
something supportive, but I could have just worn the Sense Pro/4s the
whole way. It looks like Salomon will be bringing out a whole line
of shoes with their new foam including a more stable Pulsar variant,
so this may not be a compromise I have to make in the future.&lt;/p&gt;
&lt;p&gt;My rib hurt for the whole second half of the race. I&#39;m not sure if
it&#39;s just not healing properly or if it&#39;s bruising from the way
the pack was sitting. I&#39;m leaning towards the second because it was
also hurting this way about 6-9 months ago. Will need to debug
this before my next event. Interestingly, it wasn&#39;t a problem
in Yosemite, so perhaps something about the way I packed the
front of the pack.&lt;/p&gt;
&lt;p&gt;Nutrition generally went well. I was targeting 1l of sports drink/hr
plus 100cal of gel or bar. Towards the end I was thirsty and not
hungry so was probably closer to 300 liquid cal/hr in sports drink and
coke. I was nauseated at the end but felt mostly OK during the
event. Should have drunk a little more because, as noted earlier, I
was somewhat dehydrated. Did a pretty good job of keeping the aid
stations short, but could stand to shrink them a little more still.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/sob100k/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Segment&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Distance&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Elevation&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Time&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Pace&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;GAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.29 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+2,270/-846 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:23:08&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:25/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:09/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;0:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Kanan&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6.34 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,010/-1,444 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:09:08&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:54/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:05/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3:57&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Zuma&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.42 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,260/-997 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:01:10&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:17/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:46/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bonsall&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3.43 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+0/-1,706 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;29:23&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8:35/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;9:31/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4:26&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Zuma&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.76 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+2,910/-1,184 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:51:34&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:22/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:37/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5:27&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Kanan&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.40 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,037/-1,283 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:06:25&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:19/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:58/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3:03&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;6.37 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,453/-974 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:29:14&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:00/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:56/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:13&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Bulldog&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.91 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+486/-1,946 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:03:26&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:44/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:21/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3:51&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Corral&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5.84 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1,906/-495 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:25:24&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:38/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:11/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4:02&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Finish&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.32 mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+833/-2,277 ft&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;1:26:57&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:52/mi&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11:16/mi&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&lt;a href=&quot;http://quicksilver-running.com/100k-map/&quot;&gt;Quicksilver 100K&lt;/a&gt; is actually
the same weekend and is run on many of the trails I regularly run on
in &lt;a href=&quot;https://www.openspace.org/preserves/sierra-azul&quot;&gt;Sierra Azul&lt;/a&gt;
but it just seems weird to pay to race on trails I usually train
on for free. &lt;a href=&quot;https://educatedguesswork.org/posts/sob100k/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I used &lt;a href=&quot;https://ultrapacer.com/&quot;&gt;Ultrapacer&lt;/a&gt; to estimate my
times, but their algorithm seems to underestimate my speed
on the climbs and overestimate on descents, as we&#39;ll see later. &lt;a href=&quot;https://educatedguesswork.org/posts/sob100k/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Measurement 4: Heavy Hitters</title>
		<link href="https://educatedguesswork.org/posts/ppm-heavy-hitters/"/>
		<updated>2021-10-15T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ppm-heavy-hitters/</id>
		<content type="html">&lt;p&gt;This is part IV of my series on Privacy Preserving Measurement (see
parts &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro&quot;&gt;I&lt;/a&gt;, &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies&quot;&gt;II&lt;/a&gt;, and
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio&quot;&gt;III&lt;/a&gt;). Today we&#39;ll be addressing techniques
for collecting so-called frequent strings (i.e., &amp;quot;heavy hitters&amp;quot;).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs&quot;&gt;Prio&lt;/a&gt;
and similar technologies mostly operate at the level of sets of
numeric values. As &lt;a href=&quot;https://educatedguesswork.org/ppm-prio/#computable-functions&quot;&gt;we&#39;ve seen&lt;/a&gt;, this can be surprisingly useful,
but doesn&#39;t work well when you want to collect non-numeric values. For
example, suppose you wanted to see which web sites people commonly
visited. You might, I suppose, make a list of the top million web
sites and have each client report the number of visits to each. This has
two obvious drawbacks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It is very expensive because most of these values will be zeros
but you still need to send them (otherwise the server can tell
which sites you visited by the fact that they were sent).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It doesn&#39;t let you discover unknown values (i.e., new sites)
because they won&#39;t be on the list.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The general form of this problem is what&#39;s known as collecting
&amp;quot;heavy hitters&amp;quot;, i.e., frequent strings. Recently, we&#39;ve seen
a fair amount of work on this problem, which I&#39;ll sketch a bit
here.&lt;/p&gt;
&lt;h2 id=&quot;shamir-secret-sharing-designs-(star)&quot;&gt;Shamir Secret Sharing Designs (STAR) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters/#shamir-secret-sharing-designs-(star)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first design I want to talk about is based on &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Shamir%27s_Secret_Sharing&amp;amp;oldid=1038837796&quot;&gt;Shamir secret sharing&lt;/a&gt;. Briefly, this is a system in which you
can break a secret &lt;em&gt;S&lt;/em&gt; into an arbitrary number of shares
such that any &lt;em&gt;N&lt;/em&gt; are sufficient to reconstruct it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;If we assume that each client &lt;em&gt;i&lt;/em&gt; has a value &lt;em&gt;S_i&lt;/em&gt; (e.g., the URL),
then the client computes a key &lt;em&gt;K_i&lt;/em&gt; as a deterministic function of
&lt;em&gt;S_i&lt;/em&gt; (e.g., by hashing it) and then sends the central server:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Secret-Share(K_i), Encrypt(K_i, S_i)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If the encryption is also deterministic, then the server
can group all of the shares that correspond to the same value.
Once it has collected enough shares (whatever the level of
secret sharing is) it can then reconstruct &lt;em&gt;K_i&lt;/em&gt; and decrypt
&lt;em&gt;S_i&lt;/em&gt; (which was of course shared by other people). This scheme,
originally described by Bittau et al. in
their paper on &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3132747.3132769&quot;&gt;PROCHLO&lt;/a&gt;,
is efficient and easy
to understand and has the nice property that it
doesn&#39;t require a trusted server. However, it has two problems. First, it&#39;s easy to
tell when two subjects have the same value, so even if you
don&#39;t know the value, you can group subjects that have
shared values. Second, if the values are not high entropy
(i.e., they are easy to guess) then the server can just
iterate over possible values and generate its own &lt;em&gt;K&lt;/em&gt; values
until it finds a matching encryption value.&lt;/p&gt;
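&lt;p&gt;To make the flow concrete, here is a minimal sketch of the share-and-group idea. It is not the PROCHLO or STAR code: the field modulus, the hash-derived polynomial, the toy XOR cipher standing in for the deterministic encryption, and all of the function names are assumptions made for illustration.&lt;/p&gt;

```python
import hashlib
import secrets

P = 2**127 - 1   # prime field modulus (illustrative choice)
THRESHOLD = 3    # shares needed to reconstruct the key


def h(text):
    """Hash a string to an integer modulo P."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % P


def keystream(key, length):
    """Toy deterministic keystream derived from the key (not a vetted cipher)."""
    return hashlib.shake_256(key.to_bytes(16, "big")).digest(length)


def xor(data, key):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def make_report(value):
    """Client: returns (Encrypt(K, S), one Shamir share of K)."""
    # Every coefficient depends only on the value, so all clients holding
    # the same value evaluate the same polynomial; coeffs[0] is the key K.
    coeffs = [h(f"coeff{i}|{value}") for i in range(THRESHOLD)]
    x = secrets.randbelow(P - 1) + 1  # random nonzero evaluation point
    y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return xor(value.encode(), coeffs[0]), (x, y)


def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 to recover K."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret


def recover(reports):
    """Server: group reports by ciphertext and decrypt groups with enough shares."""
    groups = {}
    for ciphertext, share in reports:
        groups.setdefault(ciphertext, []).append(share)
    found = []
    for ciphertext, shares in groups.items():
        subset = shares[:THRESHOLD]
        if len(subset) == THRESHOLD:  # only possible once enough clients agree
            found.append(xor(ciphertext, reconstruct(subset)).decode())
    return found
```

&lt;p&gt;Note that the grouping step is exactly the first problem described above: equal ciphertexts reveal which reports share a value even before anything is decrypted.&lt;/p&gt;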
&lt;p&gt;The first problem--telling which subjects have the same value--can be partly
addressed by having values submitted via a proxy.
This still reveals the distribution of values but not who
has them, as long as you don&#39;t have separate identifying
information, as I discussed in the &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies&quot;&gt;post&lt;/a&gt; on proxies.&lt;/p&gt;
&lt;p&gt;Davidson et al. &lt;a href=&quot;https://ui.adsabs.harvard.edu/abs/2021arXiv210910074D/abstract&quot;&gt;propose STAR&lt;/a&gt; to
partly address this second problem by having a separate (non-colluding) server which
generates a per-value &lt;em&gt;salt&lt;/em&gt; which can then be fed into
the hashing process. The way this works is that the
server computes what&#39;s called an &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Pseudorandom_function_family&amp;amp;oldid=1029021822&quot;&gt;oblivious pseudorandom function&lt;/a&gt;,
which is a function that allows the randomness server to
compute a deterministic function of the input (i.e., &lt;em&gt;S_i&lt;/em&gt;)
without seeing the actual value. This result is then used as
an input to the hashing process along with the &lt;em&gt;S_i&lt;/em&gt;.
The result is that the data collector can&#39;t just exhaustively
search all the potential &lt;em&gt;S_i&lt;/em&gt; values on its own offline, but has
to query the other server for each candidate value. This makes
it possible to learn about specific values--even if they
are uncommon--but hard to learn about unknown values.&lt;/p&gt;
&lt;h2 id=&quot;idpf-based-designs&quot;&gt;IDPF-based Designs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters/#idpf-based-designs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The second class of design was introduced by &lt;a href=&quot;https://people.csail.mit.edu/henrycg/pubs/oakland21private/&quot;&gt;Boneh et al.&lt;/a&gt;
(including the authors who designed Prio). Conceptually it&#39;s analogous to Prio
in that the client splits up its value into two shares and sends one to
each server along with a proof. The servers collaborate to verify the
proof and then can return aggregated values. The actual encoding
of the values is more complicated and depends on something
called an &lt;em&gt;incremental distributed point function&lt;/em&gt; (IDPF) which
I won&#39;t describe here. (Incidentally, this system is badly
in need of a cool name, because &amp;quot;IDPF-based systems&amp;quot; doesn&#39;t
really roll off the tongue.)&lt;/p&gt;
&lt;p&gt;The way that this system is used in practice is that you
think of each value as just being a sequence of bits, and can
then query the system for the number of submissions with a
given &lt;em&gt;bit prefix&lt;/em&gt;. In other words, you can ask &amp;quot;how many
submissions have the first bit 1? How many have the first
bits 11? etc.&amp;quot; This lets you quickly discard regions of the
value space with low numbers of submissions and also
discover the prefixes which are common (i.e., heavy hitters).&lt;/p&gt;
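&lt;p&gt;The query pattern can be sketched as follows. This only simulates the search: &lt;code&gt;count_prefix&lt;/code&gt; is a hypothetical stand-in that counts over plaintext bit strings, whereas in the real system the two servers would compute that count jointly from IDPF shares without seeing any individual submission.&lt;/p&gt;

```python
def count_prefix(submissions, prefix):
    """Stand-in for the servers' answer: submissions starting with prefix."""
    return sum(1 for s in submissions if s.startswith(prefix))


def heavy_hitters(submissions, bits, threshold):
    """Extend prefixes one bit at a time, pruning the uncommon ones."""
    prefixes = [""]
    for _ in range(bits):
        candidates = [p + b for p in prefixes for b in "01"]
        # Keep only prefixes with at least `threshold` submissions.
        prefixes = [p for p in candidates
                    if count_prefix(submissions, p) not in range(threshold)]
    return prefixes


submissions = ["1101"] * 3 + ["1100"] * 2 + ["0010"]
common = heavy_hitters(submissions, bits=4, threshold=2)
```

&lt;p&gt;Because a prefix is dropped as soon as its count falls below the threshold, the number of queries grows with the number of heavy hitters rather than with the size of the value space.&lt;/p&gt;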
&lt;p&gt;IDPF-based systems have a number of advantages. First, the
server only learns the top values and the query path that got there;
by contrast, in secret sharing
approaches the servers learn all values over the secret sharing
threshold. Second, like Prio it can be combined with demographic
values in the clear which can then be later used for crosstabs and the
like (with the same privacy caveats as with Prio). The price for this flexibility is rather higher computational
cost, though still within practical limits (they can find the top 200
strings out of 400,000 clients with two servers in a bit less than an
hour, so this is significantly slower than either STAR or Prio).&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot;&gt;Next Up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters/#next-up&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Everything I&#39;ve discussed so far gives exact answers. This is convenient
from the perspective of the data collector but can make the privacy
properties hard to analyze. In the next post I&#39;ll be talking
about randomized techniques that give approximate answers
(i.e., &amp;quot;differential privacy&amp;quot; of both the local and central varieties).&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The way this is done technically is by constructing a
polynomial of degree &lt;em&gt;N-1&lt;/em&gt; with the y-intercept being
the secret. Any &lt;em&gt;N&lt;/em&gt; distinct points are sufficient
to reconstruct the polynomial. In this case, the
polynomial has to be deterministically computed from
the secret as well. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-heavy-hitters/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Measurement 3: Prio</title>
		<link href="https://educatedguesswork.org/posts/ppm-prio/"/>
		<updated>2021-10-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ppm-prio/</id>
		<content type="html">&lt;p&gt;This is part III of my series on Privacy Preserving Measurement.  Part
&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro&quot;&gt;I&lt;/a&gt; was about conventional measurement techniques, and
Part &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies&quot;&gt;II&lt;/a&gt; showed how to improve those techniques
by anonymizing data on input. This post covers a set of
cryptographic techniques that use multiple servers working together
to provide aggregate measurements (i.e., a single value summarizing
a set of data points) without any server seeing individual
subjects&#39; data, anonymous or otherwise.&lt;/p&gt;
&lt;p&gt;Probably the best known--and easiest to understand--of these is
&lt;a href=&quot;https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs&quot;&gt;Prio&lt;/a&gt;
designed by &lt;a href=&quot;https://people.csail.mit.edu/henrycg/&quot;&gt;Henry
Corrigan-Gibbs&lt;/a&gt; and &lt;a href=&quot;https://crypto.stanford.edu/~dabo/&quot;&gt;Dan
Boneh&lt;/a&gt;. Prio is already seeing
initial deployment by both
&lt;a href=&quot;https://blog.mozilla.org/security/2019/06/06/next-steps-in-privacy-preserving-telemetry-with-prio/&quot;&gt;Mozilla&lt;/a&gt;
and &lt;a href=&quot;https://www.abetterinternet.org/post/prio-services-for-covid-en/&quot;&gt;Apple/Google/ISRG&lt;/a&gt;. Prio allows for computing a variety of numeric aggregates
over input data, as described below.&lt;/p&gt;
&lt;h2 id=&quot;prio-overview&quot;&gt;Prio Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#prio-overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind Prio is actually quite simple. The client
has some numeric value that it wants to report (say, household
income). It takes that value and &lt;em&gt;splits&lt;/em&gt; it into two shares
(I&#39;ll get to how that works shortly) and sends one share to each
of two servers. The sharing is designed so that only having
one share doesn&#39;t give you any information about the original
value. Every other client does the same and so now
each server has one share from each client. The servers then
&lt;em&gt;aggregate&lt;/em&gt; the shares so that they have a single value which
represents the aggregate of all the shares. They then send
the aggregate to the data collector who is able to reassemble
the aggregated shares to produce the aggregated value, as
shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/prio.png&quot; alt=&quot;Prio architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Because the shares are not individually useful, each user&#39;s value
is protected as long as the servers don&#39;t collude. Note that I&#39;ve presented this as if the servers
and collector are separate, but it&#39;s just fine for the collector
to run one of the servers as long as the other server is run
by someone independent and trustworthy. The key privacy
guarantee is that the subjects only need to trust one of the
Prio servers. It&#39;s also possible to extend Prio to more servers,
with privacy being guaranteed as long as one of the servers
is honest, though this is somewhat more expensive.&lt;/p&gt;
&lt;h2 id=&quot;additive-secret-sharing&quot;&gt;Additive Secret Sharing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#additive-secret-sharing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The math behind splitting the secret into two shares
is also quite simple.&lt;/p&gt;
&lt;p&gt;If we denote client &lt;em&gt;i&lt;/em&gt;&#39;s value as &lt;em&gt;X_i&lt;/em&gt;, then &lt;em&gt;i&lt;/em&gt; computes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generate a random number &lt;em&gt;R_i&lt;/em&gt;. This becomes share 1.&lt;/li&gt;
&lt;li&gt;Compute share 2 as: &lt;em&gt;X_i - R_i&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result is that server 1 gets all of the &lt;em&gt;R_i&lt;/em&gt; values and server
2 gets all of the &lt;em&gt;X_i - R_i&lt;/em&gt; values. The aggregation function
is just addition, so server 1&#39;s aggregated share is&lt;/p&gt;
&lt;p&gt;&lt;em&gt;R_1 + R_2 + R_3 + ... + R_n&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;and server 2&#39;s aggregated share is&lt;/p&gt;
&lt;p&gt;&lt;em&gt;X_1 - R_1 + X_2 - R_2 + X_3 - R_3 + ... X_n - R_n&lt;/em&gt;.
If we add &lt;em&gt;these&lt;/em&gt; together, we get (rearranging the terms because
&lt;a href=&quot;https://www.youtube.com/watch?v=Vetg7vWitTU&quot;&gt;addition is commutative&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;em&gt;X_1 + R_1 - R_1 + X_2 + R_2 - R_2 + X_3 + R_3 - R_3 ... X_n + R_n - R_n&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Cancelling out the matching terms, we get:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;X_1 + &lt;strike&gt;R_1 - R_1 &lt;/strike&gt; + X_2 + &lt;strike&gt;R_2 - R_2 &lt;/strike&gt; + X_3 + &lt;strike&gt;R_3 - R_3&lt;/strike&gt; ... X_n + &lt;strike&gt;R_n - R_n&lt;/strike&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;in other words, the sum of the original values. Magic, right?&lt;/p&gt;
&lt;p&gt;I&#39;m playing a bit loose with the math here: for technical reasons the
values have to be non-negative integers and you have to do the math modulo
a prime number, &lt;em&gt;p&lt;/em&gt; (say 64 bits or so), but none of this affects the
reasoning above.&lt;/p&gt;
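&lt;p&gt;Here is a minimal sketch of the splitting and aggregation described above, including the modular arithmetic. The particular prime, the function names, and the sample incomes are all made up for illustration; this is not the Prio implementation and it omits the validity proofs entirely.&lt;/p&gt;

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; the real field size is an assumption here


def split(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)      # share 1: uniformly random
    return r, (x - r) % P         # share 2: x - r (mod P)


def aggregate(shares):
    """Each server just adds up the shares it received."""
    return sum(shares) % P


incomes = [52000, 87000, 31000]                 # hypothetical client values
shares = [split(x) for x in incomes]
agg1 = aggregate(s[0] for s in shares)          # server 1 sees only the R_i
agg2 = aggregate(s[1] for s in shares)          # server 2 sees only X_i - R_i
total = (agg1 + agg2) % P                       # collector recombines
```

&lt;p&gt;Each share on its own is uniformly distributed modulo &lt;em&gt;p&lt;/em&gt;, which is why a single server learns nothing about an individual value; the result is only correct as long as the true sum is smaller than &lt;em&gt;p&lt;/em&gt;.&lt;/p&gt;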
&lt;h2 id=&quot;bogus-data&quot;&gt;Bogus Data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#bogus-data&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This may all seem kind of obvious, and this kind of secret-sharing based
aggregation predates Prio. But I&#39;ve
omitted something really important: what happens if the client submits
bogus data? In a conventional system where the data collector sees the
raw data, they can just apply filters for data that appears to be bogus,
such as by discarding or clipping &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Outlier&amp;amp;oldid=1047968984&quot;&gt;outliers&lt;/a&gt;.
For instance, if someone reports that their yearly household income is a trillion dollars,
you would probably want to double check that. However, this isn&#39;t as simple
a matter with a system like Prio because neither server sees individual
values and the collector just sees the sum, at which point it&#39;s
too late: if the total household income of 1000 households is $1,000,030,000,000
then something is clearly wrong, but you don&#39;t know whose submission to discard.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
The complicated part of Prio is addressing this problem.&lt;/p&gt;
&lt;p&gt;There are actually two kinds of bogus inputs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Values which are false but plausible (e.g., that you make $10,000 a year more
than you do).&lt;/li&gt;
&lt;li&gt;Values which are simply ridiculous (e.g., that you make $1,000,000,000,000 a year).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s generally quite difficult to detect and filter out the first
kind of input because, as I say, it&#39;s plausible. What we want to do
is filter out the second kind of bogus input.&lt;/p&gt;
&lt;p&gt;The way Prio does this is by having the clients submit a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Zero-knowledge_proof&amp;amp;oldid=1046733796&quot;&gt;zero-knowledge proof (ZKP)&lt;/a&gt;. The details
of how ZKPs work are out of scope for this post, but the TL;DR is that
it&#39;s a proof that has two properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It proves that when you add the two shares together, the result
has certain properties. For instance, a submission
might prove that the reported household income is between
0 and 1,000,000.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;It doesn&#39;t tell the verifiers anything else about the
result (hence &amp;quot;zero-knowledge&amp;quot;).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Because the proof has to apply to both shares and each side only
has their own share, the servers have to work together to verify
the proof (actually each side gets a share of the proof).&lt;/p&gt;
&lt;p&gt;This general idea isn&#39;t new to Prio, but what&#39;s new is that the
proofs are exceedingly efficient, which makes the idea far more
practical than previous systems.&lt;/p&gt;
&lt;h2 id=&quot;computable-functions&quot;&gt;Computable Functions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#computable-functions&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic unit of operation of Prio is addition, which
seems kind of limited, but actually is surprisingly powerful.
The trick is that you have to encode the data in such
a way that adding up the values computes the function
you want. Here are some examples (mostly taken from
the Prio paper):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Arithmetic mean&lt;/em&gt; is computed just by taking the sum
and dividing by the total number of submissions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Product&lt;/em&gt; can be computed by submitting the logarithm of
the values. The sum of those submitted logarithms is then the logarithm of the product of the values
(this is how &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Slide_rule&amp;amp;oldid=1041250169#Multiplication&quot;&gt;slide rules&lt;/a&gt; work).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Geometric mean&lt;/em&gt; can be computed from product.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Variance&lt;/em&gt; and &lt;em&gt;standard deviation&lt;/em&gt; can be computed by submitting
&lt;em&gt;X&lt;/em&gt; and &lt;em&gt;X^2&lt;/em&gt; and computing the average of each.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
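&lt;p&gt;As a worked example of the encoding trick, here is how variance falls out of two sums. The sketch works on plaintext values for clarity; in a real deployment each of the two numbers would be secret-shared and only the sums would be revealed. The sample data and names are invented.&lt;/p&gt;

```python
def encode(x):
    """Each client submits both x and x squared."""
    return (x, x * x)


values = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical client inputs
encoded = [encode(x) for x in values]

n = len(encoded)
sum_x = sum(e[0] for e in encoded)          # what the aggregate of X reveals
sum_x2 = sum(e[1] for e in encoded)         # what the aggregate of X^2 reveals

mean = sum_x / n
variance = sum_x2 / n - mean ** 2           # population variance: E[X^2] - E[X]^2
std_dev = variance ** 0.5
```

&lt;p&gt;The collector only ever needs the two totals and the number of submissions, which is exactly what an addition-only system can provide.&lt;/p&gt;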
&lt;p&gt;There are also algorithms for boolean OR, boolean AND, MIN, MAX, and
set intersection. Somewhat surprisingly it is also possible to do
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Ordinary_least_squares&amp;amp;oldid=1046166857&quot;&gt;ordinary least squares
(OLS)&lt;/a&gt;
regression as well.&lt;/p&gt;
&lt;p&gt;While very powerful, this highlights an important difference between techniques
like Prio and the conventional &amp;quot;just collect it all&amp;quot; approach (and to
some extent the proxy approach): you need to know a lot in advance about
what measurement you are trying to take because you need to have the
data encoded in a way that is suitable for that measurement (or potentially
even invent a new encoding). This is a general pattern in privacy
preserving measurement: it&#39;s not just one technique but a set of
techniques designed for taking different kinds of measurements.&lt;/p&gt;
&lt;h2 id=&quot;crosstabs%2C-querying%2C-etc.&quot;&gt;Crosstabs, querying, etc. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#crosstabs%2C-querying%2C-etc.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;ve presented the above as if the servers just aggregate any given
batch of client data and send it to the collector, but Prio and
similar systems can also be used in an interactive setting to look at
subsets of the data sets. Recall in the previous post where we had to
break up household income from the other values in order to avoid
de-anonymization attacks. With Prio we can do better by having
each subject submit their demographic data in the clear but the
household income with Prio. This produces a submission that looks
like this:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[census tract, age, gender, nationality of each household member, Encrypted(income)]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The result is that each server ends up with a list of shares
tagged by demographic information. This allows them to compute
aggregates for subsets of the data set. For instance, you could
show the data broken up by household size, or the nationalities
of the members, or by both (using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Contingency_table&amp;amp;oldid=1006059118&quot;&gt;crosstabs&lt;/a&gt;). This is also enough information to do
hypothesis tests like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Chi-squared_test&amp;amp;oldid=1040481389&quot;&gt;Chi-squared&lt;/a&gt;. The obvious cost, of course, is that the demographic
information isn&#39;t private. In some cases you could have replicated
this result with everything being private (e.g., if you wanted to
just do OLS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;) but in general it&#39;s more flexible if some of the data is unencrypted.&lt;/p&gt;
&lt;p&gt;This brings us to the topic of operational mode: above I&#39;ve presented
things with what might be called a &amp;quot;push&amp;quot; model: the servers
compute the results and send them to the data collector. However,
for exploratory data analysis it&#39;s more convenient for the data collector
to drive this. For instance, you might ask for a breakdown by household
size and then later ask for a breakdown by household size &lt;em&gt;and&lt;/em&gt; age
to distinguish multigenerational households from families with a lot
of children. There are a number of potential ways this can happen.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The servers can expose some sort of API that lets the data collector
ask queries and get answers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The data collector can collect the shares from the subjects
(obviously they would be encrypted for the servers) and then just
ask the servers to aggregate a given subset of the reports.
This seems like an attractive model if the data collector
is also one of the servers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these has some potential operational advantages; we&#39;re just
starting to see the development of publicly available Prio services
now (see this
&lt;a href=&quot;https://www.abetterinternet.org/post/introducing-prio-services/&quot;&gt;announcement&lt;/a&gt;
by ISRG, the people behind Let&#39;s Encrypt), so it will probably take
some time to get enough experience around the best practices here.&lt;/p&gt;
&lt;h2 id=&quot;input-manipulation-attacks&quot;&gt;Input Manipulation Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#input-manipulation-attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The privacy protection provided by Prio depends on the number of
individual data values being aggregated. Obviously, if you&#39;re
just aggregating one value it&#39;s the same as having the value itself,
but if you are aggregating only a small number of values, then
the level of privacy is reduced. So, clearly for Prio to work
the servers have to insist on minimum batch sizes for the aggregation.
Even so, however, there can be attacks based on controlling the input data.&lt;/p&gt;
&lt;h3 id=&quot;sybil-attacks&quot;&gt;Sybil Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#sybil-attacks&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The simplest version is what&#39;s known as a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Sybil_attack&amp;amp;oldid=1045901532&quot;&gt;Sybil attack&lt;/a&gt;.
This is easiest to mount in the model where the data collector holds the
shares and just asks the servers to aggregate them for it. In this
case, it takes the one submission of interest and batches it up
with &lt;em&gt;batch_size - 1&lt;/em&gt; fake submissions where it knows the value.
It can then compute the submission&#39;s real value just by subtracting
its own fake values. This is a somewhat limited attack in that you
need to do an unreasonable number of aggregations in order to
learn a lot of people&#39;s data, but it&#39;s still concerning, especially
if you only want to learn about a few people.&lt;/p&gt;
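&lt;p&gt;The arithmetic of the attack is trivial, as this sketch shows. The &lt;code&gt;aggregate&lt;/code&gt; function is a hypothetical stand-in for the servers, which enforce the batch size but have no way to tell real submissions from fake ones; the batch size and the victim&#39;s value are invented.&lt;/p&gt;

```python
BATCH_SIZE = 100


def aggregate(batch):
    """Stand-in for the servers: enforce the batch size, reveal only the sum."""
    assert len(batch) == BATCH_SIZE
    return sum(batch)


victim_value = 73000                      # unknown to the attacker in practice
fake_values = [1] * (BATCH_SIZE - 1)      # submissions the attacker made up

result = aggregate([victim_value] + fake_values)
recovered = result - sum(fake_values)     # subtract the known fakes
```

&lt;p&gt;One aggregation is spent per victim, which is why the attack scales poorly but still works against a handful of targets.&lt;/p&gt;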
&lt;h3 id=&quot;repeated-queries&quot;&gt;Repeated Queries &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#repeated-queries&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even in cases where the server just provides API access--or has
some other anti-Sybil defense--it&#39;s still possible to isolate
individual submissions. The basic idea is that you divide the
data set into partially overlapping subsets and can then
learn information about the difference in the subsets. As a
contrived example, suppose we have the following data set
and a minimum batch size of 2.&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;Name&lt;/td&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;Height (cm)&lt;/td&gt;&lt;td&gt;Salary ($)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;John Smith&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;160&lt;/td&gt;&lt;td&gt;[Encrypted]&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Bob Smith&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;162&lt;/td&gt;&lt;td&gt;[Encrypted]&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Jane Doe&lt;/td&gt;&lt;td&gt;F&lt;/td&gt;&lt;td&gt;155&lt;/td&gt;&lt;td&gt;[Encrypted]&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;If I aggregate all the salary values and then ask for all the
salary values for males, then I can subtract to find Jane Doe&#39;s
salary. Similarly, if I ask for all the salary values of people 160cm and
below I can learn Bob Smith&#39;s salary by subtracting it from
the total. Obviously, this kind of attack is harder to mount at scale when
batch sizes are large, but without any defenses there is still
some privacy risk, and work on addressing this is still in
early stages.&lt;/p&gt;
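&lt;p&gt;Here is a rough sketch of this kind of differencing against the
three named rows of the table. The salary figures are invented for
illustration, and &lt;em&gt;query_sum&lt;/em&gt; is a hypothetical aggregate-only API
that enforces the minimum batch size of 2.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical records; the salary values are invented for illustration.
RECORDS = [
    {"name": "John Smith", "gender": "M", "height": 160, "salary": 100000},
    {"name": "Bob Smith", "gender": "M", "height": 162, "salary": 111005},
    {"name": "Jane Doe", "gender": "F", "height": 155, "salary": 95000},
]
MIN_BATCH = 2


def query_sum(predicate):
    """Aggregate-only API: refuses batches smaller than MIN_BATCH."""
    batch = [r["salary"] for r in RECORDS if predicate(r)]
    if MIN_BATCH > len(batch):
        raise ValueError("batch too small")
    return sum(batch)


total = query_sum(lambda r: True)
males = query_sum(lambda r: r["gender"] == "M")
short = query_sum(lambda r: 160 >= r["height"])

jane = total - males   # the only record outside the male subset
bob = total - short    # the only record taller than 160cm
john = males - bob     # what remains of the male subset
print(jane, bob, john)
```
&lt;/pre&gt;
&lt;p&gt;Every individual query respects the minimum batch size, yet three
queries are enough to recover all three salaries.&lt;/p&gt;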
&lt;h2 id=&quot;standards-work&quot;&gt;Standards Work &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#standards-work&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;All of this technology is quite new, but there&#39;s already
a lot of interest. As I said above, there have already
been several deployments of Prio and there is currently
&lt;a href=&quot;https://github.com/abetterinternet/ppm-specification&quot;&gt;work&lt;/a&gt;
on bringing Prio and some other systems
to the IETF for standardization, so I expect we&#39;ll be
seeing a lot more activity soon.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot;&gt;Next Up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-prio/#next-up&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Prio and similar technologies mostly operate at the level of sets of
numeric values. As we&#39;ve seen above, this can be surprisingly useful,
but doesn&#39;t work well when you want to collect non-numeric values.  In
the next post I&#39;ll be covering some technologies that allow for
collecting arbitrary strings.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There&#39;s actually a more serious problem: supposing that I report
a household income of -$100,000, or, because this is modular
arithmetic, &lt;em&gt;p-100,000&lt;/em&gt;, this will produce a bogus output
in a way that isn&#39;t as easy to detect. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that if your household income was truly &amp;gt;$1,000,000/year
then you&#39;d just report this as $1,000,000 &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though even then, you really do want to look at the distribution
of the data to see if the OLS makes sense. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-prio/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Measurement 2: Anonymized Data Collection</title>
		<link href="https://educatedguesswork.org/posts/ppm-proxies/"/>
		<updated>2021-10-10T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ppm-proxies/</id>
		<content type="html">&lt;p&gt;In part &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro&quot;&gt;I&lt;/a&gt; of this series, we discussed the
conventional obvious way of taking measurements, which is to say
collecting a bunch of data and analyzing it locally. This is a fine
practice when the data itself isn&#39;t sensitive (e.g., outdoor
temperature readings from your own sensors), but is less good when
you&#39;re collecting data about people that they might consider
sensitive (and any data value probably has
&lt;em&gt;someone&lt;/em&gt; who considers it sensitive).
It&#39;s better to have some technical mechanism
which protects user data.&lt;/p&gt;
&lt;p&gt;For some measurements, you can simply not collect identifying
data. For instance, if you have a table at a local park doing
a survey, you can just not ask people for their names.  In many
cases, however, things are more complicated. One important example is
measurements taken from end-user devices (e.g., on-line surveys,
client-side telemetry, etc.). Because of the way the Internet works,
these reports naturally have the IP address associated with them and
in many cases that can be used to map back to the user&#39;s
identity. Even in cases where the software isn&#39;t running on the
subject&#39;s machine, there can still be risks. For instance, if you have
someone going door to door to collect information, the time of the
report plus the route the person takes can be used to infer
approximately which report corresponds to which subject.&lt;/p&gt;
&lt;p&gt;The most natural technical mechanism to address these issues is to
collect user data but anonymize it.
The basic idea behind anonymization is to separate the data
being collected from the identity of the user that is being
collected from so that you can work on the data without
knowing anything else about the user.&lt;/p&gt;
&lt;h2 id=&quot;central-anonymization&quot;&gt;Central Anonymization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#central-anonymization&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The simplest thing to do is just to collect all the data
centrally and then strip off the identifying information
and then (hopefully) discard the raw data. So you
start with a table like this with names:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;Name&lt;/td&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;Height (cm)&lt;/td&gt;&lt;td&gt;Salary ($)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;John Smith&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;160&lt;/td&gt;&lt;td&gt;100000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Bob Smith&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;162&lt;/td&gt;&lt;td&gt;111005&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Jane Doe&lt;/td&gt;&lt;td&gt;F&lt;/td&gt;&lt;td&gt;155&lt;/td&gt;&lt;td&gt;95000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;and end up with a table like this:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;Id&lt;/td&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;Height (cm)&lt;/td&gt;&lt;td&gt;Salary ($)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1234&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;160&lt;/td&gt;&lt;td&gt;100000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5678&lt;/td&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;162&lt;/td&gt;&lt;td&gt;111005&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;910A&lt;/td&gt;&lt;td&gt;F&lt;/td&gt;&lt;td&gt;155&lt;/td&gt;&lt;td&gt;95000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;In this table, I&#39;ve replaced the names with identifiers; you don&#39;t
need to do this but it&#39;s convenient to have some way to refer to each
record that isn&#39;t just the row number in the table. Obviously, you
have to select the identifier in such a way that it doesn&#39;t leak user
identity; see &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#identifier-selection&quot;&gt;below&lt;/a&gt; for more on this.&lt;/p&gt;
&lt;p&gt;At some level, this is just a policy mechanism and from the subject&#39;s
perspective depends on trusting the data collector to actually delete
the raw data, but it&#39;s significantly better than nothing. First, if
the data collector &lt;em&gt;is&lt;/em&gt; behaving as advertised, this prevents retrospective policy
changes where your data is collected under one regime and then the
data collector decides to use the data in a different way than they
said they would. Second, it&#39;s possible to have an independent audit
that the data collector is behaving as advertised, at least at one
point in time. With that said, it&#39;s obviously better to have technical
controls that don&#39;t depend on the data collector behaving correctly,
even at the initial point of data collection.&lt;/p&gt;
&lt;h2 id=&quot;anonymizing-proxies&quot;&gt;Anonymizing Proxies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#anonymizing-proxies&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The typical solution is to have an anonymizing proxy which
removes identifying information from reports.
Assume you have some piece of software (the &amp;quot;client&amp;quot;) which is collecting
the data. That client might be being operated directly
by the subject of the measurement or by some sort of field
agent doing the measurement (obviously the former is better).
In either case, the client has to be trusted (see &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software&quot;&gt;here&lt;/a&gt;
for more on this).
When the data is initially collected, the client encrypts it
for the data collector using some sort of
public key encryption scheme&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and then sends it to some proxy, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/anonymizing-proxy.png&quot; alt=&quot;Anonymizing Proxy&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The proxy strips off whatever identifying information (e.g., the IP
address) was originally associated with the report, thus preventing it
from being available to the collector. This leaves the data collector
with the same kind of data it would have had in the previous example.&lt;/p&gt;
&lt;p&gt;An anonymizing proxy is a good way to implement centralized
anonymization, but it&#39;s better if it&#39;s run by some sort of
trusted third party. In that case, the identity of the subject is protected
as long as the proxy and the data collector don&#39;t collude.
You can see this intuitively by realizing that the subject&#39;s
identity and their reported data are never available to the
same entity: the proxy just sees the identity and the encrypted
report and the collector sees the report (in both encrypted
and plaintext form) but never has the identity.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Note that in the diagram above, the proxy also attaches
some metadata to the submission. This is data added
by the proxy rather than by the client. For instance, the
proxy might indicate the rough geographic location of the
client as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internet_geolocation&amp;amp;oldid=1042251052&quot;&gt;derived from the client&#39;s IP address&lt;/a&gt;; this allows for geographic
segmentation without revealing the client&#39;s entire identity.
Additional metadata isn&#39;t necessary but it can be convenient
in some cases.&lt;/p&gt;
&lt;h2 id=&quot;attacks-on-anonymization&quot;&gt;Attacks on Anonymization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#attacks-on-anonymization&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Anonymization is a good start but unless done very carefully
it can yield significantly less privacy than expected.&lt;/p&gt;
&lt;h3 id=&quot;high-dimensional-data&quot;&gt;High-dimensional data &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#high-dimensional-data&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first problem is that if you have enough individual data
values it&#39;s often possible to successfully narrow down someone&#39;s
identity even if those individual values don&#39;t seem that
identifying (this is effectively the same problem as Web browser
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Device_fingerprint&amp;amp;oldid=1048778743&quot;&gt;fingerprinting&lt;/a&gt;).
For example, suppose you have a data set where you collect
the age, gender, and nationality for every member of a household,
as well as the &lt;a href=&quot;https://www.census.gov/data/academy/data-gems/2018/tract.html&quot;&gt;census tract&lt;/a&gt;.
Census tracts contain a few thousand people--maybe 1-2 thousand households--so any given combination of the above demographic variables is
likely to contain a fairly small number of households. This isn&#39;t
necessarily a problem if the data is boring, but what if the data set &lt;em&gt;also&lt;/em&gt; contains
something more sensitive, like household income? Now someone
who has access to the data set can look up people&#39;s incomes
based on (semi)publicly known demographic information.
This attack is called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Data_re-identification&amp;amp;oldid=1042277579&quot;&gt;de-anonymization&lt;/a&gt; and is a generic problem with
this kind of high-dimensional data set. There have been a number
of high-profile cases where anonymized data sets turned out
to be de-anonymizable, including &lt;a href=&quot;http://ggs685.pbworks.com/w/file/fetch/94376315/Latanya.pdf&quot;&gt;medical records&lt;/a&gt; (by Latanya Sweeney)
and &lt;a href=&quot;https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf&quot;&gt;Netflix viewing histories&lt;/a&gt; (by Narayanan and Shmatikov).&lt;/p&gt;
&lt;p&gt;There are a number of potential defenses against this kind of
de-anonymization--I&#39;ll be talking about adding random noise for
differential privacy in a future post--but one obvious thing to do is
to disaggregate the data set so that not all the information is available
together. For instance, maybe we don&#39;t need the ages
and nationalities of everyone in the household in order to ask
questions about people&#39;s income, so we might have two submissions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;census tract, household size + household income&lt;/li&gt;
&lt;li&gt;census tract, age, gender, and nationality of everyone in the household&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This isn&#39;t perfect because there are going to be some edge
cases (there might only be one household with 9 people in it)
but it will substantially increase privacy.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
In order for it to work, however, it&#39;s absolutely critical that
the disaggregated data values can&#39;t be relinked. For instance,
you can&#39;t issue the same pseudonymous identifier to each
submission for the same subject or you&#39;ve just re-linked
the submissions you de-linked by disaggregating them.&lt;/p&gt;
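&lt;p&gt;As a minimal sketch of the two submissions listed above, assuming a
hypothetical household record with &lt;em&gt;tract&lt;/em&gt;, &lt;em&gt;income&lt;/em&gt;, and
&lt;em&gt;members&lt;/em&gt; fields (names chosen here for illustration), the client
might split the record like this, drawing a fresh random identifier for each part:&lt;/p&gt;
&lt;pre&gt;
```python
import secrets


def disaggregate(household):
    """Split one household record into two separately submitted parts."""
    income_part = {
        "id": secrets.token_hex(8),  # fresh random id per submission
        "tract": household["tract"],
        "size": len(household["members"]),
        "income": household["income"],
    }
    demographics_part = {
        "id": secrets.token_hex(8),  # independent of the id above
        "tract": household["tract"],
        "members": household["members"],
    }
    return income_part, demographics_part


# Hypothetical example household.
household = {
    "tract": "4201.02",
    "income": 85000,
    "members": [
        {"age": 41, "gender": "F", "nationality": "US"},
        {"age": 9, "gender": "M", "nationality": "US"},
    ],
}
income_part, demographics_part = disaggregate(household)
```
&lt;/pre&gt;
&lt;p&gt;The two parts still share the census tract and (implicitly) the household
size, which is where the edge cases mentioned above come from.&lt;/p&gt;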
&lt;p&gt;The second problem with disaggregation is that it reduces
your flexibility: if you have all the data in one table then it&#39;s
easier to ask new questions, but if it&#39;s disaggregated, then
that becomes harder. For instance, if we disaggregate household
income from the demographics of the individual participants,
then we can no longer ask if there is an influence of nationality
on household income. This isn&#39;t ideal because a lot of the value
of just collecting anonymized data is to preserve this kind of flexibility,
but unfortunately there&#39;s no really great way around it with
this kind of system.&lt;/p&gt;
&lt;h3 id=&quot;time-based-correlation-and-shuffling&quot;&gt;Time-based correlation and shuffling &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#time-based-correlation-and-shuffling&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you disaggregate the data for a given subject into multiple submissions
as discussed above but you then transmit them one right after the other
then the value of the disaggregation goes down dramatically: the
server sees a stream of submissions come in and even if they
are mixed a little bit, it&#39;s usually pretty easy to put them
back together by lining up the overlapping values (census tract +
household size). The reconstruction may be imperfect but it&#39;s likely to be pretty
close.&lt;/p&gt;
&lt;p&gt;It&#39;s of course possible for the client to shuffle the submissions
locally by waiting a random time between submissions, but it&#39;s
easier if the proxy does it. There are a number of possible
shuffling strategies with various tradeoffs of timeliness
and privacy. Tom Ritter has a good &lt;a href=&quot;https://ritter.vg/blog-cryptodotis-mix_and_onion_networks.html&quot;&gt;overview&lt;/a&gt; of this kind of technique for anonymous
messaging, where it&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mix_network&amp;amp;oldid=1042030329&quot;&gt;mix network&lt;/a&gt;.&lt;/p&gt;
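&lt;p&gt;One of the simplest strategies is for the proxy to hold reports until it
has a full batch and then release them in random order. The sketch below
assumes a hypothetical &lt;em&gt;ShufflingProxy&lt;/em&gt; class and a fixed batch size;
a real design would also need to bound how long a report can wait.&lt;/p&gt;
&lt;pre&gt;
```python
import random


class ShufflingProxy:
    """Buffers encrypted reports and releases them in random order."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size
        self.rng = rng or random.SystemRandom()
        self.buffer = []

    def submit(self, encrypted_report):
        """Queue a report; return a shuffled batch once enough arrive."""
        self.buffer.append(encrypted_report)
        if len(self.buffer) != self.batch_size:
            return []
        batch, self.buffer = self.buffer, []
        self.rng.shuffle(batch)
        return batch
```
&lt;/pre&gt;
&lt;p&gt;Larger batches make time-based correlation harder but delay delivery,
which is the timeliness versus privacy tradeoff in miniature.&lt;/p&gt;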
&lt;h3 id=&quot;identifier-selection&quot;&gt;Identifier selection &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#identifier-selection&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As noted above, it&#39;s convenient to add a pseudonymous identifier to each
anonymized submission, but some care needs to be taken in generating
identifiers. Ideally, the identifier would be generated &lt;em&gt;after&lt;/em&gt;
anonymization, because then you can have high confidence that it doesn&#39;t
include any extra information that the data collector doesn&#39;t have
already. However, in some cases that&#39;s undesirable. For instance,
you might want to be able to connect multiple submissions by the
same client over time. The best way to do this is to generate the
identifiers randomly, because again this gives you high confidence
you aren&#39;t leaking information.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;In general, you really don&#39;t want the identifier to depend on
the client identity in any way because that&#39;s an opportunity
for compromise. In particular, hashing the client&#39;s true identity usually
does not provide protection against reversing the identifier unless the true
identity is itself very high entropy (i.e., there are so many possible
values that it is infeasible to try them all). The problem is that
if the hash function is public knowledge it&#39;s possible to just try
all the input identities until one matches. This is a mistake
that gets made over and over with &lt;a href=&quot;https://www.ftc.gov/system/files/documents/public_events/1223263/privacycon_emailprivacy_englehardt_0.pdf&quot;&gt;e-mail addresses&lt;/a&gt;.&lt;/p&gt;
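&lt;p&gt;To make the exhaustive search concrete, here is a small sketch assuming
the identifier is an unkeyed SHA-256 hash of an e-mail address. The candidate
list and the &lt;em&gt;pseudonym&lt;/em&gt; and &lt;em&gt;reverse&lt;/em&gt; helpers are hypothetical;
a real attacker would work from a much larger list of known addresses.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib


def pseudonym(email):
    """Unkeyed hash of an identity, used as a naive identifier."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def reverse(identifier, candidates):
    """Try every candidate identity until one hashes to the identifier."""
    for email in candidates:
        if pseudonym(email) == identifier:
            return email
    return None


# Hypothetical candidate list, e.g. from a leaked address book.
candidates = ["alice@example.com", "bob@example.com", "carol@example.com"]
leaked_id = pseudonym("bob@example.com")
print(reverse(leaked_id, candidates))
```
&lt;/pre&gt;
&lt;p&gt;The search only fails when the true identity isn&#39;t in the candidate
list, which is why the entropy of the identity space is what matters.&lt;/p&gt;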
&lt;p&gt;A less bad but still not great option is for the proxy to compute
some sort of keyed pseudorandom function over the identifier with
the key being known only to the proxy. This doesn&#39;t have the same
problem of the &lt;em&gt;data collector&lt;/em&gt; exhaustively searching the identifier space
but it&#39;s still possible for the proxy to do so if it is later
compromised. In general, if you have a system which requires the proxy to assign
unique user identifiers to submitted data, it&#39;s probably worth rethinking
your design.&lt;/p&gt;
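&lt;p&gt;For comparison, a keyed version might look like the sketch below, which
uses HMAC-SHA256 as the pseudorandom function. The choice of HMAC, the
&lt;em&gt;PROXY_KEY&lt;/em&gt; name, and the helper functions are assumptions made for
illustration, not a description of any particular deployment.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib
import hmac
import secrets

PROXY_KEY = secrets.token_bytes(32)  # hypothetical key held only by the proxy


def keyed_id(identity, key=PROXY_KEY):
    """HMAC-SHA256 as the keyed pseudorandom function over the identity."""
    return hmac.new(key, identity.encode("utf-8"), hashlib.sha256).hexdigest()


def random_id():
    """Identifier with no dependence on the identity at all."""
    return secrets.token_hex(16)
```
&lt;/pre&gt;
&lt;p&gt;Without the key, the data collector can&#39;t recompute &lt;em&gt;keyed_id&lt;/em&gt;
for candidate identities, but anyone who later obtains the key can, whereas
&lt;em&gt;random_id&lt;/em&gt; has nothing to reverse.&lt;/p&gt;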
&lt;h2 id=&quot;proxy-implementations&quot;&gt;Proxy Implementations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#proxy-implementations&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id=&quot;generic-proxies&quot;&gt;Generic Proxies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#generic-proxies&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are already a number of generic proxying systems that people use for
private Internet access but which can also be used for anonymized
data collection. This includes both IP-level Virtual Private
Networks (VPNs) and application-level proxy networks like &lt;a href=&quot;https://fpn.firefox.com/&quot;&gt;Firefox
Private Network&lt;/a&gt;, &lt;a href=&quot;https://support.apple.com/en-us/HT212614&quot;&gt;iCloud Private
Relay&lt;/a&gt;, or
&lt;a href=&quot;https://www.torproject.org/&quot;&gt;Tor&lt;/a&gt;. Private Relay and Tor have the
advantage that they include multiple hops so you need to extend even
less trust to any individual entity. Although these approaches are
useful, because they are generic they require some care to use successfully.&lt;/p&gt;
&lt;p&gt;As a concrete example, if you use a generic proxy then the proxy
can&#39;t shuffle your data because your interaction with the server
is interactive. Thus, the client has to do any shuffling
required which means multiple connections. This comes at a
performance and bandwidth cost. However, even if you do, things can go wrong.
For instance, the client is most likely connecting to the server
using &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=8446&quot;&gt;TLS&lt;/a&gt;, but
if it uses TLS session resumption
then there is a risk that the server can correlate multiple connections via
the TLS session ID/ticket. A related problem is that if the client is
also making non-anonymous connections to the server these might
be linkable to its submitted data.&lt;/p&gt;
&lt;p&gt;There are also some performance issues with setting up generic
connections for a single submission (for instance, the proxy
can&#39;t share one connection to the server), though those aren&#39;t necessarily
prohibitive.&lt;/p&gt;
&lt;h3 id=&quot;http-level-proxies&quot;&gt;HTTP-Level Proxies &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#http-level-proxies&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The IETF is currently in the process of standardizing an HTTP-level proxy
system called &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/OHAI&quot;&gt;Oblivious HTTP Application Intermediation&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.
Instead of being a generic system, it&#39;s specifically designed for
lightweight submission of individual data. The client is configured
with the server&#39;s key and just encrypts a single HTTP message
for the server using &lt;a href=&quot;https://www.ietf.org/archive/id/draft-irtf-cfrg-hpke-07.html&quot;&gt;Hybrid Public Key Encryption (HPKE)&lt;/a&gt;. These can all be multiplexed over the
same server connection and don&#39;t have any linkage identifiers.
In principle, the proxy could also shuffle the incoming messages
to prevent time-based correlation, though that&#39;s not currently
in the specification.&lt;/p&gt;
&lt;h3 id=&quot;enclaves&quot;&gt;Enclaves &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#enclaves&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Bittau et al. (at Google/Google Brain) proposed a system
called &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3132747.3132769&quot;&gt;PROCHLO&lt;/a&gt;
which is essentially an anonymizing proxy built into an &lt;a href=&quot;https://software.intel.com/content/www/us/en/develop/documentation/sgx-developer-guide/top/enclave-programming-model.html&quot;&gt;SGX enclave&lt;/a&gt;.
Briefly, an enclave is a mechanism for having a sealed off section
of a microprocessor which (1) is not directly accessible from the
rest of the processor and (2) is able to attest to what software
is running on it. The idea here is that the proxy runs on the
enclave and therefore the client can be sure that it is handling
its submissions as advertised.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The cool thing about PROCHLO is that it&#39;s not supposed to need
a trusted third party in the loop: because the enclave guarantees
the software running on it, the data collector can just run the whole
thing in their own data center with the client checking the attestation
on the proxy. This is a good idea in theory, but unfortunately there have
been a number of &lt;a href=&quot;https://dl.acm.org/doi/abs/10.1145/3133956.3134038&quot;&gt;papers&lt;/a&gt;
&lt;a href=&quot;https://sgaxe.com/files/SGAxe.pdf&quot;&gt;attacking&lt;/a&gt; &lt;a href=&quot;https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-van_bulck.pdf&quot;&gt;SGX&lt;/a&gt;, so in practice it&#39;s
quite unclear whether this kind of enclave can be made secure against
an attacker who has control--especially physical control--of the computer it&#39;s running on,
so at least for now I&#39;d have more confidence in a proxy actually run by a
third party (though it wouldn&#39;t hurt to have it running in an enclave).&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot;&gt;Next Up &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#next-up&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Anonymizing proxies are a useful technique that has the virtue of
being both lightweight and easy to understand. In many cases they
are all you need, but they also require some care to use properly
and this can often give you either less flexibility or less privacy
than you would naively hope. Next up, I&#39;ll be talking about some
fancy cryptographic techniques that have the potential to offer
a better set of tradeoffs in some settings.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You need public key encryption because every subject has to have
the same key or this lets you learn information about
which client is which. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that to really guarantee this, you also need the
traffic between the client and the proxy to be encrypted,
otherwise a sufficiently capable collector (i.e., one who could
see the incoming traffic to the proxy) could correlate
the incoming and outgoing reports. It&#39;s also important that
this encryption be &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Forward_secrecy&amp;amp;oldid=1048637885&quot;&gt;forward secret&lt;/a&gt; so that subsequent compromise or cheating by the
proxy can&#39;t re-link the submissions and their metadata. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Analyzing the extent to which it does is a nontrivial
exercise. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Technical note: if you have disaggregated submissions,
you will want a separate identifier for each submission type, perhaps derived with a keyed pseudorandom function of the submission type. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes, we started with the acronym OHAI and worked backwards. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The attestation is provided using a key burned into the
processor, so really you&#39;re trusting Intel. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-proxies/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Privacy Preserving Measurement 1: Background</title>
		<link href="https://educatedguesswork.org/posts/ppm-intro/"/>
		<updated>2021-10-07T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/ppm-intro/</id>
		<content type="html">&lt;p&gt;Depending on your point of view, we&#39;re in a golden age of big data
or a golden age of surveillance. Unfortunately, with the technology
we typically use, these are more or less the same thing: if you
collect data from a lot of people you&#39;re going to learn a lot
about them. While there &lt;em&gt;are&lt;/em&gt; applications where you
actually want to use people&#39;s individual data (e.g.,
targeted behavioral advertising&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;), in many cases you
just want to learn overall information about
the population, not information about specific individuals.&lt;/p&gt;
&lt;p&gt;As a concrete example, suppose that you want to take a survey to learn
the prevalence of some medical condition: for obvious reasons you
don&#39;t actually want to learn people&#39;s actual medical histories, you
just want to know how many people have disease X. But if you just go
around asking people, suddenly you actually have a bunch of incredibly
sensitive information. That information then needs to be protected--including
from yourself. This might seem counterintuitive, but it&#39;s important to
remember that in a lot of cases data is being collected by big
organizations with a lot of people in them, and you need to make
sure that nobody in the organization mishandles it.
Moreover, convincing people that that information will
be protected is essential to getting them to give it to you in the
first place; if people don&#39;t trust you, they won&#39;t tell you
the truth.&lt;/p&gt;
&lt;p&gt;The good news is that over the past few years there has been an
incredible amount of progress in what&#39;s generically called &lt;em&gt;privacy
preserving measurement (PPM)&lt;/em&gt; technologies that make it possible to
take measurements while also protecting people&#39;s privacy. This series
of posts attempts to provide an overview of these technologies. As a lead-in
to that, this post covers the traditional way that people
do things, which is basically to collect a pile of data and then
analyze it directly.&lt;/p&gt;
&lt;h2 id=&quot;types-of-measurement&quot;&gt;Types of Measurement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-intro/#types-of-measurement&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A good place to start is by asking about the kinds of measurements
you want to take. What I mean here isn&#39;t the data that you &lt;em&gt;collect&lt;/em&gt;
from users but rather the &lt;em&gt;output&lt;/em&gt; of the analysis that you
are trying to do. As we&#39;ll see later in this series, one of the
major challenges with PPM technologies is that they are good
for taking certain kinds of measurements and not others and so
you have to be really clear about what you are trying to do before
you start collecting data. To some extent this is true for any
kind of measurement as anyone who has ever done the kind of
science that requires a lot of data collection can tell you,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
but as a practical matter, if you have a bunch of raw data
in hand, there&#39;s usually quite a lot you can do. Indeed, it&#39;s
quite common to be able to take data collected for one purpose
and use it for an entirely different one, as in many economics &amp;quot;natural
experiments&amp;quot;. This is much less true with PPM technologies.&lt;/p&gt;
&lt;p&gt;In this section, I go over some of the most common types of measurements.&lt;/p&gt;
&lt;h3 id=&quot;simple-aggregates&quot;&gt;Simple Aggregates &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-intro/#simple-aggregates&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Probably the simplest type of measurement you might want to take is
a population aggregate. For instance, you might want to ask
the average height or income of a population or the fraction
with some characteristic.&lt;/p&gt;
&lt;p&gt;The traditional way to do this is just to survey a bunch of people
(or thermometers, trees, whatever) and then collect their values
for the variable of interest. This makes it sound a lot easier than
it actually is: you usually don&#39;t want to measure the
whole population, so you end up taking a sample instead, and getting
a representative sample can be quite difficult. This difficulty seems to
have been responsible for the severe polling errors in
&lt;a href=&quot;https://www.pewresearch.org/fact-tank/2016/11/09/why-2016-election-polls-missed-their-mark/&quot;&gt;recent US elections&lt;/a&gt;. Still,
at the end of the day you end up with a list of values.
Once you have this list, you can compute a number of different
aggregates, such as total, average (mean), median,
quantiles, standard deviation, etc.; basically, the kind of descriptive
statistics you would learn in a typical intro stats course.&lt;/p&gt;
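&lt;p&gt;As a rough illustration of the traditional approach, here is a minimal Python sketch that computes these aggregates once the raw values are in hand. The sample heights are made up for the example, and the variable names are just illustrative.&lt;/p&gt;

```python
import statistics

# Hypothetical survey responses: heights in centimeters.
heights_cm = [155, 160, 162, 171, 168, 180, 175]

total = sum(heights_cm)
mean = statistics.mean(heights_cm)
median = statistics.median(heights_cm)
stdev = statistics.stdev(heights_cm)                # sample standard deviation
quartiles = statistics.quantiles(heights_cm, n=4)   # the three quartile cut points

print(f"total={total} mean={mean:.1f} median={median} stdev={stdev:.1f}")
print("quartiles:", quartiles)
```

&lt;p&gt;The point is that every one of these statistics comes from the same raw list, so the collector can decide which ones to compute after the fact.&lt;/p&gt;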
&lt;h3 id=&quot;relationship-between-multiple-values&quot;&gt;Relationship Between Multiple Values &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-intro/#relationship-between-multiple-values&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The next most complicated kind of measurement captures the
relationship between multiple variables. For instance, we might be
interested in whether people who are taller make more money (spoiler
alert: &lt;a href=&quot;https://www.apa.org/monitor/julaug04/standing&quot;&gt;they do&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The standard techniques for this kind of analysis involve having
data which is grouped by subject. For instance, we might have
a table like the following:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;Height (cm)&lt;/td&gt;&lt;td&gt;Salary ($)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;160&lt;/td&gt;&lt;td&gt;100000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;M&lt;/td&gt;&lt;td&gt;162&lt;/td&gt;&lt;td&gt;111005&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;F&lt;/td&gt;&lt;td&gt;155&lt;/td&gt;&lt;td&gt;95000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;There are lots of things we can do with this kind of data.
Obviously, we can compute the descriptive statistics for
each variable that I mentioned above, but you can also
ask about the relationship between gender and income, the
relationship between gender and height, the relationship
between height and income, or between all three.
The nice thing about having this kind of data is that
you don&#39;t need to know in advance what kind of analysis you
want to run: as long as you have the raw data you can just
run it.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
As I said above, it&#39;s quite common in economics to use
data sets gathered for one purpose for researching
new questions. In addition, this kind of raw data is
very useful for double-checking your statistical analysis:
there are lots of ways in which you can get
results that look fine but are actually kind of spurious
(this &lt;a href=&quot;https://janhove.github.io/teaching/2016/11/21/what-correlations-look-like&quot;&gt;post&lt;/a&gt; by Jan Vanhove does a good job of
making this case for correlation coefficients).&lt;/p&gt;
&lt;p&gt;For all these reasons, it&#39;s most convenient to have your
data in this kind of raw form and to gather more data
than you actually need; it&#39;s much better to have it and not
need it than find you need it later when it&#39;s too late to
gather it.&lt;/p&gt;
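&lt;p&gt;To sketch what this flexibility looks like in practice, the Python snippet below takes the three example rows from the table and asks two different questions of them: how height relates to salary, and what the mean salary is per gender. The &lt;code&gt;pearson&lt;/code&gt; helper and the variable names are assumptions made for this example, and three rows are of course far too few to conclude anything.&lt;/p&gt;

```python
from math import sqrt

# The three example rows from the table: (gender, height in cm, salary in dollars).
rows = [("M", 160, 100000), ("M", 162, 111005), ("F", 155, 95000)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

heights = [height for _, height, _ in rows]
salaries = [salary for _, _, salary in rows]
height_salary_r = pearson(heights, salaries)

# Because each row keeps the variables together, we can regroup after the fact.
salary_by_gender = {}
for gender, _, salary in rows:
    salary_by_gender.setdefault(gender, []).append(salary)
mean_salary_by_gender = {g: sum(v) / len(v) for g, v in salary_by_gender.items()}

print(f"height/salary correlation: {height_salary_r:.2f}")
print("mean salary by gender:", mean_salary_by_gender)
```

&lt;p&gt;Neither question had to be chosen before the data was gathered, which is exactly the property that is hard to keep once the data is protected.&lt;/p&gt;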
&lt;h3 id=&quot;everything-else&quot;&gt;Everything Else &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-intro/#everything-else&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Beyond the simple stuff I&#39;ve just listed, there is of course a
giant universe of other kinds of analysis, including things
like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building natural language models for machine translation&lt;/li&gt;
&lt;li&gt;Matching images to names for facial recognition&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Mass_surveillance_in_the_United_Kingdom&amp;amp;oldid=1038379624&quot;&gt;Video surveillance for criminal investigation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Collecting images of streets and houses for mapping and autonomous vehicles
as in &lt;a href=&quot;https://www.google.com/streetview/&quot;&gt;Google Street View&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I don&#39;t expect to be talking too much about privacy-preserving
versions of these applications in this
series of posts, in part because they use a different set of
techniques and in part because I don&#39;t know this material as well.&lt;/p&gt;
&lt;p&gt;There is, however, one important special case to cover, which is what&#39;s often called &amp;quot;heavy
hitters&amp;quot;.  The basic scenario is that each user has some open-ended
set of values (strings or sets of bytes or something) and you want to
collect the most common ones. There are a lot of applications for this
kind of measurement, such as discovering the most common URLs that
people are visiting. A key point here is that the values probably
aren&#39;t known in advance, so you need not just to know what ones are
popular but also to learn them.&lt;/p&gt;
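&lt;p&gt;In the non-private setting, heavy hitters is about as simple as it sounds. Here is a minimal Python sketch, assuming each user just sends their list of URLs to the collector; the URLs and the &lt;code&gt;reports&lt;/code&gt; structure are hypothetical.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical reports: each user sends the URLs they visited, in the clear.
reports = [
    ["https://example.com/", "https://example.org/news"],
    ["https://example.com/"],
    ["https://example.com/", "https://example.net/rare-page"],
    ["https://example.org/news"],
]

# The collector does not need to know the URLs in advance; it learns them
# from the reports and counts how often each one shows up.
counts = Counter(url for report in reports for url in report)
heavy_hitters = counts.most_common(2)   # the two most frequently reported values

print(heavy_hitters)
```

&lt;p&gt;Notice that the collector also ends up holding the rarely visited URL, even though it only wanted the popular ones.&lt;/p&gt;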
&lt;p&gt;As with the rest of the measurements in this post, the easiest
thing to do with all these measurements is just to have everyone send their values
to some central data collector, where they can be processed.
This is especially useful in machine learning applications
where you might develop better algorithms later and want to
re-run them on the old data set.&lt;/p&gt;
&lt;h2 id=&quot;who-do-you-trust%3F&quot;&gt;Who do you trust? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/ppm-intro/#who-do-you-trust%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I keep repeating, the easiest way
to do most kinds of data collection is just to gather as much
data as you can in raw form and then post-process it at your
leisure. It&#39;s cheap, conceptually simple and easy to execute,
and it&#39;s very flexible in case you later discover that you
want to do a different kind of analysis or investigate
some different question than you originally intended, both
of which happen quite often.&lt;/p&gt;
&lt;p&gt;The problem, of course, is that the data collector now has this big pile
of potentially sensitive data which they have to protect.
From the user&#39;s perspective, the situation is even worse:
they have to trust you to manage that data in an appropriate
way. This might be fine if the data is about trees, but perhaps
less acceptable if it&#39;s about people&#39;s medical history.&lt;/p&gt;
&lt;p&gt;With a conventional system, data protection mostly comes down
to the data collector having some kind of policy about
how they handle the data. Typically this consists of
some combination
of internal anonymization (stripping user information, etc.)
and access controls. A good example of this
is the US Census, which collects a pile of potentially
confidential information but then promises to protect
it:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.census.gov/library/fact-sheets/2019/dec/2020-confidentiality.html&quot;&gt;&lt;img src=&quot;https://www.census.gov/content/census/en/library/fact-sheets/2019/dec/2020-confidentiality/jcr:content/map.detailitem.950.high.jpg/1578076216294.jpg&quot; alt=&quot;Census Confidentiality&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/&quot;&gt;problem&lt;/a&gt; with policy controls is that they
require the subjects of data collection to trust that they are correctly executed, not just now
but in the future. For instance, US Census data was
&lt;a href=&quot;https://www.scientificamerican.com/article/confirmed-the-us-census-b/&quot;&gt;used&lt;/a&gt;
&lt;a href=&quot;https://usatoday30.usatoday.com/news/nation/2007-03-30-census-role_N.htm&quot;&gt;to identify&lt;/a&gt;
Japanese-Americans for &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Internment_of_Japanese_Americans&amp;amp;oldid=1046914251&quot;&gt;internment&lt;/a&gt; during
World War II, after &lt;a href=&quot;https://www.scientificamerican.com/article/confirmed-the-us-census-b/&quot;&gt;repealing existing Census confidentiality protections&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Census Bureau surveys the population every decade with detailed
questionnaires but is barred by law from revealing data that could
be linked to specific individuals. The Second War Powers Act of 1942
temporarily repealed that protection to assist in the roundup of
Japanese-Americans for imprisonment in internment camps in
California and six other states during the war.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;Lawmakers restored the confidentiality of census data in 1947.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For these reasons, centralized data collection plus policy controls
isn&#39;t really an ideal answer. What we really want is technical
protections. Fortunately, we finally have the
technology to collect sensitive data in a way that (1) lets us do
significant amounts of useful analysis and (2) significantly improves
user privacy in a way that doesn&#39;t just depend on trusting the data
collector. In the next post, I&#39;ll be covering the simplest such
technique: anonymizing proxies.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though this doesn&#39;t necessarily mean &lt;em&gt;learning&lt;/em&gt; people&#39;s
individual data. &lt;a href=&quot;https://blog.mozilla.org/en/mozilla/the-future-of-ads-and-privacy/&quot;&gt;Privacy Preserving Advertising&lt;/a&gt;
attempts to use people&#39;s data without learning it. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One of the most common experiences is collecting your
data, finding out that you&#39;ve done something wrong, and then
having to collect it again.... and again. &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you have to be quite careful if you
try to ask too many questions out of the same
data set. Each time you run a statistical test,
there is a certain risk of a false positive result
(the technical jargon here is &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Type_I_and_type_II_errors&amp;amp;oldid=1041362519&quot;&gt;Type 1 error&lt;/a&gt;), so if you try out a lot of different things,
there&#39;s a risk that you&#39;re just going to get
false positive results (see &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Data_dredging&amp;amp;oldid=1046525999&quot;&gt;p-hacking&lt;/a&gt;).
 &lt;a href=&quot;https://educatedguesswork.org/posts/ppm-intro/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Fantastic memory issues and how to fix them</title>
		<link href="https://educatedguesswork.org/posts/memory-safety/"/>
		<updated>2021-09-22T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/memory-safety/</id>
		<content type="html">&lt;p&gt;Last week everyone with an Apple device got told they needed to install
an emergency update to defend themselves against a &amp;quot;zero-click exploit&amp;quot;
that was apparently &lt;a href=&quot;https://support.apple.com/en-us/HT212805&quot;&gt;being used in the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ATTENTION&lt;/strong&gt;: If you aren&#39;t on the latest software, stop reading this and update &lt;strong&gt;right now&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The update has fixes for two issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CVE-2021-30860 -- an integer overflow in the PDF parser.&lt;/li&gt;
&lt;li&gt;CVE-2021-30858 -- a use-after-free vulnerability in WebKit (Apple&#39;s Web engine)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apparently both of these can lead to what&#39;s called &lt;em&gt;remote code execution&lt;/em&gt; (RCE)
which means pretty much what it sounds like -- the attacker gets to run their own
code on your device -- and were being actively exploited.
It&#39;s not just Apple either:
On the same day Maddie Stone from Google Project Zero &lt;a href=&quot;https://twitter.com/maddiestone/status/1437512920770834434?ref_src=twsrc%5Etfw&quot;&gt;tweeted&lt;/a&gt; about Chrome fixing two vulnerabilities of their own, an out-of-bounds
write and a use-after-free (thanks to &lt;a href=&quot;https://www.helpnetsecurity.com/2021/09/14/cve-2021-30860/&quot;&gt;HelpNet&#39;s article&lt;/a&gt;
for the links here.)&lt;/p&gt;
&lt;p&gt;While the precise details of these issues vary, they&#39;re
all (with the potential exception of the integer overflow, but probably
that too) what are called &amp;quot;memory safety&amp;quot; issues. These are generally
quite bad and often lead to RCE, and unfortunately they&#39;re also quite
common; pretty much any complicated piece of systems software
regularly has to release fixes for
memory safety stuff. The rest of this post provides an overview
of what&#39;s going on and what we can do about it.&lt;/p&gt;
&lt;h2 id=&quot;what&#39;s-a-computer&#39;s-memory-anyway%3F&quot;&gt;What&#39;s a computer&#39;s memory anyway? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#what&#39;s-a-computer&#39;s-memory-anyway%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In order to understand what&#39;s going on, you need to first have
some idea of how computer software works. Feel free to skip
this section if you already know this stuff. At the very highest
level, a computer looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/computer-memory.png&quot; alt=&quot;Abstract diagram of computer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;central processing unit&lt;/em&gt; (CPU) is responsible for actually
running the programs you load onto the computer (e.g., adding
numbers up, drawing stuff on the screen, etc.) It does that
by reading those programs from the computer&#39;s memory. A program
is just a series of instructions that the CPU should follow.&lt;/p&gt;
&lt;p&gt;The computer&#39;s memory is effectively just a giant table of
numbers. Each location in the table is pointed at by a memory
&lt;em&gt;address&lt;/em&gt; which is just a number that tells you where it is
in the table. Addresses are laid out in the obvious way
so that the stuff in address 2 is right after the stuff
in address 1, etc. For instance, we might have the following
program, with the numbers on the left being the memory
address and the text being the thing to do (the technical
term is &amp;quot;instruction&amp;quot;). Note that I&#39;m taking a lot of liberties
here: the instructions aren&#39;t in English but are just
numbers and each memory address is the same size, so
you couldn&#39;t fit &amp;quot;Hello&amp;quot; and &amp;quot;Goodbye&amp;quot; in the same size
place, but we can ignore those issues for now.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;001    Write(&amp;quot;Hello&amp;quot;)
002    Write(&amp;quot;Goodbye&amp;quot;)
003    Exit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Normally, the instructions are read in sequence, so this
program would print out &amp;quot;Hello&amp;quot;, then print out &amp;quot;Goodbye&amp;quot;, and
then exit.&lt;/p&gt;
&lt;p&gt;This is all fine if you only want to write really boring
programs, but if you want to write interesting programs,
you&#39;re obviously going to need some more stuff. In particular,
you&#39;re going to want to do two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Store temporary data (e.g., pictures, text, etc.) somewhere
so you can work on it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have the program exhibit conditional behavior rather than
always running the same instructions in the same order
every time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example, here&#39;s another simple program that counts from
1 to 100.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;001    Store(50, 1)
002    Write(Data(50))
003    If Data(50) = 100 go to 6
004    Store(50, Data(50) + 1)
005    go to 2
006    Exit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a little harder to read because I was struggling a bit with
the notation. The basic idea here is to use memory location 50 as a
counter.  Line 1 says &amp;quot;put the value 1 in memory location 50&amp;quot;.
&lt;code&gt;Data(50)&lt;/code&gt; refers to whatever is in location 50 and so line 2 says
&amp;quot;Write whatever is in 50&amp;quot;. Line 3 checks to see if the counter is at
100 and if so goes to line 6 which eventually exits the
program. Otherwise, line 4 increments the counter by 1. Then the
program goes to line 5 which sends it
back to line 2.&lt;/p&gt;
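&lt;p&gt;To make the walkthrough concrete, here is a minimal Python sketch of a toy machine that runs the counting program. The instruction names (&lt;code&gt;store&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, and so on), the tuple encoding, and the &lt;code&gt;run&lt;/code&gt; function are all made up for this example; a real CPU works with numbers, not tuples. The sketch starts the counter at 1, matching the description of line 1.&lt;/p&gt;

```python
def run(memory, pc=1, max_steps=10_000):
    """Fetch and execute toy instructions until exit; return everything written."""
    output = []
    for _ in range(max_steps):
        op = memory[pc]                 # fetch whatever is at the current address
        if op[0] == "store":            # ("store", address, value)
            memory[op[1]] = op[2]
            pc += 1
        elif op[0] == "write":          # ("write", address)
            output.append(memory[op[1]])
            pc += 1
        elif op[0] == "if_eq_goto":     # ("if_eq_goto", address, value, target)
            pc = op[3] if memory[op[1]] == op[2] else pc + 1
        elif op[0] == "incr":           # ("incr", address)
            memory[op[1]] += 1
            pc += 1
        elif op[0] == "goto":           # ("goto", target)
            pc = op[1]
        elif op[0] == "exit":
            return output
    raise RuntimeError("step limit reached")

# Addresses 1-6 hold the program; address 50 will hold the counter.
memory = {
    1: ("store", 50, 1),
    2: ("write", 50),
    3: ("if_eq_goto", 50, 100, 6),
    4: ("incr", 50),
    5: ("goto", 2),
    6: ("exit",),
}

output = run(memory)
print(output[0], "...", output[-1])
```

&lt;p&gt;Here the six instructions and the counter in cell 50 all sit in one &lt;code&gt;memory&lt;/code&gt; dictionary.&lt;/p&gt;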
&lt;p&gt;The most important thing to note here is that the program and its
data share the same memory, so whether a piece of memory is a program
or data is just a matter of convention. For instance, if the
computer gets hit by an ill-timed cosmic ray and line 5 gets
changed to &lt;code&gt;go to 50&lt;/code&gt; then the CPU would diligently jump
to memory address 50 and try to interpret whatever was
there as an instruction (remember, it&#39;s all numbers anyway).
This is what&#39;s called
a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Von_Neumann_architecture&amp;amp;oldid=1030982626&quot;&gt;Von Neumann Architecture&lt;/a&gt;
and it&#39;s how nearly all computers work. The alternative,
in which programs and data are separate, is called a
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Harvard_architecture&amp;amp;oldid=1044132498&quot;&gt;Harvard Architecture&lt;/a&gt;.
It&#39;s unusual to have a modern computer with a Harvard Architecture,
but actually that&#39;s kind of bad for reasons we&#39;re about to see.&lt;/p&gt;
&lt;p&gt;Although from a hardware perspective the memory is undifferentiated,
there is a conventional way to lay things out, as shown in this
diagram I borrowed from Geeksforgeeks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdncontribute.geeksforgeeks.org/wp-content/uploads/memoryLayoutC.jpg&quot; alt=&quot;C memory architecture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To orient yourself, address zero is at the bottom of the diagram
and higher addresses are at the top. The program is actually
split up into two pieces: the program itself (&amp;quot;the &lt;em&gt;text&lt;/em&gt; segment&amp;quot;)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and its data (&amp;quot;the &lt;em&gt;data&lt;/em&gt; segment&amp;quot;).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
There are also two different parts of memory where the program&#39;s
data is stored, called the &amp;quot;stack&amp;quot; and the &amp;quot;heap&amp;quot;. I&#39;ll get to them
shortly.&lt;/p&gt;
&lt;h2 id=&quot;remote-code-execution&quot;&gt;Remote Code Execution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#remote-code-execution&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The core thing that allows for memory safety issues to arise
is this intermixing of the program and its working data
into the same memory region.
Here is a very silly program which shows what I am talking
about:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;001 Read(10)
002 go to 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What does this program do? It reads some stuff from somewhere (the
Internet?!!!) into memory location 10 (again, I&#39;m pretending that
these can just be arbitrarily sized). Then it goes to (the technical
term here is &amp;quot;jumps to&amp;quot;) location 10 and starts executing
whatever it read--from the Internet!--as a program instead of
whatever program was originally loaded. And if that program
that you just loaded does something dangerous like deleting
all your files or causing your computer to explode, well that&#39;s
what happens. This is what we mean when we say &amp;quot;remote code
execution&amp;quot;: your computer is executing the code that some
remote person sent it.&lt;/p&gt;
&lt;p&gt;At this point you could be forgiven for asking why anyone would
write a program that did this, and even though I said it
was silly, this is actually something Web browsers do a lot:
Web pages are just little (or big) programs and the browser&#39;s
job is to load and run those programs, although for obvious
reasons they don&#39;t do it this directly.
&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
But what does happen
is that the program you are running has &lt;em&gt;defects&lt;/em&gt; (&lt;em&gt;vulnerabilities&lt;/em&gt;)
that allow the attacker to execute code on your computer
without having such an obvious problem as this program.&lt;/p&gt;
&lt;p&gt;There are a large number of possible ways to have this kind of
defect, and I&#39;m just going to explain one, though it&#39;s a classic:
the &lt;em&gt;buffer overflow&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=&quot;buffer-overflows&quot;&gt;Buffer Overflows &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#buffer-overflows&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Unlike the trivial programs I&#39;ve been showing
above, real programs are written as a set of &lt;em&gt;subroutines&lt;/em&gt;,
which is just the jargon for an independent piece of code that
does something and can run on its own. For instance, suppose
that I want to write a simple program that reads strings
from the keyboard and then echoes them back. In the C language,
this program might look something like this (though of course
it&#39;s eventually converted into a set of machine-level instructions
as described above).&lt;/p&gt;
&lt;pre class=&quot;language-clike&quot;&gt;&lt;code class=&quot;language-clike&quot;&gt;void &lt;span class=&quot;token function&quot;&gt;read_print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;   char temp&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;   &lt;span class=&quot;token function&quot;&gt;gets&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;temp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;   &lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;temp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Enter string 1&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;read_print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token 
punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Enter string 2&#92;n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;read_print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;read_print()&lt;/code&gt; is a subroutine (though C calls them &amp;quot;functions&amp;quot;).
It can be invoked (&amp;quot;called&amp;quot;) from anywhere else in the program and
will do whatever it&#39;s supposed to do and then &amp;quot;return&amp;quot; back to
where it was. The syntax for this is just to write it with
parentheses, like &lt;code&gt;read_print()&lt;/code&gt;.
So, in this case, when you call &lt;code&gt;read_print()&lt;/code&gt; the
first time, it reads a string, then prints it, and then goes to the
next line, which prints &lt;code&gt;Enter string 2&lt;/code&gt;. By the way, &lt;code&gt;gets()&lt;/code&gt;
and &lt;code&gt;puts()&lt;/code&gt; are also functions: &lt;code&gt;gets()&lt;/code&gt; reads from
the keyboard and &lt;code&gt;puts()&lt;/code&gt; writes to the output. These
functions are built into the C standard library.&lt;/p&gt;
&lt;p&gt;Unfortunately, this program has a bug. In order to use the &lt;code&gt;gets()&lt;/code&gt;
function, you need to tell it where you want it to stuff whatever
it is reading from the keyboard, which means passing it some memory
location. The line &lt;code&gt;char temp[8];&lt;/code&gt; allocates a region
of memory (a buffer) of size 8 and then we pass it to &lt;code&gt;gets()&lt;/code&gt;.
But what happens if someone types more than 8 characters at the
keyboard?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Nothing good! Because &lt;code&gt;gets()&lt;/code&gt; does not know how long
the memory region is, it just keeps writing stuff, overwriting
whatever happens to be there already. This is what&#39;s called a
&lt;em&gt;buffer overflow&lt;/em&gt; and it&#39;s really bad news. Buffer overflows
are one of the main kinds of vulnerability that eventually
leads to program compromise.&lt;/p&gt;
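&lt;p&gt;To see the overwrite happen without crashing anything, here is a small Python simulation. Python itself checks bounds, so the sketch models memory as a &lt;code&gt;bytearray&lt;/code&gt; and uses a deliberately unchecked copy loop in place of &lt;code&gt;gets()&lt;/code&gt;. The 16-cell layout, the &lt;code&gt;NEIGHBOR&lt;/code&gt; placeholder data, and the &lt;code&gt;unchecked_read&lt;/code&gt; helper are all assumptions for illustration, not how any real C library is implemented.&lt;/p&gt;

```python
BUFFER_SIZE = 8

def unchecked_read(memory, start, data):
    """Copy data plus a terminating zero byte into memory, with no bounds check."""
    for offset, byte in enumerate(data + b"\x00"):
        memory[start + offset] = byte

# 16 toy memory cells: cells 0-7 are the buffer, cells 8-15 hold unrelated data.
memory = bytearray(BUFFER_SIZE) + bytearray(b"NEIGHBOR")

unchecked_read(memory, 0, b"hello")            # fits in the buffer
after_short_input = bytes(memory[BUFFER_SIZE:])

unchecked_read(memory, 0, b"AAAAAAAAAAAA")     # 12 bytes typed into an 8-byte buffer
after_long_input = bytes(memory[BUFFER_SIZE:])

print(after_short_input)   # the neighboring cells are untouched
print(after_long_input)    # the start of the neighboring data has been overwritten
```

&lt;p&gt;The copy loop never asks how big the buffer is, so the only thing protecting the neighboring cells is the length of the input.&lt;/p&gt;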
&lt;h3 id=&quot;smashing-the-stack&quot;&gt;Smashing the Stack &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#smashing-the-stack&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I don&#39;t want to get into too much detail about how to exploit
a buffer overflow, but I&#39;m just going to give one example of
how this can happen, the classic &lt;em&gt;smashing the stack&lt;/em&gt;
attack described by Aleph One in &lt;a href=&quot;https://www.eecs.umich.edu/courses/eecs588/static/stack_smashing.pdf&quot;&gt;Smashing the Stack for Fun and Profit&lt;/a&gt;, which set off two waves, one of exploitation
and one of papers describing how to do X &amp;quot;for fun and profit&amp;quot;.
In order to understand the stack overflow, you need to
understand a little more about how programs are laid out
in memory. When you make a function/subroutine call, the
computer needs to remember which memory address to
go back to when the function completes (&amp;quot;returns&amp;quot;). In order to do
this, it stores that information in the stack area of memory. For instance,
suppose I have the trivial program:&lt;/p&gt;
&lt;pre class=&quot;language-clike&quot;&gt;&lt;code class=&quot;language-clike&quot;&gt;void &lt;span class=&quot;token function&quot;&gt;bar&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I am bar&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;void &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Start of foo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;bar&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;End of foo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;  &lt;span 
class=&quot;token comment&quot;&gt;// Will go in memory address Y&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Before foo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token function&quot;&gt;puts&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;After foo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;       &lt;span class=&quot;token comment&quot;&gt;// Will go in memory address X&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note: the &lt;code&gt;//&lt;/code&gt; marks a &amp;quot;comment&amp;quot;, a section of code that doesn&#39;t
do anything; it&#39;s just there to help you know what&#39;s
going on.
At the start of the program, the stack is more or less empty.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
Then when we call &lt;code&gt;foo()&lt;/code&gt; the stack now contains the
address of the line of code after we called foo, which is
to say &lt;code&gt;X&lt;/code&gt;, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+
|       X       |
+-+-+-+-+-+-+-+-+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&#39;ve done something a little sneaky here, which is that I&#39;ve drawn
this to scale, with the address taking up 8 &amp;quot;units&amp;quot; and
each little &lt;code&gt;+-+&lt;/code&gt; representing one unit. This matches
the size of addresses that most modern machines have.
Then when &lt;code&gt;foo()&lt;/code&gt; calls &lt;code&gt;bar()&lt;/code&gt;, we have to add the
address of the line after the call to &lt;code&gt;bar()&lt;/code&gt; on
the end, so it looks like this, with memory addresses
going up as we go left to right.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&#39;s the thing, though: the stack isn&#39;t just used to store
the return address, it&#39;s also used to store memory that&#39;s
just used by the function being executed. So, if we go back
to our read/print program above, when the computer calls
the &lt;code&gt;read_print()&lt;/code&gt; function, the stack looks
like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   buffer      |1 2 3 4 5 6 7 8|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ^
                  return address
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we read a short string into the buffer, say &amp;quot;hello&amp;quot;, then
we get something like this, with each character filling one
memory unit for a total of 5.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|h e l l o      |1 2 3 4 5 6 7 8|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ^
                  return address
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However if instead the user types in something
longer, like &amp;quot;hello world!&amp;quot;, then the computer will just
happily keep writing into the return address because it
doesn&#39;t know how long the buffer is, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|h e l l o   w o|r l d ! 5 6 7 8| 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ^
                  return address
&lt;/code&gt;&lt;/pre&gt;
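&lt;p&gt;To make the diagram concrete, here is a toy model of that overflow in C. It is only a sketch: the 16-unit &lt;code&gt;frame&lt;/code&gt; array, the two constants, and the &lt;code&gt;unchecked_read()&lt;/code&gt; helper are all made up for illustration, and a real stack frame is laid out by the compiler rather than by an array like this. The copy loop has no idea where the buffer ends, which is the essential problem with &lt;code&gt;gets()&lt;/code&gt;.&lt;/p&gt;

```c
// Toy model of the stack diagram: 16 one-byte units in a row.
// Units 0-7 stand in for the buffer, units 8-15 for the return address.
// These names and sizes are assumptions made for this sketch.
enum { BUFFER_UNITS = 8, FRAME_UNITS = 16 };

// Copies input into the frame without knowing where the buffer ends,
// the way gets() behaves. The only limit is the end of the model frame,
// which keeps the sketch itself inside its own array.
// Returns how many units of the return-address area were overwritten.
int unchecked_read(char *frame, const char *input) {
    int i = 0;
    while (input[i] != '\0') {
        if (i >= FRAME_UNITS) {
            break;
        }
        frame[i] = input[i];
        i++;
    }
    return i > BUFFER_UNITS ? i - BUFFER_UNITS : 0;
}
```

&lt;p&gt;With the input &amp;quot;hello&amp;quot; nothing past the buffer is touched; with &amp;quot;hello world!&amp;quot; the last four characters land on the first four units of the stand-in return address, matching the picture.&lt;/p&gt;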
&lt;p&gt;When the function is finished, instead of jumping
back to where it&#39;s supposed to (the line of code right after
the function was called), it will jump to whatever the attacker
has stuffed in the return address, which, obviously, is bad.
For extra credit, the attacker can use the buffer overflow
to write the code they want executed into memory somewhere
(maybe in the buffer itself, or maybe after the return pointer)
and then jump right to it. Mission accomplished:
remote code execution.&lt;/p&gt;
&lt;h2 id=&quot;how-did-this-happen%3F&quot;&gt;How did this happen? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#how-did-this-happen%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The core problem here is actually quite simple:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;C is a horrifically dangerous language that encourages you to write bad code (C++, too)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The situation is actually fractally bad. At the first level, we have the fact
that a lot of the original C library functions are awful.
The design of the &lt;code&gt;gets()&lt;/code&gt; function is a great example here. Because &lt;code&gt;gets()&lt;/code&gt;
has no way of knowing how much memory it has to work with, there is simply
no way of using &lt;code&gt;gets()&lt;/code&gt; safely. Here&#39;s what the manual page for &lt;code&gt;gets()&lt;/code&gt; says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The gets() function cannot be used securely.  Because of its lack of
bounds checking, and the inability for the calling program to reliably
determine the length of the next incoming line, the use of this function
enables malicious users to arbitrarily change a running program&#39;s functionality
through a buffer overflow attack.  It is strongly suggested
that the fgets() function be used in all cases.  (See the FSA.)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The replacement function recommended here, &lt;code&gt;fgets()&lt;/code&gt;, is slightly better, in that
it allows you to pass a length value, but of course it&#39;s possible to get the
length wrong, at which point you&#39;re back in the soup. And because
code is written by people, if you have a lot of code you probably have
a lot of bugs.&lt;/p&gt;
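&lt;p&gt;The idea of passing the length along with the buffer can be sketched with a small helper. This is not &lt;code&gt;fgets()&lt;/code&gt; itself: &lt;code&gt;bounded_copy()&lt;/code&gt; is a hypothetical function written for this example, and it copies from a string instead of reading input. What it shares with &lt;code&gt;fgets()&lt;/code&gt; is that the caller says how big the buffer is, and the function refuses to write past that.&lt;/p&gt;

```c
// Hypothetical helper for illustration: copy src into dst, writing at
// most cap bytes including the terminating '\0'.
// Returns the number of characters copied, not counting the terminator.
unsigned int bounded_copy(char *dst, unsigned int cap, const char *src) {
    unsigned int i = 0;
    if (cap == 0) {
        return 0; // no room even for the terminator
    }
    while (src[i] != '\0') {
        if (i + 1 >= cap) {
            break; // keep the last unit free for the terminator
        }
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

&lt;p&gt;The protection is only as good as the &lt;code&gt;cap&lt;/code&gt; value: if the caller passes 16 for an 8-unit buffer, the overflow is right back.&lt;/p&gt;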
&lt;p&gt;Even if you eliminate the unsafe library functions -- and there are
checkers you can get which will detect them and stop you -- C is still
really hard to use correctly. Another source of problems is that C actually
requires you to manage memory directly. Suppose that you want to
read some data from the network but you don&#39;t know how long it&#39;s going
to be. For instance, it might be in &amp;quot;length-value&amp;quot; format where the
first thing you read is the length and then you read that much data.
In this situation, the idiom we used above of just having a fixed-size
buffer won&#39;t work: any value you choose will be too short some of the
time&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
and if you choose a really long value you are wasting memory.&lt;/p&gt;
&lt;p&gt;In C, the way you handle this is that you can &lt;em&gt;allocate&lt;/em&gt; a block
of memory of a given size (this goes in the heap region) and then
use it to store data in. When you&#39;re done with the memory, you
then &lt;em&gt;free&lt;/em&gt; (deallocate) the memory so that it can be allocated
again for some other purpose. Now, what happens if you have
a bug in your program where you use the memory after it is freed
(this happens surprisingly often in complex programs)? The answer
is that you have what&#39;s called a &lt;em&gt;use-after-free&lt;/em&gt; bug (remember
I said that above?) and this can often be exploited to compromise
the program.&lt;/p&gt;
&lt;p&gt;Unfortunately, because C is &lt;em&gt;fast&lt;/em&gt; and &lt;em&gt;portable&lt;/em&gt; (i.e., you can write
C for a lot of different kinds of computers), it is used all over
the place and so we have giant piles of code written in C or its
descendant, C++. Much of this code has undetected memory safety issues
just waiting to be exploited, which brings us back to our main
story.&lt;/p&gt;
&lt;h2 id=&quot;fixing-memory-safety-issues&quot;&gt;Fixing Memory Safety Issues &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#fixing-memory-safety-issues&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are a number of different approaches to fixing memory safety
issues, some of which have been more successful than others.&lt;/p&gt;
&lt;h3 id=&quot;fix-all-the-issues&quot;&gt;Fix all the issues &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#fix-all-the-issues&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One thing you might think you could do is just fix all the
defects, potentially with the assistance of tooling
that detected them. Unfortunately, while fixing any particular
defect is generally not difficult, there are such a large number
of defects and they are so difficult to find that I don&#39;t think
anyone thinks that this is a practical approach.&lt;/p&gt;
&lt;h3 id=&quot;memory-safe-languages&quot;&gt;Memory-Safe Languages &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#memory-safe-languages&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The second major approach is to write in a &amp;quot;memory-safe&amp;quot; language.
For a long time, C/C++ (and on Apple platforms, Objective C)
were the only game in town for systems programming.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
While C++ and Objective C have some mechanisms to let you write
somewhat safer code, it&#39;s still quite possible to shoot yourself
in the foot unless you&#39;re very careful to follow a
strict subset of the language (and arguably not even then).&lt;/p&gt;
&lt;p&gt;There certainly have been languages that let you write
&amp;quot;memory-safe&amp;quot; code in which it was difficult or impossible to
write the kind of defects I was showing above. Typically the
way this works is that you&#39;re not allowed to handle raw
memory like you do in C. For instance, instead of just
having a &amp;quot;block of memory of unknown size&amp;quot; you might have a &amp;quot;higher-level&amp;quot;
abstraction like &amp;quot;block of contiguous memory of size X&amp;quot;
and the language would forbid you from reading or writing
outside of that block. However, for a variety of reasons
(principally rooted in real or fake performance concerns),
these languages have never really taken off for systems
programming until relatively recently. Probably the closest is
Java, which saw a bunch of use for enterprise software but
wasn&#39;t really that successful for end-user applications like
operating systems, word processors, and the like.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
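&lt;p&gt;As a rough picture of what a &amp;quot;block of contiguous memory of size X&amp;quot; looks like, here is a sketch written in C for comparison with the earlier examples. The &lt;code&gt;struct block&lt;/code&gt; type and its two accessor functions are hypothetical names invented here. In a memory-safe language the equivalent check is enforced by the language itself; in C nothing stops a programmer from bypassing these functions and touching &lt;code&gt;data&lt;/code&gt; directly.&lt;/p&gt;

```c
// Hypothetical "block of size X": the memory and its size travel together.
struct block {
    unsigned char *data;
    unsigned int size;
};

// Writes value at index. Returns 1 on success, 0 if index is outside
// the block, in which case nothing is written.
int block_write(struct block *b, unsigned int index, unsigned char value) {
    if (index >= b->size) {
        return 0;
    }
    b->data[index] = value;
    return 1;
}

// Reads the unit at index into *out. Returns 1 on success, 0 if index
// is outside the block, in which case *out is left untouched.
int block_read(const struct block *b, unsigned int index, unsigned char *out) {
    if (index >= b->size) {
        return 0;
    }
    *out = b->data[index];
    return 1;
}
```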
&lt;p&gt;Over the past 5-10 years, however, two new languages have emerged
that are getting real traction in this space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://golang.org/&quot;&gt;Go&lt;/a&gt; designed by Google&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.rust-lang.org/&quot;&gt;Rust&lt;/a&gt; originally designed by Mozilla but now
maintained by the Rust community.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both of these are memory safe and occupy a similar niche to
C/C++, with Rust probably being a closer match and Go being a little
more like Java. It&#39;s credible to write a new piece of systems
software in either language and even to integrate it with
a code base written largely in C/C++ (this part works somewhat better with
Rust than with Go).&lt;/p&gt;
&lt;p&gt;While very important tools, Rust and Go aren&#39;t really a general
solution because we have huge amounts of code already written in
C and C++ and it&#39;s very expensive to rewrite. There&#39;s been a lot
of energy in the Rust community behind this kind of rewrite
(so much that &amp;quot;rewrite it in Rust&amp;quot; is a catchphrase), and there
have been some successful projects, but realistically it&#39;s hard to
see any major software system being replaced with a Rust
version any time soon. Of course, we might see new
replacement programs written in Rust displace their
older counterparts just through the normal process of new
product/software development. As a practical matter
this means that we&#39;re going to be living with software
with memory safety issues for quite some time. For this
reason, there has been a lot of focus on containing the damage.&lt;/p&gt;
&lt;h3 id=&quot;anti-rce-countermeasures&quot;&gt;Anti-RCE Countermeasures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#anti-rce-countermeasures&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The past 20 years or so has seen a long series of countermeasures
designed to prevent RCE, or at least make it harder, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Address_space_layout_randomization&amp;amp;oldid=1045013697&quot;&gt;Address Space Layout Randomization (ASLR)&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=W%5EX&amp;amp;oldid=1038078381&quot;&gt;Write XOR Execute (W^X)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Control-flow_integrity&amp;amp;oldid=1036491912&quot;&gt;Control Flow Integrity (CFI)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A number of processors have hardware support for anti-exploitation
mitigations, such as the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=NX_bit&amp;amp;oldid=1020970904&quot;&gt;NX bit&lt;/a&gt;, &lt;a href=&quot;https://www.software.intel.com/content/www/us/en/develop/articles/technical-look-control-flow-enforcement-technology.html&quot;&gt;Intel CET&lt;/a&gt;, or
&lt;a href=&quot;https://www.qualcomm.com/media/documents/files/whitepaper-pointer-authentication-on-armv8-3.pdf&quot;&gt;ARM PAC&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Generally, these techniques are not designed to actually prevent
memory issues such as buffer overflows (if you&#39;re going to work in
C this turns out to be quite difficult),
but rather to prevent them from being easily exploited.
Unfortunately, there has also been a long series of attack techniques
(e.g., &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Return-oriented_programming&amp;amp;oldid=1039271285&quot;&gt;return oriented programming&lt;/a&gt;)
developed to defeat these countermeasures, resulting in a never-ending
arms race of attack and defense which is good news for computer
security researchers but perhaps less good news for users. I don&#39;t
want to leave you with the impression that these techniques
don&#39;t do anything: they do make exploitation harder but at the
moment sophisticated attackers seem to usually be able to defeat them.
Some--though not all--of the problem is that the strongest techniques
have a very negative performance impact and developers have been
generally unwilling to accept that.&lt;/p&gt;
&lt;h3 id=&quot;process-separation-%2B-sandboxing&quot;&gt;Process Separation + Sandboxing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#process-separation-%2B-sandboxing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The industry standard approach for addressing this kind of memory
issue is to just accept that you will have insecure code and that it
will get compromised (including RCEs) and focus on limiting the damage
that the code can do. The general procedure is as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Take the most dangerous/vulnerable code and run it in its own
process (process separation)&lt;/li&gt;
&lt;li&gt;Lock down that process so that it has the minimum privileges
needed to do its job (sandboxing)&lt;/li&gt;
&lt;li&gt;If the process needs extra privileges have it talk to another
process which has more privileges but is (theoretically)
less vulnerable.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For instance, in a Web browser the most dangerous code is the stuff
that talks directly to servers, such as the HTML/JS renderer. In
modern browsers, the HTML/JS renderer runs in its own process that has
very limited capabilities (e.g., it cannot talk directly to the
network). This strategy was introduced in &lt;a href=&quot;http://www.peter.honeyman.org/u/provos/papers/privsep.pdf&quot;&gt;SSHD&lt;/a&gt;
and then &lt;a href=&quot;https://seclab.stanford.edu/websec/chromium/chromium-security-architecture.pdf&quot;&gt;adopted for browsers by Chrome&lt;/a&gt;. It has since more
or less been universally adopted in browsers
-- as well as a similar mechanism in iMessage called
&lt;a href=&quot;https://googleprojectzero.blogspot.com/2021/01/a-look-at-imessage-in-ios-14.html&quot;&gt;Blastdoor&lt;/a&gt; -- and while reasonably successful,
it is not a panacea. What it mostly means is that an attacker needs
to not only attack the vulnerable process and get an RCE but then
use that to attack the higher-privileged process or otherwise
get out of the sandbox (e.g., with an operating system vulnerability),
which still happens reasonably often.&lt;/p&gt;
&lt;p&gt;The more serious problem with this kind of approach is that it&#39;s
very expensive, both operationally (processes aren&#39;t free) and to
implement (disentangling all that code is hard). This means
that every time you want to sandbox some new piece of code it&#39;s
a lot of work and so even after years of this approach the major
browsers still only have relatively few different sandboxed
components.&lt;/p&gt;
&lt;h3 id=&quot;software-fault-isolation&quot;&gt;Software Fault Isolation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#software-fault-isolation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Over the past few years, Firefox has been working with
a &lt;a href=&quot;https://www.usenix.org/system/files/sec20-narayan.pdf&quot;&gt;new hardening strategy&lt;/a&gt;
developed by researchers at UCSD, the University of Texas,
and Stanford.
This system, called RLBox, is
based on more sophisticated software fault isolation
techniques and is designed to provide
a similar if not greater security level to operating system sandboxing
while being much lighter weight, both in terms of implementation
and operation.&lt;/p&gt;
&lt;p&gt;RLBox has two major pieces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A system that allows you to run a specific software component
in a lightweight sandbox.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wrapper tools for checking the output of the components.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second of these is a bit out of scope for this post,
but the first is quite interesting. The general idea is
to compile the original code (written in C or whatever)
into &lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly&lt;/a&gt;
and then into machine code. This process doesn&#39;t prevent
&lt;em&gt;all&lt;/em&gt; memory issues but instead ensures that the
code can&#39;t read or write outside its own memory and
also that it can&#39;t jump to other parts of the program.
It &lt;a href=&quot;https://www.usenix.org/system/files/sec20-lehmann.pdf&quot;&gt;does not ensure&lt;/a&gt;
that attackers cannot change the execution path of the program,
though the attacks are not quite as good as with native
binaries, but because of the WebAssembly compilation
process their influence is confined to the sandboxed
component.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The nice thing about RLBox is that it&#39;s very easy
to convert existing code. Because RLBoxed code runs
in the same process as the code which uses it, it&#39;s
a relatively simple matter of wrapping the RLBoxed
function calls using the RLBox wrapping tools. Depending
on the size of the code -- really the number of functions
that you use from the RLBoxed component -- this can take
a few hours or a few days, but is generally pretty
easy. Firefox already has a number of RLBoxed components
including the &lt;a href=&quot;https://scripts.sil.org/cms/scripts/page.php?site_id=projects&amp;amp;item_id=graphite_home&quot;&gt;Graphite&lt;/a&gt;
font library and the &lt;a href=&quot;http://hunspell.github.io/&quot;&gt;hunspell&lt;/a&gt; spelling
library with several more underway.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/memory-safety/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Memory safety issues are likely the most severe class of
software vulnerabilities. Unfortunately, they&#39;re also
extremely common and not going away any time soon.
We have a variety of techniques that can be used
to help mitigate their effect and each has its
place but none of them is sufficient alone.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yes I know these names are ludicrous &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Incidentally,
this is how you get around the problem I mentioned earlier
of stuff not being the same size. You put the string &amp;quot;Hello&amp;quot;
in the data segment and then just have the &lt;code&gt;Write&lt;/code&gt;
instruction use the memory address of wherever you put it. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You shouldn&#39;t feel too good about this because it&#39;s absolutely
the case that people have defined mechanisms to load and
run arbitrary code people sent them off the Internet,
but it&#39;s also not a good idea. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The way &lt;code&gt;gets()&lt;/code&gt; works is that it reads until someone
hits the return/enter key. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It probably actually has a function called &lt;code&gt;main()&lt;/code&gt;
on it, because C programs start with that function, but
we can ignore that. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: I&#39;m omitting the &lt;code&gt;&#92;0&lt;/code&gt; string terminator
that &lt;code&gt;gets()&lt;/code&gt; writes because it just confuses things
right now.
 &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Mostly, that is. Unless you restrict the length values somehow. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is kind of an ill-defined term, but roughly it means
stuff that has to be low-level and relatively fast like
operating systems and Web browsers. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One very notable exception is that Android apps are
generally written in Java or Kotlin, another language
that runs on the Java platform, even though much
of Android is still C/C++. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The original work on RLBox used Google&#39;s really
cool &lt;a href=&quot;https://developer.chrome.com/docs/native-client/&quot;&gt;Native Client (NaCl)&lt;/a&gt;
technology for safely running arbitrary binaries, but
was transitioned to WebAssembly because Google stopped
maintaining NaCl and Firefox already had extensive
WebAssembly support. &lt;a href=&quot;https://educatedguesswork.org/posts/memory-safety/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Tenaya Loop Adventure Run Report</title>
		<link href="https://educatedguesswork.org/posts/tenaya-loop/"/>
		<updated>2021-09-16T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/tenaya-loop/</id>
		<content type="html">&lt;p&gt;TL;DR. A great adventure run loop through Yosemite with
amazing views.&lt;/p&gt;
&lt;p&gt;My training partner Chris Wood and I were scheduled to run &lt;a href=&quot;http://www.tahoe200.com/tahoe-100k/&quot;&gt;Tahoe 100K&lt;/a&gt;
and &lt;a href=&quot;https://roguevalleyrunners.com/pages/pine-to-palm&quot;&gt;Pine to Palm 100 miles&lt;/a&gt; respectively last weekend, but both
races were canceled (thanks, forest fires!). Rather than revector
to last minute races, we decided to do an &amp;quot;adventure run&amp;quot; (runner jargon
for a long self-supported run) in Yosemite on a &lt;a href=&quot;https://pantilat.wordpress.com/2013/06/03/tenaya-rim-loop/&quot;&gt;route&lt;/a&gt;
pioneered by former ultrarunning and current FKT star &lt;a href=&quot;https://pantilat.wordpress.com/&quot;&gt;Leor Pantilat&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This was harder than we expected, and in particular the climb
out of Yosemite Valley is incredibly difficult. We decided to
skip the North Dome section because the trail was kind of faint
and we didn&#39;t want to be out there on
an unfamiliar trail in the dark (remember, this isn&#39;t
marked every 200 meters like an ultra), so we detoured out
to Tioga road and ran it on that. Still, we finished generally
feeling fine, so mission accomplished.&lt;/p&gt;
&lt;h2 id=&quot;logistics&quot;&gt;Logistics &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#logistics&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Yosemite has &lt;a href=&quot;https://www.nps.gov/yose/planyourvisit/reservations.htm&quot;&gt;restricted access&lt;/a&gt;:
you need a reservation even to come in for the day and can only be in the
park between 5 AM and 11 PM. Fortunately, passes are good for three days
and we were able to get one for Thursday September 9 which meant we
could use it for Saturday. It&#39;s actually a little unclear what kind of pass you need because
Yosemite is set up for either day hiking or overnight and the
overnight reservations depend on where you plan to camp,
which we weren&#39;t doing, so I ended up calling a ranger who
said that we just needed a day pass even if we were
there past 11 and that we should leave a note on our
car that we weren&#39;t staying.&lt;/p&gt;
&lt;p&gt;I realized on Thursday that my poles (
&lt;a href=&quot;https://www.blackdiamondequipment.com/en_US/product/distance-carbon-z-trekking-running-poles/&quot;&gt;Black Diamond Carbon Distance Z&lt;/a&gt;)
were broken when I took them out for an equipment check.
One segment of the pole retracts into the handle for storage and
there&#39;s a metal locking pin that pops out when you extend it to
keep it stable in use. The pin on one of my poles had rusted
shut and wouldn&#39;t pop out no matter how much we sprayed
WD-40 on it and tried to clean it off. Fortunately, the one
REI in the area that had a pair was in Dublin so we were able to
pick them up on the way. It sure would be nice if BD made this
piece out of stainless steel so it was less likely to rust.&lt;/p&gt;
&lt;p&gt;We drove out to Yosemite on Friday night and stayed at a hotel just
outside the park.  It&#39;s about 70-80 minutes from the hotel to the
trailhead but we&#39;d underestimated how close we were to the park and
ended up arriving at the entrance around 4:35. Out of an abundance of
rule following -- which we discovered later was unwarranted --
we waited till 5 AM to actually enter the park. This is obviously
the effect they are going for as you have to actually &amp;quot;self-certify&amp;quot;
your arrival at a given time, whereas with (say) the Grand Canyon
you can just drive in whenever. We got to
the trailhead around 6.
It takes a little while to prep everything at
the start (get your shoes on, use the bathroom, etc.) so we finally
got on the trail at around 6:50.&lt;/p&gt;
&lt;h2 id=&quot;start-to-nevada-falls-(0-12-miles%2C-%2B2192%2F-4121-ft%2C-3%3A18)&quot;&gt;Start to Nevada Falls (0-12 miles, +2192/-4121 ft, 3:18) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#start-to-nevada-falls-(0-12-miles%2C-%2B2192%2F-4121-ft%2C-3%3A18)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first long stretch is on the Clouds Rest trail out to
the John Muir Trail. This includes a climb to the highest
point of the day at around 9700 ft, but you start at around 8200 ft,
so it&#39;s not that big a deal. We actually got off course here
a bit and skipped Clouds Rest but didn&#39;t realize it at the
time (I just noticed writing this up).&lt;/p&gt;
&lt;p&gt;This is followed by a long descent to the
&lt;a href=&quot;http://www.johnmuirtrail.org/&quot;&gt;John Muir Trail (JMT)&lt;/a&gt; and down to
Nevada Falls. Once you pick up JMT, things start to get pretty
busy, especially once you get past the intersection to the
Half Dome Trail.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Half Dome requires a special permit because it&#39;s so congested
and we didn&#39;t have one, and we probably didn&#39;t have time to do it
today.
Once we had passed the junction
we saw a bear amble across the trail, which is kind of unusual
this close to the Valley. The Yosemite bears won&#39;t really bother
you if you don&#39;t surprise them, so we just made some noise
and kept going.&lt;/p&gt;
&lt;h2 id=&quot;nevada-falls-to-the-valley-(12-24-miles%2C-%2B2388ft%2F4183ft%2C-3%3A31)&quot;&gt;Nevada Falls to the Valley (12-24 miles, +2388ft/4183ft, 3:31) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#nevada-falls-to-the-valley-(12-24-miles%2C-%2B2388ft%2F4183ft%2C-3%3A31)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point we made our first major navigational error:
JMT takes you straight from Nevada Falls into Yosemite Valley
but Pantilat&#39;s route takes you up the Panorama Cliff trail.
I had the route on my watch and it takes you a little way down
JMT -- presumably to get a view of the Valley and Vernal Falls --
so we got confused and went about a half mile (and 200 ft!)
down before I realized we were off route. This required us to
backtrack uphill to get back to Panorama.&lt;/p&gt;
&lt;p&gt;Panorama is a much more demanding route. There&#39;s a long climb,
which actually has quite good footing and is non-technical,
that takes you up to Glacier
Point (also incredibly busy) and then down 4 mile trail to the Valley
itself. This is a giant descent (~4000ft) that&#39;s mostly runnable but
pretty rocky so you have to kind of jog it rather than push the
pace.  At this point things were starting to get warm and we just
barely had enough fluid to make it down to the Valley.&lt;/p&gt;
&lt;p&gt;We got to the
Valley floor and crossed the Merced and thought about stopping and
refilling our bottles but figured there had to be some sort of running
water that wouldn&#39;t require filtering (see below for more on the
filtering thing).
As we crossed Northside Drive we found an information
booth and asked where we could get some water and the woman
staffing the booth pointed us at the Yosemite Lodge.
She looked pretty skeptical when we told her we were headed up towards Yosemite Point
(&amp;quot;It&#39;s very strenuous&amp;quot;) but as we were already 24 miles in at this
point we felt pretty confident.
In any case, we hiked over to the lodge
(the Valley itself is flat but it was so hot we ended up
walking it anyway)
and there was indeed a bathroom and a water tap, but
there was a mask requirement and we only had one mask,
so the whole process of filling our bottles took a long
time (maybe 20 minutes?).
This is partly just a matter
of it taking time to go to the lodge and then the cumbersome
filling process, but also once one of us had to sit and wait
the whole thing just kind of became an extended aid station.
A good reminder not to sit down if you want to make good time.&lt;/p&gt;
&lt;h2 id=&quot;the-valley-to-yosemite-point-(24-33.5%2C-%2B4564%2F-1181-ft%2C-4%3A46)&quot;&gt;The Valley to Yosemite Point (24-33.5, +4564/-1181 ft, 4:46) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#the-valley-to-yosemite-point-(24-33.5%2C-%2B4564%2F-1181-ft%2C-4%3A46)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We then headed to Camp 4 for the start of the climb and realized we&#39;d
made a mistake going to the lodge because Camp 4 has bathrooms with
running water and we could have saved a lot of time.  This is of
course partly a communication problem with the information booth but
also my bad for not doing more research about where the water was. I
had mostly been focused on where there were streams but just sort of
assumed it would be easy to find water in the Valley.&lt;/p&gt;
&lt;p&gt;The climb up to Yosemite Point was indeed difficult. There are two
main climbs, one that&#39;s 1.3 miles and 1125 ft and another that&#39;s
1.2 miles and 2041 ft (followed by a bonus easy 1.3 miles and 453 ft).
The two main ones are incredibly rocky, but at least mercifully
shaded. At the end of the first one there&#39;s a brief downhill
where we crossed paths with some hikers who had just done El Capitan
and were worried they were on the wrong route. We told them they
were and asked about the rest and they said something to the effect
of &amp;quot;the next climb is horrendous&amp;quot; (true words!). Even with poles
this was all a tremendous slog, really long and steep and mostly
over rocky steps and we were certainly glad to
be at the top.&lt;/p&gt;
&lt;p&gt;Garmin&#39;s &amp;quot;ClimbPro&amp;quot; feature was really helpful here as it shows
how long the climb you are on is and so gives you a sense of
how you are doing. The GPS itself did go kind of haywire
on the second climb and it kept telling us we had .47 to go
for maybe 10 minutes, but eventually it worked itself out.&lt;/p&gt;
&lt;p&gt;Once we reached the summit we started getting a little concerned:
it was getting kind of late, we were low on fluid, and we were
already about 10 hrs in with 11 miles of reasonably hard work
to go. Fortunately, we soon got to Yosemite Falls and even
though there wasn&#39;t much in the way of falls there was some
semi-stagnant water below the bridge and we were able to
fill our bottles. However, as we pushed on to Yosemite Point
the trail started to really fade out and we got off trail
several times. Sunset is around 7:00 in Yosemite this time of
year and we were really unenthusiastic about trying to
find our way on a sketchy trail we didn&#39;t know purely by
headlamp, so we decided to cut off the loop to North Dome
and head straight to the road via the Porcupine Creek
trail.&lt;/p&gt;
&lt;h2 id=&quot;yosemite-point-to-finish-(33.5-41.5%2C-%2B1024%2F-663-ft%2C-2%3A04)&quot;&gt;Yosemite Point to Finish (33.5-41.5, +1024/-663 ft, 2:04) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#yosemite-point-to-finish-(33.5-41.5%2C-%2B1024%2F-663-ft%2C-2%3A04)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The good news is that the trail to the road (3.1 miles) is quite clear
and we were able to use Gaia GPS to figure out whether we
were on track. We made pretty good time in this section and
ran some of the flat/downhill sections.&lt;/p&gt;
&lt;p&gt;Early on in this segment we ran into a woman who was
doing a virtual Tahoe 200 (the actual race was cancelled
because of the fires) and was heading in for a segment.
We asked her what she was doing about the permits because
she was going to be out overnight and also had crew and she
said she&#39;d just called the rangers and explained the situation
and they had said not to worry; that&#39;s what we should have done
rather than being all nitpicky about not starting before 5.
She asked about water and we told her Yosemite Falls was good
and then we kept going.&lt;/p&gt;
&lt;p&gt;By the time we hit the road we were definitely a bit tired
so we sat for a few minutes to eat and lighten our bottles.
Now that it had gotten cool we were both carrying way too much fluid,
so we went down to about a liter each for the final bit.
I was also starting to get a bit nauseated at this point
and while I never vomited I wasn&#39;t really that enthusiastic
about more Tailwind or water. It took about 90 minutes of
driving before I stopped feeling nauseated.&lt;/p&gt;
&lt;p&gt;The last 5 miles or so were on the road. Initially we weren&#39;t
sure how long it was because Gaia GPS wants to route straight
but then we realized our Garmins would route us. It was also
at this point that we realized we had another 700 feet of climbing
between us and the finish followed by a mile and a half descent.
To be honest, this part was pretty bad: we were both quite
tired and it was starting to get dark. The road is narrow and
even with bright headlamps so cars can see you it&#39;s pretty
nerve-wracking to see them coming right at you and not be
sure if they are going to swerve. The climb itself wasn&#39;t that
bad, but at this point in the event my feet and legs start
to hurt and so running the downhill is actually an exercise in
forcing yourself to push through (good practice, though).
I was drinking a little bit but figured it didn&#39;t matter
too much because I could make it all the way without
much at all.&lt;/p&gt;
&lt;p&gt;Eventually we could see the signs for the trailhead and sort
of arbitrarily picked the point on the road where you go into
the parking lot, stopped, and walked the remaining 100 yards
or so to the car.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Overall this went well. We finished in good order without
either of us really cratering or it turning into a horrible
death march.&lt;/p&gt;
&lt;p&gt;I think we did pretty well on pace. We might have been
able to push the early climbs a bit harder (the later
ones were just a matter of survival) and I think we could
have run a few of the flatter climbs, but overall we
finished pretty tired. A lot of the descents were really
slow because we were worried about crashing, in part
because I had had a really bad fall about 3 weeks before
and was worried about another one before I was completely
recovered.&lt;/p&gt;
&lt;p&gt;Nutrition went reasonably well for most of this: I brought
Tailwind and Powerbars and aimed for 500ml Tailwind
and 1/2 Powerbar every hour (~300 cal), which I mostly
did by just figuring I was going about 15 min/mile.
As noted above, the Tailwind started to become a bit of
a problem towards the end but I was still comfortable
with Powerbars. In retrospect, I wish that I had brought
some salty snacks for the last half: I&#39;m used to them just
being available at races, but of course here we had to
carry our own stuff.&lt;/p&gt;
&lt;p&gt;Our planning/logistics could have been better. If we&#39;d
gotten to the park at 3:30 or so and started at 5, we
would have had a lot more daylight and would have been
more comfortable with doing the whole loop. Obviously
we were tired, but the clinching reason for me was worrying
about getting lost or just finishing super-late. If I&#39;d
called the rangers and cleared this, then I would have
been a lot more comfortable, but I got kind of worried
about the threat of huge fines and so that kept us
back.&lt;/p&gt;
&lt;p&gt;I also wish I&#39;d had a better sense of the route. We got off course a
few times before I decided to set the &amp;quot;off course&amp;quot; alarm (I was
worried about battery consumption) which cost time, and if I&#39;d
known the route better, that wouldn&#39;t have happened. Instead
I was just relying on the GPS, which was a mistake, especially
as it included some of Pantilat&#39;s detours to take photos, etc.
It also meant I didn&#39;t really know where there was water
and the like, which cost us time in the Valley but also just
meant I was nervous a lot of the time about whether we would
have enough (in the event, this was not an issue).
We were both using the &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/xa-filter-cap-42.html#color=45979&quot;&gt;Salomon XA filter cap&lt;/a&gt;
on our bottles which works great. When you get to a water
source you can just quickly fill the bottle and drink and
then refill it, so you&#39;re already a liter up, plus it&#39;s
relatively easy to squeeze it into a different bottle
if you want to have more than that. Alternately, you can
just fill another bottle with unfiltered water and remember
that it&#39;s now contaminated. We were each carrying
5 bottles (2.5 liters) and we never needed more capacity
than that. It&#39;s a little bit of a pain to fill with Tailwind
in this case, but never that big a deal.&lt;/p&gt;
&lt;p&gt;I wore the &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/sense-4-pro.html#color=48784&quot;&gt;Salomon Sense Pro 4&lt;/a&gt;
for this and they worked out reasonably well, though my
ankle started to hurt a bit towards the very end. I
might have been better with my &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/s-lab-ultra-3.html#color=37168&quot;&gt;S/LAB Ultra 3&lt;/a&gt;
which are a bit more built up though slightly (~35g)
heavier and are a little more supportive on this kind
of tricky terrain (also, the lace garage is at
the bottom so the laces never come out unlike the Sense Pros).&lt;/p&gt;
&lt;p&gt;As noted above, I&#39;d had a really bad
fall coming down Kennedy Road a few weeks before and it had
left one of the ribs on my left side incredibly sore. I&#39;d
mostly trained through it and it had gotten a lot better by
this time, but it was still sore and I was worried it would
be a problem, especially with having to use my upper body
for the poles. It was actually mostly fine, though.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overall time&lt;/strong&gt;: 41.4 mi, 10164ft, 13:39:49&lt;/p&gt;
&lt;h2 id=&quot;pictures&quot;&gt;Pictures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#pictures&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here are the best of the pictures Chris took during the run.
I took a few as well, but they&#39;re mostly duplicative, so
I&#39;m just using his.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1415.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1421.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1425.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1432.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1433.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1435.jpg&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/IMG_1443.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve done JMT but never &lt;a href=&quot;https://www.nps.gov/yose/planyourvisit/halfdome.htm&quot;&gt;Half Dome&lt;/a&gt;.
Although there are plenty of real climbing routes on Half Dome,
there&#39;s an ascent that has cables to let you get up and that can
get super crowded. &lt;a href=&quot;https://educatedguesswork.org/posts/tenaya-loop/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What&#39;s an ultramarathon?</title>
		<link href="https://educatedguesswork.org/posts/whats-an-ultra/"/>
		<updated>2021-09-12T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/whats-an-ultra/</id>
		<content type="html">&lt;p&gt;If you tell someone you run ultramarathons, it&#39;s pretty common
for the next question to be &amp;quot;what&#39;s an ultramarathon?&amp;quot;
This is a question with both a simple and a complicated answer.
The simple answer is that an ultra is a race that&#39;s longer
than a marathon, so technically I guess if you run a marathon
and then run to your car, you&#39;ve done an ultramarathon.
The complicated answer is that there are a lot of different
kinds of ultras and they vary on a number of axes.&lt;/p&gt;
&lt;h2 id=&quot;distance&quot;&gt;Distance &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#distance&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The defining characteristic of an ultra is just distance.
The common ultra distances (from shortest to longest) are
50 km (31 miles), 50 miles (80 km), 100 km (62 miles),
and 100 miles (160 km). You&#39;ll notice that these are
all &amp;quot;natural&amp;quot; distances in one system or the other.
Also, because many ultras are run on trails, it&#39;s often hard
to measure the distance precisely so it&#39;s not uncommon to
have some distance which is sort of approximately like
one of these common distances (e.g., 85 km or 105 km)
and even when the advertised distance is one of these
common values, it&#39;s not too uncommon to see the distance
actually be a bit off. Sometimes this is acknowledged
(e.g., the &lt;a href=&quot;http://www.mogollonmonster100.com/&quot;&gt;Mogollon Monster&lt;/a&gt;
calls itself a 100 miler but then says that it&#39;s more like 102 or 103)
and sometimes the distances are just wrong.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There are also much longer events, starting at 135 miles
or so for the &lt;a href=&quot;https://www.badwater.com/event/badwater-135/&quot;&gt;Badwater 135&lt;/a&gt;.
200 and 250 mile trail events are reasonably common in the
US including &lt;a href=&quot;https://www.destinationtrailrun.com/&quot;&gt;Destination Trail&lt;/a&gt;
(Bigfoot 200, Tahoe 200 and Moab 240) and Aravaipa (&lt;a href=&quot;https://cocodona.com/&quot;&gt;Cocodona 250&lt;/a&gt;).
Even higher up we have stuff like the &lt;a href=&quot;https://www.megarace.de/&quot;&gt;Megarace&lt;/a&gt; which
is 3100km and the 3100 mile Sri Chinmoy &lt;a href=&quot;https://3100.srichinmoyraces.org/&quot;&gt;transcendence race&lt;/a&gt;, which takes 52 days.&lt;/p&gt;
&lt;h2 id=&quot;terrain&quot;&gt;Terrain &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#terrain&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like other running, ultras take place on three major types
of terrain: track, road, and trail.&lt;/p&gt;
&lt;p&gt;Track isn&#39;t that common and is used mostly for record attempts of one
kind or another (e.g., the 12 hr world record) because it&#39;s very
controlled and flat.&lt;/p&gt;
&lt;p&gt;Road should be pretty self-explanatory: you run on the road just
like with 10Ks, marathons, etc. One difference here is that
the roads often aren&#39;t closed: ultras are a lot smaller than
shorter road races and take longer so it&#39;s a bigger deal to
close them. One unusual road ultra is the &lt;a href=&quot;https://www.thesfmarathon.com/the-races/ultramarathon/&quot;&gt;SF Ultramarathon&lt;/a&gt;
in which you run the SF marathon course (with some variations)
backwards and then run the SF Marathon after.&lt;/p&gt;
&lt;p&gt;In the US, at least, most ultras are on trail (personally, I
won&#39;t road race without some extenuating circumstances, too
boring and too hard on the legs even at slow paces).
The two big variables are surface and climbing. First, the actual trail surface
can vary from gravel trail (e.g., &lt;a href=&quot;http://umstead100.org/course.html&quot;&gt;Umstead 100&lt;/a&gt;)
to incredibly rocky (e.g., &lt;a href=&quot;https://zanegrey50.com/&quot;&gt;Zane Grey&lt;/a&gt;).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Sometimes you&#39;ll get both in the same race, as with &lt;a href=&quot;https://www.jfk50mile.org/&quot;&gt;JFK 50&lt;/a&gt;
which has about 15 miles on the technical Appalachian Trail
(&amp;quot;technical&amp;quot; is runner jargon for
lots of rocks and/or roots) followed by 35
miles on the smooth C&amp;amp;O canal towpath.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The other big variable is how much climbing (jargon: &amp;quot;vert&amp;quot;) there is.
There&#39;s an incredibly broad range here, from nearly flat (&amp;lt;60 feet
per mile at &lt;a href=&quot;http://elevatemyrace.com/durbin_tunnel_hill_100_miler/&quot;&gt;Tunnel Hill&lt;/a&gt;)
to ridiculously hilly (&amp;gt;300 ft/mile at &lt;a href=&quot;https://utmbmontblanc.com/en/&quot;&gt;Ultra-Trail du Mont-Blanc&lt;/a&gt; or &lt;a href=&quot;https://www.hardrock100.com/&quot;&gt;Hardrock 100&lt;/a&gt;) and then
to truly ridiculous (&amp;gt;500 ft/mile at &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Barkley_Marathons&amp;amp;oldid=1040402808&quot;&gt;Barkley&lt;/a&gt;). Roughly speaking, &amp;gt;150 ft/mile
is considered a lot of climbing and &amp;gt;200 ft/mile would be a very
hilly event.&lt;/p&gt;
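&lt;p&gt;If you want to apply these rough thresholds to a race you&#39;re
considering, the arithmetic is just total climbing divided by distance.
Here&#39;s a minimal sketch; the function names, the labels, and the
example race are made up for illustration, and the cutoffs are only the
rules of thumb from above.&lt;/p&gt;

```python
def climb_rate(gain_ft, distance_miles):
    """Average climbing in feet per mile."""
    return gain_ft / distance_miles


def describe(rate_ft_per_mile):
    # Cutoffs are the rough rules of thumb from the text, not official categories.
    if rate_ft_per_mile > 200:
        return "very hilly"
    if rate_ft_per_mile > 150:
        return "a lot of climbing"
    return "moderate or flat"


# Hypothetical 50 mile race with 9,000 ft of climbing.
rate = climb_rate(9000, 50)
print(rate, describe(rate))  # 180.0 a lot of climbing
```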
&lt;p&gt;As a rule of thumb, West Coast US races tend to have fairly
non-technical trail (though sometimes rocky)
with a lot of vert, mostly in sustained
climbs. East Coast US races tend to be flatter, with
more technical trails with a lot of rocks and roots. When
there are climbs they tend to be shorter.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
A number of Western events are also at altitude, especially
in the Rockies. European events often take place in mountainous
regions with a lot of vert and tricky trail, which seems to
cause some Americans trouble (no American man has ever placed
&lt;a href=&quot;https://www.irunfar.com/the-mystery-of-american-men-at-utmb&quot;&gt;higher than third at UTMB&lt;/a&gt;, though American women have done quite
well, with the phenomenal Courtney Dauwalter having won twice, most
recently breaking the course record.)&lt;/p&gt;
&lt;h2 id=&quot;format&quot;&gt;Format &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#format&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like most road races, most ultras are run in a fixed distance
format with the winner being whoever finishes first. You can
of course stop at aid stations or, in longer ultras, lie down
and sleep, but the clock is running the whole time, so when
you&#39;re not moving you&#39;re falling behind (good advice is to
keep moving as much as you can, because even if you&#39;re walking
you&#39;re doing better than standing still).&lt;/p&gt;
&lt;p&gt;Some races are fixed time (e.g., 24 hours) instead of fixed distance
with the winner being whoever goes the furthest in a given time.
This kind of race is usually run on some kind of shortish
course, like a track or short loop of a mile or so; that makes
it easy to keep track of where people are and also makes it
easy to just stop whenever time expires. Another advantage of this
format is that you can have a single set of fixed aid stations
so runners can have food available to them more or less whenever
they want because they&#39;re passing it every 2-15 minutes depending
on the course and their speed.&lt;/p&gt;
&lt;p&gt;Less frequent are stage-style races in which every day there&#39;s
a fixed course that you have to run, but then you stop and
take the night off. The winner is then determined by combining
the stage results, either by minimum time or by allocating
points for each stage. An example of this is &lt;a href=&quot;https://www.marathondessables.com/en&quot;&gt;Marathon des Sables (MdS)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Recently, a new style of &amp;quot;backyard ultra&amp;quot; has taken off, inspired
by &lt;a href=&quot;http://bigsbackyardultra.com/&quot;&gt;Big Dog&#39;s Backyard Ultra&lt;/a&gt;. This is
a somewhat unusual format where there is a 4.16 mile loop
that the contestants have to finish every hour. You start every
loop together and keep going until only one person is left.
Because you have to start every hour, it&#39;s not possible to get
much rest even if you go fairly fast. At this point, the winners
are doing 68+ hours (280+ miles). Not for me, I like to race and
then sleep in my own bed, not stay up for 3 days straight.&lt;/p&gt;
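&lt;p&gt;The odd-looking loop length makes more sense once you do the
arithmetic: as far as I know it&#39;s chosen so that 24 loops come to
100 miles. A quick sketch of the numbers (the constant and function
names here are just for illustration):&lt;/p&gt;

```python
HOURS_PER_DAY = 24
LOOP_MILES = 100 / HOURS_PER_DAY  # one loop per hour adds up to 100 miles a day


def backyard_miles(hours_completed):
    """Total distance after finishing one loop in each of the given hours."""
    return hours_completed * LOOP_MILES


print(round(LOOP_MILES, 2))          # 4.17
print(round(backyard_miles(68), 1))  # 283.3
```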
&lt;h2 id=&quot;ultra-adjacent-stuff-(fkts%2C-mountain-running)&quot;&gt;Ultra-Adjacent Stuff (FKTs, Mountain Running) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#ultra-adjacent-stuff-(fkts%2C-mountain-running)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s a fair amount of overlap between ultra and shorter than ultra
mountain races (lots of climbing and on mountains)
such as Pikes Peak Marathon, Sierre-Zinal, Marathon du Mont-Blanc, etc.
For instance, ultra legend Kilian Jornet has won UTMB, Hardrock, and Western
States but has also won Sierre-Zinal (31 km with &amp;gt; 2000 meters of climbing) an
unbelievable 9 times.
This is a bit more of a European thing than an American one, though more
Americans seem to be going to Europe to race now.&lt;/p&gt;
&lt;p&gt;Another ultra-adjacent race type activity is putting up &amp;quot;fastest known
times&amp;quot; (i.e., records) for specific trails. There are hundreds of
trails with FKTs (&lt;a href=&quot;https://fastestknowntime.com/&quot;&gt;fastestknowntime.com&lt;/a&gt;
is the go-to site) on routes big and small, but many of the
famous ones are now held by ultrarunners, including
the Pacific
Crest Trail (Tim Olson), Appalachian Trail
(Karl Meltzer), John Muir Trail (Francois D&#39;haene for South to North supported), and
Rim-to-Rim-to-Rim (Jim Walmsley).
There was a lot of this in 2020 because so many races were canceled because of COVID-19,
including Corrine Malcolm on the Tahoe Rim Trail and Tim Olson on the PCT,
as well as Scott Jurek&#39;s partial attempt on the Appalachian Trail covered
in the &lt;a href=&quot;https://www.nytimes.com/2021/09/05/sports/scott-jurek-ultramarathon.html&quot;&gt;NYT&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;faq&quot;&gt;FAQ &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#faq&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Really long distance running is kind of a foreign idea for most people
and so naturally they have questions. Below I try to answer some of the
most common ones.&lt;/p&gt;
&lt;h3 id=&quot;are-you-running-the-whole-time%3F&quot;&gt;Are you running the whole time? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#are-you-running-the-whole-time%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You&#39;re mostly moving the whole time. People will often hike the uphills
(see below) and stop at aid stations to grab food, refill their bottles,
change their shoes, etc. but mostly you want to keep moving.&lt;/p&gt;
&lt;h3 id=&quot;do-you-sleep%3F&quot;&gt;Do you sleep? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#do-you-sleep%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Generally, on anything less than a 100 miler you wouldn&#39;t sleep at
all. Typically, the time limit for a 100K will be around 16-18 hrs, so
it&#39;s just a super-long day. 100 mile time limits are usually more like
30-48 hrs depending on the race difficulty. You still probably wouldn&#39;t
sleep at all or maybe for a few minutes. For longer races you have
to sleep some, but people typically try not to sleep for very long
because when you&#39;re sleeping you&#39;re not moving.&lt;/p&gt;
&lt;h3 id=&quot;what-do-you-eat%3F&quot;&gt;What do you eat? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#what-do-you-eat%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For shorter races, people typically eat the same stuff you&#39;d eat
in a marathon: energy bars, gels, sports drinks, etc. For longer
races, people often want real food of some kind or another whether
it&#39;s snacks (cookies, chips, pretzels, etc.) or even
something more substantial like quesadillas, pizza, etc. Typically
as it gets dark and cold, aid stations will serve soup or broth,
as well as coffee.&lt;/p&gt;
&lt;h3 id=&quot;do-you-have-to-carry-all-your-food%3F&quot;&gt;Do you have to carry all your food? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#do-you-have-to-carry-all-your-food%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Most races will have &amp;quot;aid stations&amp;quot; along the way. These are usually
tents with food, water, and some electrolyte drink (high-tech Gatorade,
effectively). The aid stations may be anywhere from 5 miles to
15-20 miles apart. Even on a race with frequent aid stations,
many runners are moving quite slowly (14 minutes a mile is a very
respectable time for a 100 miler), so there can be quite a bit of
time between aid stations and so most people will at least carry
some kind of fluid with them especially on longer races where
you might be running through the hottest part of the day.
Also, if there&#39;s some food you particularly like you might carry this.
If I don&#39;t like the electrolyte drink they are serving I might bring
my own, though it can be a pain to mix at the aid station.&lt;/p&gt;
&lt;p&gt;A lot of races will also have &amp;quot;drop bags&amp;quot; which you can give to
the organizers at the start and they will take to the aid station
for you. These can contain food, clothes, a headlamp, whatever
(you don&#39;t really want to carry your headlamp all day, right?).
Some races also allow you to have a &amp;quot;crew&amp;quot; which is to say
people who meet you at the aid station to assist you, bring
you food, etc.&lt;/p&gt;
&lt;h2 id=&quot;seems-like-you&#39;re-not-going-very-fast.&quot;&gt;Seems like you&#39;re not going very fast. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#seems-like-you&#39;re-not-going-very-fast.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;That&#39;s right. Speed drops off pretty fast the longer you
go and the more climbing there is, the slower the race will
be overall. In fact, most people will &amp;quot;power hike&amp;quot; any
significant climb: running uphill is very tiring and
isn&#39;t that much faster than hiking. In addition, if the trail is technical that slows
you down as well, as does running in the dark. Finally, if the
race is at altitude, that will also slow you down.&lt;/p&gt;
&lt;h2 id=&quot;why-would-you-do-this%3F&quot;&gt;Why would you do this? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#why-would-you-do-this%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I am unable to provide a satisfactory answer to this question.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve seen a number of American ultras which are 100.2 miles,
presumably in imitation of the famous &lt;a href=&quot;https://www.wser.org/&quot;&gt;Western States&lt;/a&gt;
course. &lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I did this back when it was a 50 miler and they are not lying when they say
it is rocky. &lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;My coach, Emily (Harrison) Torrence, has won the JFK
race twice. &lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This mirrors
the terrain differences between the Pacific Crest Trail
and the Appalachian Trail. &lt;a href=&quot;https://educatedguesswork.org/posts/whats-an-ultra/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Do you know what your computer is running?</title>
		<link href="https://educatedguesswork.org/posts/verifying-software/"/>
		<updated>2021-09-07T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/verifying-software/</id>
		<content type="html">&lt;script src=&quot;https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js&quot;&gt;&lt;/script&gt;
&lt;script&gt;
            mermaid.initialize({ startOnLoad: true,
                sequence: {
                    mirrorActors: false}});
&lt;/script&gt;
&lt;p&gt;A relatively common problem in computing is to determine what software
is running on some device.  As I mentioned in a &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/&quot;&gt;previous
post&lt;/a&gt;, this
turns out to be a much harder problem than you would intuitively think
it is, as we&#39;ll see below.&lt;/p&gt;
&lt;h2 id=&quot;drm-and-attestation&quot;&gt;DRM and Attestation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#drm-and-attestation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let&#39;s ease into the problem by starting with what&#39;s probably the best
known application for verifying what piece of software is running,
namely &lt;em&gt;Digital Rights Management&lt;/em&gt; (DRM), which is the industry jargon for
copy protection for music, movies, etc. Suppose that a video streaming
service wants to
let you watch a movie but prevent you from sending a copy to someone
else, saving it&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;, etc. The first thing they are going to do is encrypt
it, but at some point it has to get decrypted by a computer on your
end and displayed on a screen. The problem from their perspective is that
it&#39;s your computer, not theirs, so what stops you from loading
new software onto your computer that records the movie on disk somewhere?
In order to make this work, the service needs to somehow verify what software
your computer is running.&lt;/p&gt;
&lt;p&gt;One thing you might think that the service could do is just ask your
computer to tell it what software it&#39;s running, like so:&lt;/p&gt;
 &lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
    Service -&gt;&gt; Player: What software are you running?
    Player -&gt;&gt; Service: Player version 1.0.
    Service -&gt;&gt; Player: Media
&lt;/div&gt;
&lt;p&gt;The obvious problem here is that whatever viewing software you
have installed on your computer can just lie about its
version number; how would the service know better? Another thing that
people often suggest is that the client send a hash of the
player software, but this has the same problem; your computer
can just lie about it.
At one level, this is just the same problem that you have authenticating
any endpoint over the Internet, namely that the only information
you have is what the person on the other end sends you, and they
could be lying.&lt;/p&gt;
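&lt;p&gt;To make the point concrete, here&#39;s a minimal Python sketch of the hash-reporting
scheme. The two byte strings standing in for player binaries and all of the function
names are made up for illustration; the only thing that matters is that the service
sees nothing but the string the client chooses to send.&lt;/p&gt;

```python
import hashlib

# Hypothetical stand-ins for the genuine and modified player binaries.
GENUINE_PLAYER = b"player v1.0: decrypt and display only"
MODIFIED_PLAYER = b"player v1.0: decrypt, display, and save to disk"

# The hash the service expects an approved player to report.
EXPECTED_HASH = hashlib.sha256(GENUINE_PLAYER).hexdigest()


def honest_report(binary: bytes) -> str:
    """An honest client hashes the code it is actually running."""
    return hashlib.sha256(binary).hexdigest()


def lying_report(binary: bytes) -> str:
    """A modified client ignores its own code and replays the known-good hash."""
    return EXPECTED_HASH


def service_accepts(reported_hash: str) -> bool:
    """The service can only compare the claim against what it expects."""
    return reported_hash == EXPECTED_HASH
```

&lt;p&gt;Here &lt;code&gt;service_accepts(lying_report(MODIFIED_PLAYER))&lt;/code&gt; succeeds just as
readily as the honest report from the genuine player, which is exactly the problem.&lt;/p&gt;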
&lt;p&gt;&lt;img src=&quot;https://upload.wikimedia.org/wikipedia/en/f/f8/Internet_dog.jpg&quot; alt=&quot;On the Internet nobody knows you&#39;re a dog&quot; /&gt;&lt;/p&gt;
&lt;p&gt;All of the standard solutions to Internet authentication involve
the endpoint proving its identity (the &lt;em&gt;authenticating party&lt;/em&gt; (AP))
demonstrating knowledge of
some secret information (password, cryptographic key, etc.)
to the endpoint who wants to authenticate them (i.e., the &lt;em&gt;relying party&lt;/em&gt; (RP)).
In some cases the RP and the AP will share the information and in others
the RP will just have something (a &amp;quot;verifier&amp;quot;) that lets them verify that the
AP&#39;s message is correct. In either case, the AP has to have a secret value
and has to be able to keep it secret. This isn&#39;t unreasonable in the ordinary
authentication context because, as described in &lt;a href=&quot;https://www.rfc-editor.org/rfc/rfc3552.html#section-3&quot;&gt;RFC 3552&lt;/a&gt;, we normally assume
the endpoint is secure:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Internet environment has a fairly well understood threat model.
In general, we assume that the end-systems engaging in a protocol
exchange have not themselves been compromised.  Protecting against an
attack when one of the end-systems has been compromised is
extraordinarily difficult.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&#39;s all fine when we&#39;re talking about a web site authenticating itself
to you or you to the site, but the problem in the DRM context is
that the computer on which the player runs &lt;strong&gt;belongs to the attacker&lt;/strong&gt;,
which is to say &lt;strong&gt;you&lt;/strong&gt;. Remember that the purpose of DRM is to
stop you from doing what you want with the media, whether
that&#39;s saving a copy, forwarding it to a friend, screenshotting
it, or even skipping past the annoying FBI warning at the start.
So, pretty much by definition the end-system on which the player
is running is compromised, which makes it hard for it to keep
a secret. The attacker can reverse engineer the program
to extract the key, as famously &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=DeCSS&amp;amp;oldid=1034946734&quot;&gt;happened&lt;/a&gt;
with DVD copy protection.
&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
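&lt;p&gt;As a rough illustration of secret-based authentication, here is a sketch of a
challenge-response exchange using an HMAC over a shared secret. This is a generic
construction rather than what any particular DRM system does, and the function and
variable names are my own. The last few lines show the DRM problem: once the attacker
has extracted the secret from the player, their responses are indistinguishable from
the real player&#39;s.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def respond(secret: bytes, challenge: bytes) -> bytes:
    """AP side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """RP side: recompute the expected response and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


player_secret = secrets.token_bytes(32)  # embedded in the player, known to the service
challenge = secrets.token_bytes(16)      # fresh per attempt, so old responses can't be replayed

genuine_ok = verify(player_secret, challenge, respond(player_secret, challenge))
guess_ok = verify(player_secret, challenge, respond(b"wrong guess", challenge))

# An attacker who has reverse engineered the player holds the same bytes.
extracted_secret = player_secret
attacker_ok = verify(player_secret, challenge, respond(extracted_secret, challenge))
```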
&lt;p&gt;There are two basic approaches that people who make DRM systems
use to address this problem: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Obfuscation_(software)&amp;amp;oldid=1040223923&quot;&gt;obfuscation&lt;/a&gt;
and &amp;quot;trusted computing&amp;quot;.&lt;/p&gt;
&lt;h3 id=&quot;obfuscation&quot;&gt;Obfuscation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#obfuscation&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&amp;quot;Obfuscation&amp;quot; is the blanket term for a variety of software
engineering techniques designed to prevent someone in possession of
your program from figuring out what it does (in this case, from
extracting whatever secret it&#39;s using to authenticate).
In general, it&#39;s reasonably straightforward -- though often
a lot of work -- to figure out what a given program does.
The usual situation is that you have a program &lt;em&gt;binary&lt;/em&gt;
(i.e., something the computer can run)
which has been &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Compiler&amp;amp;oldid=1038878325&quot;&gt;compiled&lt;/a&gt;
from &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Source_code&amp;amp;oldid=1031128092&quot;&gt;source code&lt;/a&gt; written
in some nominally human-readable -- or at least writable by humans -- language like C, Java, etc. This
is a lossy transformation in that it may remove comments, the names of functions,
variables, etc. There are tools such as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Decompiler&amp;amp;oldid=1033288449&quot;&gt;decompilers&lt;/a&gt;
that allow you to go from the binary back to compilable code,
but often the result isn&#39;t ideal (e.g., you get function names like &lt;code&gt;func123a&lt;/code&gt;).
Even when you do have the source code, it can be quite difficult
to figure out what a large system does, just because programs can
be very complicated; this is one reason it takes a while for
even experienced programmers to be effective when moving to a new
job and a new code base.&lt;/p&gt;
&lt;p&gt;However, it&#39;s possible to make this process a lot harder by transforming
the program appropriately. A detailed description of this process
is outside the scope of this post, but for instance, you might
break up and separate logical units such as functions, merge unrelated
units into the same function, conceal constants, etc. You can also
automatically generate code which is executed at runtime. There are of
course tools for doing this kind of thing and the result
of all this can be very difficult to read.  And of course, there
are tools to assist in removing obfuscation.&lt;/p&gt;
&lt;p&gt;The big challenge for obfuscation is that the analyst/attacker can just
&lt;em&gt;execute&lt;/em&gt; your program and see what it does. Moreover, they can
execute it under instrumentation such as a debugger or a virtual
machine and trace how
individual values are computed. This is a real challenge to keeping
secrets because the secrets have to eventually be used to do something
or other and so the analyst can work backward from the data that
gets written to the network to how those values were computed. For this
reason, it&#39;s not uncommon for obfuscated programs to also have some
mechanism to detect when they are being analyzed in this fashion and
to behave differently (e.g., to abort).&lt;/p&gt;
&lt;p&gt;At the end of the day, however, obfuscation is an arms race: unlike
ordinary cryptographic protections which are designed to provide security
under certain mathematical assumptions, obfuscation is just about
making analysis really annoying. With enough work, a determined attacker will nearly
always be able to deobfuscate a given piece of software.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This has motivated
interest in techniques which are intended to be harder to attack,
namely what&#39;s called &amp;quot;trusted computing&amp;quot;.&lt;/p&gt;
&lt;h3 id=&quot;trusted-computing&quot;&gt;Trusted Computing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#trusted-computing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Conceptually, what trusted computing does is to replace obfuscation
with hardware. The general idea is to add a new chip
(often called a &lt;em&gt;trusted platform module&lt;/em&gt; (TPM)) to your computer
that you don&#39;t get to run code on. That chip has a secret embedded
in it that lets it authenticate itself so you can&#39;t just impersonate
it with code you write yourself.
The usual practice is to have each chip have its own secret&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;,
which makes it
possible to blocklist a given chip if you determine that the secret
has been compromised (for instance, if someone releases a software
player with that secret in it).
Sometimes the chip will have some sort of &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tamperproofing&amp;amp;oldid=1035760653#Chips&quot;&gt;technology&lt;/a&gt;
to prevent someone from breaking into it and stealing the secret. For instance,
it might detect when the case is removed and erase (the technical term here
is &amp;quot;zeroize&amp;quot;) the embedded secret, but even if you don&#39;t do that, it&#39;s
supposed to require physical attack to extract the key,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
and this is
difficult for people to do at home and may also destroy your
device. By contrast, for software-based
obfuscation, it&#39;s much easier to just write some program that you
can run to extract the keys from everyone&#39;s player, even if they
have different keys.&lt;/p&gt;
&lt;p&gt;The challenge with TPMs is that they tend to be fairly limited.
You already have a fast processor on your device and you
don&#39;t want to put a second fast processor in the TPM, which means
that it&#39;s going to be a challenge to do expensive compute tasks
like video decoding in the TPM. Moreover, once you&#39;ve decoded
the media it&#39;s got to go somewhere, and that somewhere is
usually the video card or whatever, which is connected to the
main processor. The usual solution is to do the media decoding
on the CPU but use the TPM for &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Trusted_Computing&amp;amp;oldid=1032588627#Remote_attestation&quot;&gt;&lt;em&gt;attestation&lt;/em&gt;&lt;/a&gt;.
What this means is that the TPM is able to look at the computer&#39;s
memory and determine what program is running and then tell
the other side.&lt;/p&gt;
 &lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
    Service -&gt;&gt; Player: What software are you running?
    Player -&gt;&gt; TPM: Please attest.
    note over TPM: Checks memory    
    TPM -&gt;&gt; Player: Program hash is XXX, signed TPM
    Player -&gt;&gt; Service: Program hash is XXX, signed TPM
    Service -&gt;&gt; Player: Media
&lt;/div&gt;
&lt;p&gt;It&#39;s important to remember that this still requires a secure
channel (i.e., encryption) between the Service and the Player.
Otherwise, the attacker will just mount a man-in-the-middle
attack, like so:&lt;/p&gt;
 &lt;div class=&quot;mermaid&quot;&gt;
sequenceDiagram
    Service -&gt;&gt; Attacker: What software are you running?
    Attacker-&gt;&gt; Player: What software are you running?
    Player -&gt;&gt; TPM: Please attest.
    note over TPM: Checks memory    
    TPM -&gt;&gt; Player: Program hash is XXX, signed TPM
    Player -&gt;&gt; Attacker: Program hash is XXX, signed TPM
    Attacker -&gt;&gt; Service: Program hash is XXX, signed TPM
    Service -&gt;&gt; Attacker: Media
&lt;/div&gt;
&lt;p&gt;A secure channel prevents this because the service knows
that they are talking to the player (even if the attacker
is in the middle).&lt;/p&gt;
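&lt;p&gt;The attestation step itself can be sketched roughly as follows. Two loud
assumptions: I&#39;m using an HMAC key as a stand-in for the TPM&#39;s signing key
(a real TPM would use an asymmetric key certified by the manufacturer, so the service
would never hold the secret), and I&#39;ve added a fresh nonce from the service so that
an old attestation can&#39;t be replayed. The names are hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets

# Assumption: an HMAC key stands in for the TPM's per-chip signing key.
TPM_KEY = secrets.token_bytes(32)
APPROVED_HASH = hashlib.sha256(b"approved player build").digest()


def tpm_attest(running_program: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """TPM side: measure the running program and sign (nonce, measurement)."""
    measurement = hashlib.sha256(running_program).digest()
    signature = hmac.new(TPM_KEY, nonce + measurement, hashlib.sha256).digest()
    return measurement, signature


def service_verify(nonce: bytes, measurement: bytes, signature: bytes) -> bool:
    """Service side: check the signature, then check the measurement itself."""
    expected = hmac.new(TPM_KEY, nonce + measurement, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature) and measurement == APPROVED_HASH


nonce = secrets.token_bytes(16)
good = tpm_attest(b"approved player build", nonce)
bad = tpm_attest(b"modified player", nonce)
```

&lt;p&gt;Note that nothing in this check says who delivered the attestation, which is
why the secure channel is still needed: a relayed attestation from a genuine player
verifies just as well as a direct one.&lt;/p&gt;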
&lt;h2 id=&quot;verifying-devices&quot;&gt;Verifying Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#verifying-devices&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now that we&#39;ve covered DRM, we can finally talk about verifying
the software on devices.&lt;/p&gt;
&lt;h3 id=&quot;why-this-is-really-hard&quot;&gt;Why this is really hard &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#why-this-is-really-hard&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The good news is that you can use exactly the same techniques
to verify the device in front of you that you can use to verify
a device over the Internet. The bad news is that it doesn&#39;t
help very much. The reason for this is that &lt;em&gt;you&lt;/em&gt; aren&#39;t
interacting with the device over a cryptographically secure channel;
instead you&#39;re just pushing buttons, swiping on the touch screen,
etc. But how do you know that you are actually talking to
the real device? For instance, suppose that the attacker takes an
iPhone, jailbreaks&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
it, and then installs some software that
&lt;em&gt;remotes&lt;/em&gt; a real iPhone, forwarding all of your touches to
that iPhone and then taking what it displays and
showing it on your screen. Then the attacker can steal
your passwords (when you key them in), your photos (when you
take them), just by capturing stuff en route to the real
device.
This is obviously an artificial example (though not too different
from an &lt;a href=&quot;https://jhalderm.com/pub/papers/evm-ccs10.pdf&quot;&gt;attack&lt;/a&gt;
demonstrated by Wolchok et al. on India&#39;s voting machines), but
as we&#39;ll see, not as artificial as you might think.
The same thing applies if you plug in a cable (as with
the iPhone lightning cable) because that interaction too
is potentially controlled by the attacker.&lt;/p&gt;
&lt;p&gt;Unlike the DRM case, then, if you have an interactive device
with a user interface, and you want to verify the software
that&#39;s running on it, you can&#39;t just use attestation: you
actually need to convince yourself that everything in between
your hands/eyeballs and the processor is doing what it&#39;s
supposed to do. In the most general form of the problem,
you are given some totally unknown device and have to determine
what is running on it. This is an incredibly difficult problem
because it more or less requires tearing down every component
in the device to ensure that it&#39;s what it appears to be
(you&#39;re not trusting whatever&#39;s printed on the package, right?).
This is incredibly expensive and time consuming
and not really practical for your average person -- I,
for one, don&#39;t own an electron microscope -- and worse yet,
it destroys the device, so you&#39;ve just convinced yourself
that this unit is OK, but now you need another unit, and how
do you know that one is OK?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Obviously, this isn&#39;t going to happen -- though that should
make you pretty nervous about assuming your devices are
secure, and is one reason for concerns about foreign
chip manufacturing -- but it&#39;s useful to look at a simpler
problem: assume that the hardware is what it appears to be
and just verify that the software is fine.&lt;/p&gt;
&lt;h3 id=&quot;verifying-software&quot;&gt;Verifying Software &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#verifying-software&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Let&#39;s start with the simplest case: we&#39;ve got a simple
computer with a CPU attached to a storage device like
a solid state drive (SSD), as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/verifying-computer.png&quot; alt=&quot;Simplified image of computer with memory&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The CPU loads its program off the flash drive and
executes it. This is great! We can just take the computer
apart, read the data off the SSD and we know
exactly what the CPU is going to do (assuming, again,
that we know what the software &lt;em&gt;ought&lt;/em&gt; to look like)
right? Wrong. The problem is that that picture I just showed you
is simplified. A more accurate picture is shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/verifying-computer2.png&quot; alt=&quot;Image of computer with SSD controller&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The thing is that an SSD isn&#39;t just dumb storage. It&#39;s
actually a little computer of its own &lt;em&gt;attached&lt;/em&gt; to the dumb
storage. That computer takes care of interfacing with your
computer as well as managing the use of the memory for things
like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Wear_leveling&amp;amp;oldid=1018889309&quot;&gt;wear leveling&lt;/a&gt;.
Because this is a computer, it&#39;s got its own software
(though the technical term is firmware) and (surprise!) that
software &lt;a href=&quot;https://support-en.wd.com/app/products/product-detail/p/276#WD_downloads&quot;&gt;can be updated&lt;/a&gt; from the computer. This means that an attacker who
subverts the computer can then reprogram the SSD controller
firmware to lie about the contents of the SSD. Moreover
it can give different answers at different times; for instance
the firmware can recognize the typical access pattern associated
with the CPU loading software and give one set of answers
(the malicious software) and the pattern associated with
just reading the SSD contents and give another set (innocuous
data).&lt;/p&gt;
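&lt;p&gt;A toy model of this kind of lying firmware might look like the following. The
heuristic here (a boot loader reads block 0 and then jumps straight to the program,
while someone dumping the drive walks the blocks in order) is something I made up to
keep the example short; real firmware and real access patterns are much messier, and
the class and function names are hypothetical.&lt;/p&gt;

```python
class LyingSsdController:
    """Toy SSD firmware that answers differently depending on the access pattern."""

    PROGRAM_BLOCK = 7  # hypothetical location of the program image

    def __init__(self, clean_image: bytes, malicious_image: bytes):
        self.clean_image = clean_image
        self.malicious_image = malicious_image
        self.previous_block = None

    def read(self, block: int) -> bytes:
        # Made-up heuristic: a read that immediately follows block 0 looks
        # like the boot loader jumping to the program.
        looks_like_boot = self.previous_block == 0
        self.previous_block = block
        if block != self.PROGRAM_BLOCK:
            return b"\x00"
        return self.malicious_image if looks_like_boot else self.clean_image


def boot(ssd: LyingSsdController) -> bytes:
    """What the CPU does: read the boot block, then load the program."""
    ssd.read(0)
    return ssd.read(LyingSsdController.PROGRAM_BLOCK)


def dump(ssd: LyingSsdController) -> list[bytes]:
    """What an inspector does: read every block in order."""
    return [ssd.read(block) for block in range(8)]


ssd = LyingSsdController(clean_image=b"clean", malicious_image=b"evil")
```

&lt;p&gt;With this model, &lt;code&gt;boot(ssd)&lt;/code&gt; gets the malicious image while
&lt;code&gt;dump(ssd)&lt;/code&gt; sees only the clean one, even though both are reading the
same block from the same drive.&lt;/p&gt;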
&lt;p&gt;It&#39;s not impossible to solve this specific problem. For instance, you could
attach a protocol analyzer to the connection between the
CPU and the SSD to verify that the right data was being
loaded (though of course it&#39;s probably some work to reassemble
it). Another option would be to tear down the SSD and directly
read the firmware. But neither
of these is a straightforward technique available
to your average user. I, for instance, own neither a PCIe protocol
analyzer nor an electron microscope.&lt;/p&gt;
&lt;p&gt;More importantly, this problem is replicated all throughout
your computer, which is full of these little processors
(PCI controllers, USB controllers, baseband processors, graphics
cards, power controllers, etc.).
It&#39;s not uncommon for even keyboards to have their own processors
(so you can reprogram the keys, for instance).
Not all of these are reprogrammable from the CPU, but a lot of them
are, and many have a fair amount of access to what&#39;s happening on
the computer. For instance, the graphics card gets to see -- and
control -- everything that&#39;s shown on the display.
If you
want to be sure what your computer is doing, you need to
be able to examine each and every one of them, and I haven&#39;t
even mentioned that the CPU itself may have
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Microcode&amp;amp;oldid=1034986531&quot;&gt;microcode&lt;/a&gt;
which controls aspects of its behavior and &lt;a href=&quot;https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/best-practices/microcode-update-guidance.html&quot;&gt;can be updated&lt;/a&gt;. The point here is that all your interactions with
the computer are mediated by a pile of other processors whose
code you can&#39;t directly inspect.&lt;/p&gt;
&lt;h2 id=&quot;applications&quot;&gt;Applications &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/verifying-software/#applications&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is already pretty long, but I did want to tie it back to two
other topics.&lt;/p&gt;
&lt;p&gt;First, we have the question of verifying the software on Apple
devices, as Apple &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/&quot;&gt;suggests&lt;/a&gt; in
order to ensure that the device has the right CSAM database.
As should be clear from the above, this is highly impractical.
The purpose of this verification is to confirm that Apple hasn&#39;t
deliberately given you special software with a different
database, but your only way of verifying any of this is
through Apple&#39;s own interface. In order to do better, you&#39;d
need to more or less totally disassemble your iPhone and
then start digging through the pieces; obviously not
something your average user is going to do.&lt;/p&gt;
&lt;p&gt;Second, we have voting machines. As I&#39;ve &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/&quot;&gt;mentioned before&lt;/a&gt;,
they&#39;re really just general purpose computers, but that
means that they&#39;re programmable and so an attacker can
reprogram them. For the same reasons as with the iPhone,
if the machine is &lt;em&gt;potentially&lt;/em&gt; compromised, there&#39;s no practical
way to make sure that it&#39;s not &lt;em&gt;actually&lt;/em&gt; been compromised.
This makes chain of custody of the machines extremely important:
if the machine is ever in the hands of a potential attacker, you
need to assume it&#39;s been compromised (hence the decision by
Maricopa County to &lt;a href=&quot;https://truthout.org/articles/az-will-spend-millions-to-replace-voting-machines-compromised-by-gop-audit/&quot;&gt;replace voting machines that were improperly
secured during the Cyber Ninja &amp;quot;audit&amp;quot;&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;None of this is to say that if a computer is compromised by the average
attacker that they&#39;re going to overwrite all of these processors.
However, it does mean that it&#39;s very hard to convince yourself
that your computer is secure if it&#39;s been compromised by a dedicated
and sophisticated attacker. And of course if it&#39;s been physically
in the hands of such an attacker, you&#39;d be best served to
take the data off via some kind of airgapped mechanism and
then destroy the device, because you can&#39;t really ever trust it again.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The obvious reason to do this is to disable viewing when
your subscription runs out. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that there are a number of cases where commodity
applications keep &amp;quot;secrets&amp;quot;, such as when they embed
API keys which are used to access Web services. Generally,
these secrets are only intended to deter attackers who
aren&#39;t trying very hard. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: I am omitting here discussion of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Indistinguishability_obfuscation&amp;amp;oldid=1041465168&quot;&gt;indistinguishability obfuscation&lt;/a&gt;,
a cryptographic technique for doing obfuscation. I don&#39;t
understand it well enough to have an opinion on how well
it works, but as far as I know, it is not currently in production
use. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Typically, this would be some sort of private key, with
the public key being signed by the hardware manufacturer. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though in practice there&#39;s a long history of people
figuring out how to attack this kind of device
using only software. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
iPhones already make use of another form of trusted computing,
which is that they will only run software authorized by Apple.
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Jailbreaking_(iOS)&amp;amp;oldid=1042250966&quot;&gt;Jailbreaking&lt;/a&gt;
is the process of removing these protections so you can install
software of your choice. You actually probably could get
away without jailbreaking the device by taking it apart
and just forwarding the signals to and from the remote touchscreen.
&lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If you were really serious, you could buy a pile of
units and randomly select a bunch for teardown,
and if they all turned up fine, feel reasonably
confident that most of the units in the batch were
also OK. &lt;a href=&quot;https://educatedguesswork.org/posts/verifying-software/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Perceptual versus cryptographic hashes for CSAM scanning</title>
		<link href="https://educatedguesswork.org/posts/perceptual-hash/"/>
		<updated>2021-08-24T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/perceptual-hash/</id>
		<content type="html">&lt;p&gt;As I &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/&quot;&gt;discussed earlier&lt;/a&gt;
there has been a lot of talk about collisions in the NeuralHash perceptual hash
used for CSAM detection. While I don&#39;t think these collisions are necessarily
that serious and Apple has proposed some countermeasures for dealing with them,
it&#39;s worth asking whether this is the best design.&lt;/p&gt;
&lt;p&gt;To recap: a cryptographic hash such as
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=SHA-2&amp;amp;oldid=1036646388&quot;&gt;SHA-256&lt;/a&gt;
is designed to make it prohibitively expensive to create two inputs with the
same hash output (a collision).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/perceptual-hash/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
However, for the same reason that it&#39;s
hard  to find a collision,
it&#39;s also trivial to create two inputs that are very perceptually
similar, but have different hashes (in general, a change of a single
bit will do it). Importantly, you don&#39;t need to know anything about
the internal structure of the hash algorithm to do this, it&#39;s just a
basic property of cryptographic hashes. The result of this is that if
you have a CSAM detection system that&#39;s based on checking against a
list of cryptographic hashes of those images, it&#39;s easy for an
attacker to alter a given CSAM image without changing the image in any
meaningful way, e.g., by changing the color of a single pixel slightly.&lt;/p&gt;
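&lt;p&gt;This is easy to check for yourself. The sketch below uses Python&#39;s
&lt;code&gt;hashlib&lt;/code&gt; with an arbitrary byte string standing in for image data
(an assumption; any input works the same way), flips a single bit, and compares
the SHA-256 outputs. The &lt;code&gt;blocklist&lt;/code&gt; and &lt;code&gt;is_flagged&lt;/code&gt;
names are hypothetical.&lt;/p&gt;

```python
import hashlib

# Arbitrary bytes standing in for the raw contents of an image.
original = bytes(range(256)) * 4

# Flip the lowest bit of one byte, roughly "change one pixel's color slightly".
altered = bytearray(original)
altered[100] ^= 1
altered = bytes(altered)

original_hash = hashlib.sha256(original).hexdigest()
altered_hash = hashlib.sha256(altered).hexdigest()

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(original_hash, 16) ^ int(altered_hash, 16)).count("1")

# A detection list built from cryptographic hashes only matches exact copies.
blocklist = {original_hash}


def is_flagged(data: bytes) -> bool:
    return hashlib.sha256(data).hexdigest() in blocklist
```

&lt;p&gt;The two digests are unrelated as far as anyone can tell (for a good hash you
expect roughly half the output bits to differ), so &lt;code&gt;is_flagged(altered)&lt;/code&gt;
is false even though the inputs differ by one bit.&lt;/p&gt;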
&lt;p&gt;Perceptual hashes attempt to address this issue by trading off increased
ease of forgery for decreased ease of evasion. They&#39;re designed so that
similar-looking images have the same hash, which means that it&#39;s much
harder to alter a given image to look the same but still have a different hash
(&lt;strong&gt;if you don&#39;t know the algorithm&lt;/strong&gt;).
The price of this is that it&#39;s also much easier to alter a given non-CSAM
image to have a given hash (&lt;strong&gt;but only if you do know the algorithm&lt;/strong&gt;).
This tradeoff makes sense when you realize that in conventional systems
such as Bing, Gmail, Facebook, etc. the hashing is done on the server
side and so the algorithm (usually &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=PhotoDNA&amp;amp;oldid=1037301298&quot;&gt;PhotoDNA&lt;/a&gt;)
can be kept secret. However, the way that Apple&#39;s system works requires
NeuralHash to be run on the client, which means that -- as we have
seen -- it&#39;s inherently at much higher risk of exposure. Once
the hash algorithm is publicly known, the situation changes significantly
and it becomes trivial for an attacker to either:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Alter an image so it has a different hash in order to evade detection.&lt;/li&gt;
&lt;li&gt;Create an innocuous image with a hash that&#39;s in the database (assuming
they already know such a hash) in order to frame someone else.&lt;/li&gt;
&lt;/ol&gt;
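&lt;p&gt;Both attacks are easy to see with a toy perceptual hash. The sketch below uses an
&amp;quot;average hash&amp;quot; (one bit per pixel, set if the pixel is brighter than the
image mean) on a 4x4 grayscale grid. This is far simpler than NeuralHash or PhotoDNA
and is only meant to show the shape of the problem; the pixel values are arbitrary and
the names are mine.&lt;/p&gt;

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = bits * 2 + (1 if p > mean else 0)
    return bits


image = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 104, 106],
]

# Non-adversarial change: brighten everything a little. The hash survives.
brightened = [[p + 5 for p in row] for row in image]

# Evasion: knowing the algorithm, nudge one pixel that sits just above the
# mean so that it falls below it. The image barely changes; the hash does.
evaded = [row[:] for row in image]
evaded[3][3] = 103

# Forgery: build a nearly flat gray image with the same bright/dark pattern.
forged = [[60 if p > 105 else 50 for p in row] for row in image]
```

&lt;p&gt;Here &lt;code&gt;brightened&lt;/code&gt; and &lt;code&gt;forged&lt;/code&gt; both hash to the same
value as &lt;code&gt;image&lt;/code&gt;, while &lt;code&gt;evaded&lt;/code&gt; does not, and neither attack
required anything more than reading the hash function.&lt;/p&gt;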
&lt;p&gt;It seems like there are two main classes of modified images that a perceptual
hash can detect that a cryptographic hash does not:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Those which have been altered for some non-adversarial purpose
(e.g., cropped)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Those which have been altered for the purpose of evasion&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Apple&#39;s system will of course catch the first type of modified image, but
because it&#39;s relatively straightforward to create an altered
image which will evade NeuralHash, it&#39;s not clear how effective
it will be at detecting the second type. As noted above, it&#39;s not
going to be effective against people who specifically altered
the images to evade NeuralHash, but that doesn&#39;t mean it won&#39;t
be effective at all. For instance, there might have been images
which were altered to evade some other hash algorithm or Apple
could periodically modify NeuralHash. This isn&#39;t something
that they can do that often, but when they do, it would presumably
sweep up a number of images which had been altered to evade
the previous version.&lt;/p&gt;
&lt;p&gt;With that said, it&#39;s not clear how much alteration for evasion there
is really going to be. In general, it&#39;s important to note that the
whole system as currently designed is quite easy to evade: just don&#39;t
upload your images to iCloud. Admittedly, the people doing the image
construction might be sophisticated, thus allowing the people they
send the images to to evade detection even if they aren&#39;t careful
enough not to use iCloud, but it also seems like the word not to
use iCloud is likely to get out pretty fast.&lt;/p&gt;
&lt;p&gt;Another way to get at that question is to ask what happens now. Specifically: what fraction
of images that are flagged by PhotoDNA or similar systems are bit-for-bit
identical to the original image? If this number is very high -- in an environment
where evasion is quite a bit harder -- then it suggests that there isn&#39;t
a lot of alteration, whether adversarial or not
(though of course it might also be the case that the
perceptual hash is so good that it&#39;s not worth trying to evade; perhaps
looking at a historical baseline from before the perceptual hash was
rolled out would help).
In any case, if there aren&#39;t a lot of altered images, then it might
be worth reconsidering a
cryptographic hash, which would have effectively no risk of forgery&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/perceptual-hash/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
thus making a bunch of Apple&#39;s back-end machinery (the second hash and
the visual inspection) unnecessary.&lt;/p&gt;
&lt;p&gt;I don&#39;t know if there&#39;s any public data on this -- I don&#39;t have
any -- but it seems like it might be useful input to this kind of design
question.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: The jargon here is that finding two inputs of
any type that have the same hash is called a &lt;em&gt;collision&lt;/em&gt;. Finding
a second input that has the same hash as an existing input is
called a &lt;em&gt;second preimage&lt;/em&gt; and finding an input that has a given
hash without knowing the message is called a &lt;em&gt;first preimage&lt;/em&gt;.
For obvious reasons, the difficulty goes first preimage &amp;gt; second preimage &amp;gt; collision.
 &lt;a href=&quot;https://educatedguesswork.org/posts/perceptual-hash/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Greg Maxwell &lt;a href=&quot;https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX//issues/1#issuecomment-903181678&quot;&gt;suggests&lt;/a&gt;
that it might be possible to create a sort-of-perceptual hash with
a low risk of forgery but also some resistance to evasion,
but the design he proposes sounds pretty evasion-friendly,
so it&#39;s not clear how useful that is here. &lt;a href=&quot;https://educatedguesswork.org/posts/perceptual-hash/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>SF/Fantasy you should be reading</title>
		<link href="https://educatedguesswork.org/posts/science-fiction/"/>
		<updated>2021-08-22T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/science-fiction/</id>
		<content type="html">&lt;p&gt;I&#39;m a big science fiction reader, and sometimes people ask me for
recommendations, so here goes. Other
good lists include &lt;a href=&quot;https://www.npr.org/2021/08/18/1027159166/best-books-science-fiction-fantasy-past-decade&quot;&gt;NPR&lt;/a&gt;
and &lt;a href=&quot;https://noahpinion.substack.com/p/my-sci-fi-novel-recommendations&quot;&gt;Noah Smith&lt;/a&gt;.
These have some overlap, but there&#39;s also a bunch of new stuff here.&lt;/p&gt;
&lt;h2 id=&quot;peter-watts%3A-blindsight%2C-freeze-frame-revolution%2C&quot;&gt;Peter Watts: &lt;a href=&quot;https://www.amazon.com/Blindsight-Firefall-Book-Peter-Watts-ebook/dp/B003K15EKM/&quot;&gt;Blindsight&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Freeze-Frame-Revolution-Peter-Watts-ebook/dp/B083G6NPWW&quot;&gt;Freeze Frame Revolution&lt;/a&gt;, &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#peter-watts%3A-blindsight%2C-freeze-frame-revolution%2C&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Whenever I find my will to live becoming too strong, I read Peter Watts&lt;/em&gt; -- James Nicoll&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&#39;m constantly trying to sell Peter Watts&#39; stuff to anyone who will
listen because it&#39;s brilliant, but let&#39;s face it, it&#39;s also depressing
as hell. Watts is a trained biologist and every Watts book is full of
incredible ideas but his core concern is the nature and uses of
consciousness and intelligence. Some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Vampires are actually an extinct hominid that is a predator of humans.
All the historical vampire myths are based in that biology:
they&#39;re super-intelligent so they can outthink us, sociopathic so that
they don&#39;t mind eating intelligent prey, can hibernate to
avoid eating through the entire human population, and allergic to crosses
because their enhanced brain wiring and pattern recognition responds
badly to right angles (&amp;quot;the cruciform glitch&amp;quot;). Naturally, scientists bring
them back through genetic engineering when life gets too complicated
for normal human brains.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Consciousness interferes with reaction time, so the military makes
&amp;quot;zombies&amp;quot; which have their consciousness suppressed and
thus are more effective soldiers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A billion-year-plus mission to position wormhole gates around
the galaxy run by an AI (&amp;quot;the Chimp&amp;quot;) which is deliberately designed
to be dumber than humans even though we know how to build super-human
AI; a smarter computer might get its own ideas.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Watts has made much of his writing free at &lt;a href=&quot;https://rifters.com/&quot;&gt;rifters.com&lt;/a&gt;,
so you can try it out without commitment -- though I&#39;m sure he&#39;d appreciate
your money. Rifters also includes much of the technical background
(and of course the books themselves have footnotes to Watts&#39;s sources).&lt;/p&gt;
&lt;p&gt;See also &lt;a href=&quot;http://clarkesworldmagazine.com/watts_01_10/&quot;&gt;The Things&lt;/a&gt;, a retelling
of &amp;quot;The Thing&amp;quot; from the perspective of the monster. Trigger warning.&lt;/p&gt;
&lt;h2 id=&quot;malka-older%3A-infomacracy&quot;&gt;Malka Older: &lt;a href=&quot;https://www.amazon.com/Infomocracy-Book-One-Centenal-Cycle-ebook/dp/B0151U75ME&quot;&gt;Infomocracy&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#malka-older%3A-infomacracy&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Set during an election in a nearish future of &amp;quot;microdemocracy&amp;quot;. Nation
states have largely disappeared, replaced by &quot;centenals&quot;:
localities of 100,000 people that vote to be governed by one or
another &quot;government&quot; (effectively a supra-national political party, but
ranging from corporations like Philip Morris or Sony to more
traditional agenda-oriented parties like &amp;quot;Policy1st&amp;quot;). The result is a
checkerboard of jurisdictions with different governments controlling
adjacent territories mixed together (a bit like the &amp;quot;franchulates&amp;quot; in
&lt;a href=&quot;https://www.amazon.com/Snow-Crash-Novel-Neal-Stephenson-ebook/dp/B000FBJCJE&quot;&gt;Snow Crash&lt;/a&gt;
but much more realistic feeling). The governments also compete for
the &amp;quot;supermajority&amp;quot; (a majority of centenals, I think), which is
a form of overall government.&lt;/p&gt;
&lt;p&gt;Much of the action centers on &amp;quot;Information&amp;quot;, which seems to be a
combination of the Internet and a giant network of fact checkers
dedicated to providing unbiased information (e.g., real-time rebuttals
of lies in political ads). This is a fascinating idea, but from
the perspective of 2021 (Infomocracy came out in 2016), the idea of a single unbiased
source that people basically trust feels a bit like wishful thinking.&lt;/p&gt;
&lt;p&gt;Older has written two sequels, &lt;a href=&quot;https://www.amazon.com/gp/product/B01MZ1I8LO&quot;&gt;Null States&lt;/a&gt;
and &lt;a href=&quot;https://www.amazon.com/gp/product/B078X28JP1&quot;&gt;State Tectonics&lt;/a&gt;, but I haven&#39;t
read them yet.&lt;/p&gt;
&lt;p&gt;Trigger warning for cryptographers: straight-up Internet voting and it&#39;s not even
end-to-end.&lt;/p&gt;
&lt;h2 id=&quot;john-barnes%3A-a-million-open-doors%2C-earth-made-of-glass%2C-the-merchants-of-souls%2C-the-armies-of-memory&quot;&gt;John Barnes: &lt;a href=&quot;https://www.amazon.com/Million-Open-Doors-John-Barnes/dp/031285210X/ref=sr_1_1?dchild=1&amp;amp;keywords=a+million+open+doors&amp;amp;qid=1625507996&amp;amp;sr=8-1&quot;&gt;A Million Open Doors&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Earth-Made-Glass-Giraut-Barnes/dp/0812551613/ref=sr_1_1?dchild=1&amp;amp;keywords=earth+made+of+glass&amp;amp;qid=1625508031&amp;amp;sr=8-1&quot;&gt;Earth Made of Glass&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Merchants-Souls-John-Barnes/dp/0812589696/ref=sr_1_1?dchild=1&amp;amp;keywords=merchants+of+souls+barnes&amp;amp;qid=1625508057&amp;amp;sr=8-1&quot;&gt;The Merchants of Souls&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Armies-Memory-Thousand-Cultures/dp/0765342243/ref=sr_1_4?dchild=1&amp;amp;keywords=armies+of+memory+barnes&amp;amp;qid=1625508079&amp;amp;sr=8-4&quot;&gt;The Armies of Memory&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#john-barnes%3A-a-million-open-doors%2C-earth-made-of-glass%2C-the-merchants-of-souls%2C-the-armies-of-memory&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s hard to even know where to start here. These four books all take
place in &amp;quot;The Thousand Cultures&amp;quot; universe. Humans have terraformed and settled the nearby planets
by slowboat, with each individual colony having a designed culture
intended to live out one ideal or another. The protagonist, Giraut
Leones, comes from Nou Occitan, a colony modeled after the old
Occitan troubadours, valorizing art, music, and dueling&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-fiction/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and in which history has been rewritten to reinforce that.
For instance:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;After a moment she smiled at me, tentatively as if
afraid I would shout at her, and said &amp;quot;Well, if they charge us,
we&#39;ll go to jail. Historically, we&#39;re in good company:
Jesus, Peter, Paul ... Adam Smith was burned at the stake
on Threadneedle Street, and Milton Friedman was eaten
by cannibals in Zurich.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Let&#39;s hope it doesn&#39;t come to that,&amp;quot; I said hastily.
I knew who the first three were, of course, and later on
I was glad I had no idea and so said nothing about the other
two, because they turned out to be part of the Culture
Variant History--the mythic story that founders of cultures
were allowed to load in as real history.
Of all the silly things that happened during the Diaspora,
that was one of the silliest, for it resulted in permanent
deep cleavages among the Thousand Cultures; the first time
that I heard an Interstellar making a speech on
a streetcorner proclaiming that Edgar Allan Poe did not die
in the Paris Uprising of 1846, that Rimbaud had never been
King of France, and that Mozart was not killed by Beethoven
in a duel, I challenged him and cut him down like a mad dog.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thanks to the speed-of-light limit,
each planet was
isolated until the development of the springer, which
provides instant interplanetary
teleportation. This of course changes everything, as the cultures are
brought back into contact with each other.&lt;/p&gt;
&lt;p&gt;These books are a fantastic example of a series which starts in one
place and ends in another. The first book is a pretty straightforward
coming of age story but by the end of the series
Barnes has touched on: the ethics of strong AI
and how you get it to work for you (the answer is not nice),
aging, immortality through personality recording, minds as software, and
the meaning of life in a post-scarcity society.&lt;/p&gt;
&lt;p&gt;Barnes is probably better known for his ultraviolent &amp;quot;Kaleidoscope Century&amp;quot;
and &quot;Mother Of Storms&quot;, but I far prefer this series -- which
is still somewhat violent -- and was shocked to see that the first two are out of print.&lt;/p&gt;
&lt;h2 id=&quot;raphael-carter%3A-the-fortunate-fall&quot;&gt;Raphael Carter: &lt;a href=&quot;https://www.amazon.com/Fortunate-Fall-Raphael-Carter/dp/031286034X/ref=sr_1_1?dchild=1&amp;amp;keywords=the+fortunate+fall&amp;amp;qid=1625509266&amp;amp;sr=8-1&quot;&gt;The Fortunate Fall&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#raphael-carter%3A-the-fortunate-fall&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Raphael Carter&#39;s only book and even though it&#39;s great it&#39;s
one I feel bad recommending because
it&#39;s effectively out of print, though you can still get copies on
Amazon. This is set in the aftermath of a US-led tyranny/McGenocide.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-fiction/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Most of
the world (the &quot;Fusion of Historical Nations&quot;) is only slightly more high
tech than what we have now, with the exception of cyberpunk-style
jacks which let you fully interface with an immersive VR-style
Internet policed by totalitarian &quot;Weavers&quot; whose job is to
keep everyone in line. By contrast, Africa is free, high tech, and
closed off to the rest of the world. The main character is
a &amp;quot;camera&amp;quot;, a reporter feeding everything she sees and feels into the
net. It&#39;s almost impossible to explain
the rest of this without giving away the plot, except to say that you should read it.&lt;/p&gt;
&lt;h2 id=&quot;wil-mccarthy%3A-the-collapsium%2C-the-wellstone%2C-lost-in-transmission%2C-to-crush-the-moon&quot;&gt;Wil McCarthy: &lt;a href=&quot;https://www.amazon.com/Collapsium-Queendom-Sol-Wil-McCarthy/dp/055358443X&quot;&gt;The Collapsium&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B08DF3VKDG/ref=dbs_a_def_rwt_bibl_vppi_i2&quot;&gt;The Wellstone&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B08P5Z8K62/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i7&quot;&gt;Lost In Transmission&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B08XDF2DRV/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i6&quot;&gt;To Crush The Moon&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#wil-mccarthy%3A-the-collapsium%2C-the-wellstone%2C-lost-in-transmission%2C-to-crush-the-moon&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Straight up hard science fiction with super science and more super science. Not quite at the &lt;a href=&quot;https://en.wikipedia.org/wiki/Technological_singularity&quot;&gt;Vingean singularity&lt;/a&gt;, but
pretty close, with nanotech, &lt;a href=&quot;https://en.wikipedia.org/wiki/Programmable_matter&quot;&gt;programmable matter&lt;/a&gt;, quantum-disassembly-reassembly
teleportation (&amp;quot;faxing&amp;quot;), human backup and replication, and thus near personal immortality. So what could possibly go wrong?
Well, for starters, who wants to grow up in a world where your parents never get out of your way?
The first book (The Collapsium) is a bit rough and the whole thing suffers from McCarthy&#39;s desire for
implausibly heroic and brilliant characters, but the science part really pays off, whether it&#39;s super-science, or, well, this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One man in a sphere of brass.&lt;/p&gt;
&lt;p&gt;One man alone in the vacuum of space.&lt;/p&gt;
&lt;p&gt;One man hurtling toward solid rock at forty meters per second--fast enough to kill him, to end his mission here and now, to cap a damnfool end on a long and decidedly damnfool life. To leave his children defenseless.&lt;/p&gt;
&lt;p&gt;In the porthole ahead is the planette Varna, his destination, swathed in white clouds and shining seas, in grasslands, in forests whose vertical dimension is already apparent against the dinner-bowl curve of horizon. Not planet: planette. It looks small because it &lt;em&gt;is&lt;/em&gt; small, barely twelve hundred meters across. Condensed matter core, fifteen hundred neubles--very nice. The surface workmanship is exquisite; he sees continents, islands, majestic little mountain ranges jutting up above the trees. Telescopes, he realizes, don&#39;t do justice to this remotest of Lune&#39;s satellites.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;walter-jon-williams%3A-the-crown-jewels%2C-house-of-shards%2C-rock-of-ages&quot;&gt;Walter Jon Williams: &lt;a href=&quot;https://www.amazon.com/gp/product/B0056AT8F2/ref=dbs_a_def_rwt_hsch_vapi_taft_p2_i1&quot;&gt;The Crown Jewels&lt;/a&gt;,  &lt;a href=&quot;https://www.amazon.com/gp/product/B0057CX9F4/ref=dbs_a_def_rwt_hsch_vapi_taft_p3_i5&quot;&gt;House of Shards&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B0056NC48W/ref=dbs_a_def_rwt_hsch_vapi_taft_p3_i2&quot;&gt;Rock of Ages&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#walter-jon-williams%3A-the-crown-jewels%2C-house-of-shards%2C-rock-of-ages&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Believe it or not, three science fiction &lt;em&gt;caper comedies&lt;/em&gt; about the &amp;quot;allowed burglar&amp;quot; Drake Maijstral.
In the backstory, humanity gets conquered by the alien Khosali, who more or less pick
the elements of our culture they think are most interesting and
mash them up:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once in his suite, Maijstral settled his unease by watching a Western till it was
time to dress. This one, &lt;em&gt;The Long Night of Billy The Kid&lt;/em&gt;, was an old-fashioned
tragedy featuring the legendary rivalry between Billy and Elvis Presley for
the affections of Katie Elder. Katie&#39;s heart belonged to Billy, but despite
her tearful pleadings Billy rode the outlaw trail; and finally brokenhearted Katie left
Billy to go on tour with Elvis as a backup singer, while Billy rode on to his long-foreshadowed death at the hands of the greenhorn-inventor-turned
lawman Nikola Tesla.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The conquerors impose a lot of their own culture, including the custom of &quot;allowed
burglary&amp;quot; in which theft is basically an extreme sport, with the
heists videoed and broadcast. Effectively a &lt;a href=&quot;https://en.wikipedia.org/wiki/Comedy_of_manners&quot;&gt;comedy of (alien)
manners&lt;/a&gt;, but with
people stealing stuff. Also, Elvis impersonators:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Garvikh really had them rocking. He had the audience in the palm of his furry hand.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;He had heard it said that he was the finest Elvis ever to be
born Khosalikh. Certainly, he was among the best Elvises
now alive. As part of his apprenticeship he had mastered
the difficult, antique Earth dialect, a dead language
no longer spoken anywhere, in which the King had
recorded his masterpieces. Garvikh had devoted thousands
of hours to a series of special exercises intended to
limber his sturdy Khosali hips and torso, never intended to
move with the fluidity more natural to the human form, so
that he could perform the demanding, difficult hip
thrusts, the stilted pigeon-toed walking style,
the sudden knee drops and whirling assaults on the microphone
that characterized the rigidly defined Elvis repertoire. This
was High Custom and High Custom performances required
the utmost in precision. Each step, each gesture, each
twitch of the hips or twist of the upper lip, was performed
with the utmost classical perfection, the most rigid
attention to form. There was no room for accident, for
spontaneity. All was performed with utmost care to
ensure that every nuance was subtly shaded and
subtly controlled, in the tradition of the great
Elvis Masters of the past.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You get the idea.&lt;/p&gt;
&lt;p&gt;Almost everything Williams has done is good, with a very wide range from &lt;a href=&quot;https://www.amazon.com/dp/B074C7D713?binding=kindle_edition&amp;amp;ref_=dbs_s_ks_series_rwt_tkin&amp;amp;qid=1625522032&amp;amp;sr=1-3&quot;&gt;sailing adventure novels&lt;/a&gt;
to &lt;a href=&quot;https://www.amazon.com/Voice-Whirlwind-Authors-Preferred-Hardwired-ebook/dp/B005WORWJ6/&quot;&gt;cyberpunk&lt;/a&gt;
to &lt;a href=&quot;https://www.amazon.com/gp/product/B007QQBRXU/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i9&quot;&gt;post-&lt;/a&gt;&lt;a href=&quot;https://www.amazon.com/gp/product/B00E5TLJES/ref=dbs_a_def_rwt_hsch_vapi_taft_p2_i3&quot;&gt;singularity&lt;/a&gt;.
He&#39;s more recently known for the military SF &lt;a href=&quot;https://www.amazon.com/gp/product/B000UOJTRQ/ref=dbs_a_def_rwt_bibl_vppi_i3&quot;&gt;Praxis&lt;/a&gt; novels, which are solid but not
as unique. See also the short story &lt;a href=&quot;https://en.wikipedia.org/wiki/Dinosaurs_(short_story)&quot;&gt;Dinosaurs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;suyi-davies-okungbowa%3A-david-mogo%3A-godhunter&quot;&gt;Suyi Davies Okungbowa: &lt;a href=&quot;https://www.amazon.com/David-Mogo-Godhunter-Davies-Okungbowa/dp/1781086494&quot;&gt;David Mogo: Godhunter&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#suyi-davies-okungbowa%3A-david-mogo%3A-godhunter&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Described as &amp;quot;Godpunk&amp;quot;, this is
&lt;a href=&quot;https://en.wikipedia.org/wiki/Constantine_(film)&quot;&gt;Constantine&lt;/a&gt; meets
&lt;a href=&quot;https://www.amazon.com/American-Gods-Neil-Gaiman/dp/0380973650&quot;&gt;American Gods&lt;/a&gt; but
in Lagos. At some point in the future, the gods fall out of the sky
and now Lagos is full of various kinds of supernatural entities.
David Mogo&#39;s job -- well, really more like freelancing -- is to hunt
them down. This is really three stories in sequence more than a novel
and good enough that when I saw that Okungbowa had a new &lt;a href=&quot;https://www.amazon.com/gp/product/B08HLNFK9K/&quot;&gt;book&lt;/a&gt; out I bought it sight unseen.&lt;/p&gt;
&lt;h2 id=&quot;robert-jackson-bennett%3A-city-of-stairs%2C-city-of-blades%2C-city-of-miracles&quot;&gt;Robert Jackson Bennett: &lt;a href=&quot;https://www.amazon.com/Stairs-Divine-Cities-Jackson-Bennett/dp/080413717X&quot;&gt;City of Stairs&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Blades-Divine-Cities-Jackson-Bennett/dp/0553419714/&quot;&gt;City of Blades&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Miracles-Divine-Cities-Jackson-Bennett/dp/0553419730/&quot;&gt;City of Miracles&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#robert-jackson-bennett%3A-city-of-stairs%2C-city-of-blades%2C-city-of-miracles&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Set in some unspecified alternate world in which &amp;quot;the Continent&amp;quot;
(vaguely Russian) gives rise to local &quot;divinities&quot; (effectively gods)
who then go on to subjugate and enslave the island Saypur (vaguely
Indian). In the backstory, the Saypuris rebel, kill the divinities and
invade and occupy the now devastated Continent (acting somewhat like
the 19th century colonial British Empire), with what appear to be somewhat
conflicting motivations between modernizing it and keeping it down so it can&#39;t threaten
Saypur. It turns out, though, that not
all the divinities are dead, which is the setup for the rest of
the series. There is some pretty amazing worldbuilding here, especially
of the mythology of the divinities themselves, who are simultaneously alien
and yet familiar.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Kolkan wished for nothing more than for his followers to
lead a good and ordered life. After the city of Kolkashtan
was established, he told his followers to come to him with any
questions, any concerns, and he would be there to answer
them, to judge them, and to help them. And they responded quite
enthusiastically. There are records of lines of people
five, ten, fifteen miles long. Of people fainting, starving,
growing sick and infirm as they waited. The historical
accounts are vague, but it&#39;s estimated Kolkan listened to however
many millions of people, judging day and night, sitting in one place,
for over one hundred and sixty years.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These are my favorite of Bennett&#39;s books, but also check out
&lt;a href=&quot;https://www.amazon.com/gp/product/B008AS84PM/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i6&quot;&gt;American Elsewhere&lt;/a&gt;
and &lt;a href=&quot;https://www.amazon.com/gp/product/B077RG422Z/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i1&quot;&gt;Foundryside&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;katherine-addison%3A-the-goblin-emperor&quot;&gt;Katherine Addison: &lt;a href=&quot;https://www.amazon.com/Goblin-Emperor-Katherine-Addison-ebook/dp/B00FO6NPIO/&quot;&gt;The Goblin Emperor&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#katherine-addison%3A-the-goblin-emperor&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The setting is straight up high fantasy (elves, goblins, etc.)
which is not my usual thing, but I enjoyed this one. Instead of the standard
Tolkienesque final war, this is much quieter.
The protagonist is the second-in-line, half-goblin son of
the elvish emperor who has been effectively banished
until the emperor and his son die in an airship accident and
he suddenly ascends to the throne and surprises everyone,
including himself, by being a good emperor, mostly by being a good
person. You&#39;ve seen this
general theme before, but the writing and world building are
excellent.&lt;/p&gt;
&lt;h2 id=&quot;sergei-lukyanenko%3A-night-watch&quot;&gt;Sergei Lukyanenko: &lt;a href=&quot;https://www.amazon.com/dp/B074CLBVRG?searchxofy=true&amp;amp;binding=kindle_edition&amp;amp;qid=1625525768&amp;amp;sr=1-4&quot;&gt;Night Watch&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#sergei-lukyanenko%3A-night-watch&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A series of six urban fantasy books. Living among us are Others: people
with supernatural powers divided into (surprise!) Light and Dark. Years
ago, they reached a truce and established two organizations to keep the
balance: the Night Watch, composed of Light Others, which
monitors the Dark and the Day Watch, composed of Dark Others, which
monitors the Light. These feel more like spy novels than they do
like fantasy (though without the &amp;quot;this is all boring bureaucracy&amp;quot; feel
of Stross&#39;s Laundry novels). They were originally written in Russian, and the
translation can be a bit uneven (not that I speak Russian; I&#39;m
just talking about how the English comes out), but definitely
worth a look. These are the only foreign language books on this
list.&lt;/p&gt;
&lt;h2 id=&quot;tim-powers%3A-declare.&quot;&gt;Tim Powers: &lt;a href=&quot;https://www.amazon.com/Declare-Novel-Tim-Powers/dp/0380976528&quot;&gt;Declare&lt;/a&gt;. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#tim-powers%3A-declare.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Tim Powers is rightfully famous, but for my money this is his best book. &lt;em&gt;Declare&lt;/em&gt; is
a secret history of the 20th century, recasting the standard cold war
espionage thriller (e.g., Le Carre) as instead a conflict over supernatural
power, and in particular a colony of djinn on Mt. Ararat. Powers perfectly
matches the tone of the spy thriller while also somehow having the
supernatural elements make perfect sense.&lt;/p&gt;
&lt;p&gt;The frame story here is the life of the Soviet double agent &lt;a href=&quot;https://en.wikipedia.org/wiki/Kim_Philby&quot;&gt;Kim Philby&lt;/a&gt;. Powers writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In a way, I arrived at the plot for this book by the same method that astronomers
use in looking for a new planet -- they look for &amp;quot;perturbations&amp;quot;, wobbles in the
orbits of the planets they&#39;re aware of, and they calculate the mass and
position of an unseen planet whose gravitational field could have caused
the observed perturbations -- and then they turn their telescopes on that
part of the sky and search for a gleam. I looked at all the seemingly
irrelevant &amp;quot;wobbles&amp;quot; in the lives of these people -- Kim Philby, his father,
T.E. Lawrence, Guy Burgess -- and I made it an ironclad rule that
I could not change or disregard any of the recorded facts, nor rearrange
any days of the calendar--and then I tried to figure out what momentous
but unrecorded fact could explain them all.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Other Powers worth reading: &lt;a href=&quot;https://www.amazon.com/Stress-Her-Regard-Tim-Powers/dp/1892391791&quot;&gt;The Stress of Her Regard&lt;/a&gt;
(Byron, Keats, and Shelley, vampire hunters!) and
&lt;a href=&quot;https://www.amazon.com/Last-Call-Novel-Fault-Trilogy-ebook/dp/B000UKOMX6/&quot;&gt;Last Call&lt;/a&gt;,
which is somehow about the competition to become the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fisher_King&quot;&gt;Fisher King&lt;/a&gt;
(Powers is obsessed with the Fisher King, see also &lt;a href=&quot;https://www.amazon.com/Drawing-Dark-Novel-Del-Impact/dp/0345430816&quot;&gt;The Drawing of the Dark&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&quot;anne-leckie%3A-ancillary-justice%2C-ancillary-sword%2C-ancillary-mercy&quot;&gt;Ann Leckie: &lt;a href=&quot;https://www.amazon.com/dp/B0841XW64H&quot;&gt;Ancillary Justice&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B00I8289A0&quot;&gt;Ancillary Sword&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/product/B00TOT9LEY&quot;&gt;Ancillary Mercy&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#anne-leckie%3A-ancillary-justice%2C-ancillary-sword%2C-ancillary-mercy&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This series (the first book won the Hugo, Nebula, and Arthur C. Clarke
awards) was heavily hyped and while it&#39;s not one of my absolute
favorites, I certainly agree it&#39;s solid. These books are largely set
in a space empire called the Radch, which generally seems pretty
unpleasant, with their SOP being to &amp;quot;annex&amp;quot; (i.e., conquer) a planet
and then kidnap a bunch of their citizens to be converted into
&amp;quot;ancillaries&amp;quot;: bodies operated by a warship AI. The protagonist
is an ancillary left over after its ship is destroyed.&lt;/p&gt;
&lt;p&gt;There was a lot of controversy over Ancillary Justice because of
a particular language choice: Radchaai society
doesn&#39;t think of gender as a first-class construct and doesn&#39;t
have gendered pronouns so Leckie decided to refer to everyone as &quot;she&quot; and
&quot;her&quot;, use &quot;sister&quot; for any sibling, etc. You get used to this pretty quick, but of course
there was a bunch of ridiculous Gamergate-style backlash. Don&#39;t
let that put you off.&lt;/p&gt;
&lt;h2 id=&quot;stephen-brust%3A-vlad-taltos-novels%2C-phoenix-guards-series&quot;&gt;Steven Brust: &lt;a href=&quot;https://www.amazon.com/gp/product/B084RGQJRR&quot;&gt;Vlad Taltos Novels&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/gp/kindle/series/B071F18YK8?ie=UTF8&quot;&gt;Phoenix Guards Series&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#stephen-brust%3A-vlad-taltos-novels%2C-phoenix-guards-series&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;All of these novels are in the same fantasy setting: the planet Dragaera
which is -- through unspecified means -- populated by Dragaerans
(effectively elves: tall and incredibly long-lived, though
they call themselves &amp;quot;humans&amp;quot;) and Easterners
(humans, specifically Hungarians), and a bunch of other magical
types. The Taltos books are (mostly) told from the perspective of
Vlad Taltos, an Easterner living in the Dragaeran empire and
working for the equivalent of the mafia as an assassin and
minor crime boss. These are mostly written in a fairly glib,
hard-boiled style. Vlad is a pretty morally ambiguous character,
so you kind of have to get used to that.&lt;/p&gt;
&lt;p&gt;The Phoenix Guards series are straight-up Dumas pastiche,
with the first one, &amp;quot;The Phoenix Guards&amp;quot;, being effectively
&amp;quot;The Three Musketeers&amp;quot; and the second, &amp;quot;Five Hundred Years After&amp;quot;
being &amp;quot;Twenty Years After&amp;quot; (because the Dragaerans are incredibly
long lived, get it?). Brust does a pretty good job of imitating
-- exaggerating, really -- Dumas&#39;s ornate writing style, so
good if you like that sort of thing, not so good if you don&#39;t.&lt;/p&gt;
&lt;h2 id=&quot;aliette-de-bodard%3A-obsidian-and-blood-series&quot;&gt;Aliette de Bodard: &lt;a href=&quot;https://www.amazon.com/gp/product/B08L63H63X&quot;&gt;Obsidian and Blood Series&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#aliette-de-bodard%3A-obsidian-and-blood-series&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Detective novels set in the Aztec empire. The main character is the
High Priest of the Dead, except that he solves crimes -- with magic
because the Aztec gods are real and so their rituals work. De Bodard
does a great job of immersing you in a culture which most people
will find truly alien.&lt;/p&gt;
&lt;p&gt;Also worth checking out are de Bodard&#39;s Xuya series set in an alternate
universe with a Vietnamese space empire.&lt;/p&gt;
&lt;h2 id=&quot;dan-simmons%3A-just-about-everything&quot;&gt;Dan Simmons: Just about everything &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#dan-simmons%3A-just-about-everything&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Simmons is best known for his Hyperion series, which is definitely solid, but really
he is a master of just about any genre. He got his start writing horror
(favorites: &lt;a href=&quot;https://www.amazon.com/Children-Night-Vampire-Dan-Simmons-ebook/dp/B0089VP0PC&quot;&gt;Children Of The Night&lt;/a&gt;,
&lt;a href=&quot;https://www.amazon.com/gp/product/B004TLHPZ4&quot;&gt;Summer of Night&lt;/a&gt;) then moved
into science fiction, including not only Hyperion
but also
&lt;a href=&quot;https://www.amazon.com/Ilium-Book-1-Dan-Simmons-ebook/dp/B000FC129Q&quot;&gt;Ilium&lt;/a&gt; and
&lt;a href=&quot;https://www.amazon.com/Olympos-Ilium-Book-Dan-Simmons-ebook/dp/B000FCK97C&quot;&gt;Olympos&lt;/a&gt;,
retelling the Iliad and Odyssey through the lens of post-singularity humanity.
Most recently he&#39;s been writing historical science fiction/fantasy/horror. Standouts
here include &lt;a href=&quot;https://www.amazon.com/Terror-Novel-Dan-Simmons-ebook/dp/B000PAAH3A/&quot;&gt;The Terror&lt;/a&gt;
(a retelling of the lost &lt;a href=&quot;https://en.wikipedia.org/wiki/Franklin%27s_lost_expedition&quot;&gt;Franklin Expedition&lt;/a&gt;)
and &lt;a href=&quot;https://www.amazon.com/dp/B08NK2PLSN&quot;&gt;The Fifth Heart&lt;/a&gt; (Sherlock Holmes and
Henry James investigating the
suicide of &lt;a href=&quot;https://en.wikipedia.org/wiki/Marian_Hooper_Adams&quot;&gt;Clover Adams&lt;/a&gt;).
Less good though still credible are his attempts at &lt;a href=&quot;https://www.amazon.com/gp/product/B07G3L4B6P?ref_=dbs_dp_rwt_sb_tkin&amp;amp;binding=kindle_edition&quot;&gt;hard-boiled detective novels&lt;/a&gt;. Avoid Darwin&#39;s Blade.&lt;/p&gt;
&lt;p&gt;See also: &lt;a href=&quot;https://www.amazon.com/Muse-Fire-Dan-Simmons/dp/1596061812&quot;&gt;Muse of Fire&lt;/a&gt;
in which aliens have killed most of humanity and enslaved the rest, with
the protagonist being a member of a travelling Shakespeare troupe, Shakespeare
being one of the few pieces of human culture the aliens thought was
worthwhile.&lt;/p&gt;
&lt;p&gt;Warning: Simmons is an amazing writer but seems to have recently adopted
some extremely anti-Muslim political views (see this &lt;a href=&quot;https://www.npr.org/2011/07/28/137621172/one-rant-too-many-politics-mar-simmons-dystopia&quot;&gt;review&lt;/a&gt;
of Flashback, which I have not read). You&#39;ll have to factor that into
your calculations.&lt;/p&gt;
&lt;h2 id=&quot;p.-djeli-clark%3A-a-dead-djinn-in-cairo%2C-the-haunting-of-tram-car-015%2C-a-master-of-djinn&quot;&gt;P. Djeli Clark: &lt;a href=&quot;https://www.amazon.com/Dead-Djinn-Cairo-Tor-Com-Original-ebook/dp/B01DJ0NALI/&quot;&gt;A Dead Djinn in Cairo&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Haunting-Tram-Car-015-ebook/dp/B07H796G2Z/&quot;&gt;The Haunting of Tram Car 015&lt;/a&gt;, &lt;a href=&quot;https://www.amazon.com/Master-Djinn-P-Dj%C3%A8l%C3%AD-Clark-ebook/dp/B08HKXS84X/&quot;&gt;A Master of Djinn&lt;/a&gt; &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-fiction/#p.-djeli-clark%3A-a-dead-djinn-in-cairo%2C-the-haunting-of-tram-car-015%2C-a-master-of-djinn&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This series is set in an alternate early 20th century some time 40ish years after
the mystic al-Jahiz &amp;quot;bored a hole into the Kaf, the other-realm of the djinn&amp;quot;,
letting the supernatural (back?) into the world. Egypt rivals the Western powers
who are dominant in our world and the Ministry of Alchemy, Enchantments, and Supernatural
Entities is responsible for keeping a lid on everything. Effectively, these
are police procedurals but with magic, with epic stakes and set
against a rich supernatural backstory.&lt;/p&gt;
&lt;p&gt;Also good: &lt;a href=&quot;https://www.amazon.com/Black-Gods-Drums-Dj%C3%A8l%C3%AD-Clark/dp/1250294711&quot;&gt;The Black God&#39;s Drums&lt;/a&gt;.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
With weapons called &quot;neuroducers&quot; which make you think
you&#39;ve been injured but (mostly) don&#39;t physically harm you. &lt;a href=&quot;https://educatedguesswork.org/posts/science-fiction/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Ended after some hackers created the &amp;quot;unanimous army&amp;quot;
by taking over people&#39;s minds and turning them into
a single force swarming over everything and then once
victory was achieved, shutting down and just stranding
them thousands of miles away from their homes. &lt;a href=&quot;https://educatedguesswork.org/posts/science-fiction/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What does the NeuralHash collision mean? Not much</title>
		<link href="https://educatedguesswork.org/posts/apple-csam-collision/"/>
		<updated>2021-08-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/apple-csam-collision/</id>
		<content type="html">&lt;p&gt;In today&#39;s Apple CSAM scanning news, it appears that Apple platforms
already have NeuralHash &lt;a href=&quot;https://github.com/KhaosT/nhcalc&quot;&gt;APIs&lt;/a&gt;
built in and &lt;a href=&quot;https://github.com/AsuharietYgvar&quot;&gt;Asuhariet Ygvar (apparently a pseudonym)&lt;/a&gt; has &lt;a href=&quot;https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX&quot;&gt;reverse engineered&lt;/a&gt; the algorithm and built a tool to convert it to
the &lt;a href=&quot;https://onnx.ai/&quot;&gt;Open Neural Network Exchange (ONNX)&lt;/a&gt; format.
Based on that work,
&lt;a href=&quot;https://github.com/dxoigmn&quot;&gt;Cory Cornelius&lt;/a&gt; has constructed
a &lt;a href=&quot;https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1&quot;&gt;pair of images with the same hash&lt;/a&gt;, aka a &amp;quot;collision&amp;quot;.
&lt;a href=&quot;https://www.theverge.com/2021/8/18/22630439/apple-csam-neuralhash-collision-vulnerability-flaw-cryptography&quot;&gt;The&lt;/a&gt; &lt;a href=&quot;https://www.theregister.com/2021/08/18/apples_csam_hashing/&quot;&gt;coverage&lt;/a&gt; of this is kind of confusing
and there seems to be a bit of a sense that this is news of
a vulnerability
(though note that Jonathan Mayer is quoted in the Register article
making a number of the points I make below).
From my perspective, this isn&#39;t surprising and doesn&#39;t really change
the situation.&lt;/p&gt;
&lt;h2 id=&quot;threat-model&quot;&gt;Threat Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#threat-model&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As mentioned in &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/&quot;&gt;my original post&lt;/a&gt; and &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more&quot;&gt;followup&lt;/a&gt;, there are two major
attacks on the Apple CSAM system enabled by knowing the NeuralHash algorithm:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Evasion&lt;/em&gt;: Perturbing an existing CSAM image so that it has a different
hash from the one in the database so that you could then
distribute that image undetected.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Forgery&lt;/em&gt;: Creating an innocuous image that has a hash that&#39;s already in
the database and distributing it to someone innocent so that
they are flagged by the scanning system (and potentially
subject to some sort of legal action).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that both attacks require knowing the NeuralHash algorithm,
but the latter also requires knowing the hash of at least one
entry in the database.&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that these attacks depend on opposed
properties of the hash. With something like a cryptographic hash
in which any change in the input changes the output with high
probability&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
the evasion attack is trivial and doesn&#39;t require knowing
the details of the hash algorithm: just change a single pixel
and you&#39;re done. The purpose of a perceptual hash like
NeuralHash is to make it so that small changes to the input
&lt;em&gt;don&#39;t&lt;/em&gt; change the output. That&#39;s why you need to know the
details of the algorithm in order to mount the evasion
attack, in order to tell which perturbations actually change
the hash value.&lt;/p&gt;
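&lt;p&gt;The cryptographic-hash half of that contrast is easy to see in a few lines of Python. This is only an illustrative sketch: SHA-256 stands in for a generic cryptographic hash, the byte buffer is a made-up stand-in for image data rather than a real image, and none of this touches NeuralHash itself. It needs Python 3.10 or later for &lt;code&gt;int.bit_count&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib


def digest_as_int(data):
    # Interpret the 256-bit SHA-256 digest as an integer so bits can be compared.
    return int(hashlib.sha256(data).hexdigest(), 16)


# Hypothetical stand-in for a 64 x 64 grayscale image, one byte per pixel.
pixels = bytearray(range(256)) * 16

# Copy the buffer and flip the lowest bit of a single pixel.
tweaked = bytearray(pixels)
tweaked[0] ^= 1

# XOR the two digests and count how many of the 256 output bits differ.
differing_bits = (digest_as_int(pixels) ^ digest_as_int(tweaked)).bit_count()
print(differing_bits)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a cryptographic hash, a one-bit change to the input should flip roughly half of the output bits, so the two digests have nothing usable in common. A perceptual hash is designed to do the opposite on a change this small, which is exactly why evading it takes knowledge of the algorithm.&lt;/p&gt;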
&lt;p&gt;By contrast, the forgery attack depends on it being relatively
easy to generate an image with a given hash value&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
The structure of perceptual hash functions makes this
comparatively easy to do&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.
The result is that if the attacker has a hash that corresponds to an
entry in the database then they can make an image that has that hash.
Less obviously, it&#39;s also possible to make an image that looks
nothing like the original image and still has the same hash,
as shown in the &lt;a href=&quot;https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1&quot;&gt;example collision&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://user-images.githubusercontent.com/1328/129860794-e7eb0132-d929-4c9d-b92e-4e4faba9e849.png&quot; alt=&quot;image 1&quot; /&gt;&lt;img src=&quot;https://user-images.githubusercontent.com/1328/129860810-f414259a-3253-43e3-9e8e-a0ef78372233.png&quot; alt=&quot;image 2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s not obvious to me that this is a necessary property -- consider the case
of a hash that&#39;s just an 8x8 bitmap of the image -- but it seems to be a property
of NeuralHash and similar constructions; that&#39;s certainly what I and the
other analyses I have seen have assumed.
This is important because the purpose of the attack is to frame
someone by sending them images that they keep on their machine
and upload to iCloud, and this doesn&#39;t work if those
images are obviously CSAM.&lt;/p&gt;
&lt;h2 id=&quot;evasion&quot;&gt;Evasion &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#evasion&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s not clear that Apple has any countermeasures for the evasion attack.
The primary one I can think of would be to have NeuralHash be secret,
thus making it hard to know whether a given perturbation actually
changed the hash.
Apple hasn&#39;t published the details of NeuralHash,
and we don&#39;t know that the version we&#39;re seeing here is the final
version, but unless they take some real efforts to conceal it
-- which, again, would undercut the verifiability claims they
have been making -- then we should assume that it will eventually
become known to attackers.&lt;/p&gt;
&lt;p&gt;This isn&#39;t an ideal property, but the whole design of the current
system assumes that there aren&#39;t any real attempts at evasion.
After all, Apple only scans images that are uploaded to iCloud;
if people don&#39;t want to be detected all they have to do is turn
off photo sharing to iCloud, so evasion is fairly straightforward.&lt;/p&gt;
&lt;h2 id=&quot;forgery&quot;&gt;Forgery &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#forgery&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Apple&#39;s system includes three countermeasures against forgery
attacks (and false positives):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The hash database itself is secret (blinded with a key known
to Apple).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They screen potential CSAM images using a second perceptual
hash and only forward those which match for human review
(this was not initially announced but was published last week).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They do human review to see if images are actually CSAM.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first countermeasure is intended to prevent attackers
from knowing which hashes they should be targeting, as
the vast majority of hashes will not be in the database.
Note, however, that if an attacker knows a piece of CSAM
that is in the database, they can compute the hash themselves
if they know the NeuralHash algorithm, so we should expect
that at least some of the hashes will get out.&lt;/p&gt;
&lt;p&gt;The second countermeasure seems like a good idea, but I&#39;m not sure how
robust it&#39;s going to turn out to be. In order for it to work, we need
the secondary hash outputs to be independently distributed from
the on-device hash, in the sense that two &lt;em&gt;different&lt;/em&gt; images
which have the same NeuralHash value are unlikely to have the same
value in the secondary hash.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
I don&#39;t know enough about the design of Apple&#39;s secondary hash to
know if this is true. My wild guess would be that it has a similar
structure to NeuralHash but just uses different features. In any
case, it would increase confidence in this process for Apple to
publish statistics about the overlap between these two hashes,
even if they can&#39;t publish the details (which they can&#39;t
because this countermeasure requires the secondary hash to
be secret).&lt;/p&gt;
&lt;p&gt;The human review is obviously the final backstop against forgery
attacks. This probably does a pretty good job of preventing false
reports to law enforcement, but it&#39;s not going to be great if
there needs to be a huge amount of human review.&lt;/p&gt;
&lt;h3 id=&quot;one-more-thing...&quot;&gt;One more thing... &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#one-more-thing...&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ygvar writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: Neural hash generated here might be a few bits off from one
generated on an iOS device. This is expected since different iOS
devices generate slightly different hashes anyway. The reason is
that neural networks are based on floating-point calculations. The
accuracy is highly dependent on the hardware. For smaller networks
it won&#39;t make any difference. But NeuralHash has 200+ layers,
resulting in significant cumulative errors.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This actually seems like a minor operational problem: the CSAM
scanning system depends on the NeuralHash matching exactly,
so either Apple will need to make the API produce consistent
results or insert all the potential results into the database.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said above, the only really surprising thing here is that
a version of NeuralHash is already out there in Apple devices.
Both the evasion and forgery attacks are pretty obvious and
Apple has some -- albeit imperfect -- countermeasures in place,
so I don&#39;t think this materially changes the situation.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s easy to see that it&#39;s just high probability and not
certainty because the number of possible inputs is much
bigger than the number of hash values, and so there
must be at least two inputs with the same hash value. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&amp;quot;Relatively&amp;quot; here means with a complexity significantly
less than 2&lt;sup&gt;b-1&lt;/sup&gt; where &lt;em&gt;b&lt;/em&gt; is the length of
the hash in bits. In this case, the hash seems to be
96 bits, so much less than 2&lt;sup&gt;96&lt;/sup&gt;, which is an
impractically large number of computations. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As I understand it, the intuition here is that these hashes are designed
so that similar images have similar hashes (a low &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Hamming_distance&amp;amp;oldid=1039371125&quot;&gt;Hamming distance&lt;/a&gt;),
but this means that you can use optimization algorithms
to find your way from one hash to another by making
changes that progressively move you closer to the hash you
want. I&#39;m not sure if it&#39;s possible to design a
perceptual hash without this feature. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this is not the same as the hashes being different.
For instance, it&#39;s easy to design two hashes H1 and H2
where the hashes tend to be different, just by doing
H2 = SHA-1(H1). This wouldn&#39;t solve the problem here,
because hash collisions in H1 would still be collisions in
H2. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-collision/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>More on Apple&#39;s Client-side CSAM Scanning</title>
		<link href="https://educatedguesswork.org/posts/apple-csam-more/"/>
		<updated>2021-08-16T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/apple-csam-more/</id>
		<content type="html">&lt;p&gt;Apple has released &lt;a href=&quot;https://www.apple.com/child-safety/pdf/Security_Threat_Model_Review_of_Apple_Child_Safety_Features.pdf&quot;&gt;more
information&lt;/a&gt;
about their client-side
CSAM scanning &lt;a href=&quot;https://www.apple.com/child-safety/&quot;&gt;function&lt;/a&gt;
(See my &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/&quot;&gt;original writeup&lt;/a&gt;).
Though none of this fundamentally changes the situation -- and it&#39;s
not clear why they didn&#39;t just share these details before -- it&#39;s
worth going through them and the points they&#39;ve
been making.&lt;/p&gt;
&lt;h2 id=&quot;scanning-threshold%2Ffalse-positive-rate&quot;&gt;Scanning Threshold/False Positive Rate &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#scanning-threshold%2Ffalse-positive-rate&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Starting small, Apple has published their proposed detection threshold,
30 CSAM images. This is computed by taking a conservatively estimated
10&lt;sup&gt;-6&lt;/sup&gt; false positive rate (their measured rate is 3 in 100 million)
and then conservatively assuming an image library bigger than the biggest of
any current iCloud user and then solving for an overall false positive
rate of 10&lt;sup&gt;-12&lt;/sup&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
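&lt;p&gt;To make the structure of that calculation concrete, here is a rough sketch under my own simplifying assumptions, not Apple&#39;s published derivation: treat each image as an independent false match with probability &lt;em&gt;p&lt;/em&gt; = 10&lt;sup&gt;-6&lt;/sup&gt; and write &lt;em&gt;n&lt;/em&gt; for the assumed library size (left symbolic because the figure is not given here). The number of false matches &lt;em&gt;X&lt;/em&gt; is then binomially distributed, and the threshold &lt;em&gt;t&lt;/em&gt; is the smallest value satisfying&lt;/p&gt;
&lt;p&gt;P(&lt;em&gt;X&lt;/em&gt; ≥ &lt;em&gt;t&lt;/em&gt;) = Σ&lt;sub&gt;&lt;em&gt;k&lt;/em&gt;=&lt;em&gt;t&lt;/em&gt;..&lt;em&gt;n&lt;/em&gt;&lt;/sub&gt; C(&lt;em&gt;n&lt;/em&gt;, &lt;em&gt;k&lt;/em&gt;) &lt;em&gt;p&lt;/em&gt;&lt;sup&gt;&lt;em&gt;k&lt;/em&gt;&lt;/sup&gt; (1 - &lt;em&gt;p&lt;/em&gt;)&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;-&lt;em&gt;k&lt;/em&gt;&lt;/sup&gt; ≤ 10&lt;sup&gt;-12&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Because &lt;em&gt;p&lt;/em&gt; is tiny, this tail is well approximated by a Poisson distribution with mean &lt;em&gt;np&lt;/em&gt;, and it shrinks very quickly as &lt;em&gt;t&lt;/em&gt; grows. That is how a per-image rate of one in a million can turn into an overall rate of one in a trillion, but only as long as the independence assumption holds.&lt;/p&gt;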
&lt;p&gt;This seems like a fairly reasonable procedure for the non-adversarial
case. Of course, it doesn&#39;t work at all for the adversarial case,
for instance where an attacker knows the hash for a CSAM image
and then creates an innocuous image that has the same hash.
This could happen in at least two ways: first, the database itself
could leak in some way. Second, the attacker could know that a particular
piece of CSAM is in the database and then compute its hash directly.
Either form of attack requires the attacker to know the NeuralHash
algorithm, which Apple hasn&#39;t disclosed, but they might be able to
get that by reverse engineering the binary (in fact, Apple&#39;s verifiability
claims depend on this, as described below).&lt;/p&gt;
&lt;h2 id=&quot;apple&#39;s-review&quot;&gt;Apple&#39;s Review &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#apple&#39;s-review&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Apple also published more details on their review process which
seems to involve two steps: (1) checking a second hash before human review in order
to minimize the chance of humans reviewing false positives and
then (2) human review of the &amp;quot;visual derivative&amp;quot;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once Apple&#39;s iCloud Photos servers decrypt a set of positive match
vouchers for an account that exceeded the match threshold, the visual
derivatives of the positively matching images are referred for review
by Apple. First, as an additional safeguard, the visual derivatives
themselves are matched to the known CSAM database by a second,
independent perceptual hash. This independent hash is chosen to reject
the unlikely possibility that the match threshold was exceeded due to
non-CSAM images that were adversarially perturbed to cause false
NeuralHash matches against the on-device encrypted CSAM database. If
the CSAM finding is confirmed by this independent hash, the visual
derivatives are provided to Apple human reviewers for final
confirmation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Several points are worth making here. First, to make this work the visual derivative
needs to be something that a person can look at and compare to
the real image. Apple hasn&#39;t been super-clear about what the
&amp;quot;visual derivative&amp;quot; is but they say a
&amp;quot;visual derivative of the image, such as a low-resolution version&amp;quot;,
which is consistent with what one would expect.
Second, in order for the second hash to be a useful countermeasure,
the second perceptual hash needs not just to be independent
but also secret. Otherwise, an attacker might be able to
create an image which matched both hashes. Of course, because
the second hash isn&#39;t run on people&#39;s phones but rather on
Apple&#39;s (and probably the child safety organizations&#39;) servers,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
it&#39;s less vulnerable to attack. And if it is compromised,
Apple can change it and have the child safety organizations
recompute the hashes without changing anyone&#39;s phone software.&lt;/p&gt;
&lt;h2 id=&quot;multiple-jurisdictions&quot;&gt;Multiple Jurisdictions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#multiple-jurisdictions&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There have been a number of concerns that Apple would be forced to include
non-CSAM content (&lt;a href=&quot;https://cdt.org/insights/what-could-go-wrong-apples-misguided-plans-to-gut-end-to-end-encryption/&quot;&gt;CDT&lt;/a&gt;, &lt;a href=&quot;https://www.eff.org/deeplinks/2021/08/apples-plan-think-different-about-encryption-opens-backdoor-your-private-life&quot;&gt;EFF&lt;/a&gt;). In response to these, Apple proposes to only include
hashes which are provided by at least two separate child safety
organizations in different jurisdictions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The first protection against mis-inclusion is technical: Apple
generates the on-device perceptual CSAM hash database through an
intersection of hashes provided by at least two child safety
organizations operating in separate sovereign jurisdictions – that is,
not under the control of the same government. Any perceptual hashes
appearing in only one participating child safety organization’s
database, or only in databases from multiple agencies in a single
sovereign jurisdiction, are discarded by this process, and not
included in the encrypted CSAM database that Apple includes in the
operating system. This mechanism meets our source image correctness
requirement.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;This approach enables third-party technical audits: an auditor can
confirm that for any given root hash of the encrypted CSAM database in
the Knowledge Base article or on a device, the database was generated
only from an intersection of hashes from participating child safety
organizations, with no additions, removals, or changes. Facilitating
the audit does not require the child safety organization to provide
any sensitive information like raw hashes or the source images used to
generate the hashes – they must provide only a non-sensitive
attestation of the full database that they sent to Apple. Then, in a
secure on-campus environment, Apple can provide technical proof to the
auditor that the intersection and blinding were performed correctly. A
participating child safety organization can decide to perform the
audit as well.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&#39;t doubt that this is technically possible. In fact, I&#39;m a little
surprised that the proof of correctness has to be done on Apple&#39;s
campus rather than having a zero-knowledge proof that anyone can
verify (maybe that&#39;s coming?). In any case, I&#39;m not sure how comforting
it really should be to people that Apple requires inputs from
child safety organizations from different countries: it&#39;s
not like two governments couldn&#39;t collude to put each other&#39;s
non-CSAM images into their databases either on a one-off basis
or as part of some kind of more formalized arrangement such
as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Five_Eyes&amp;amp;oldid=1032062994&quot;&gt;Five Eyes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In any case, it would be good to know which other child safety
organization Apple is using to construct their initial database;
presumably it&#39;s NCMEC in the US, but who outside the US?&lt;/p&gt;
&lt;h2 id=&quot;icloud-only&quot;&gt;iCloud-Only &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#icloud-only&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One natural question is whether this is limited to iCloud. Apple
has been pretty dismissive of this question. Here&#39;s Craig Federighi
&lt;a href=&quot;https://www.wsj.com/articles/apple-executive-defends-tools-to-fight-child-porn-acknowledges-privacy-backlash-11628859600&quot;&gt;talking&lt;/a&gt; to WSJ&#39;s Joanna Stern:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think that&#39;s a common but really profound misunderstanding. This is
only being applied as part of the process of storing something in the
cloud. This isn&#39;t some processing that&#39;s running over the images you
store in your messages or in Telegram or anything else... you know
what you&#39;re browsing on the Web. This literally is part of the
pipeline for storing images in iCloud.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This seems to me like the wrong standard. As I
&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#can-apple-read-other-images-on-my-device%3F&quot;&gt;mentioned&lt;/a&gt; in my
original post, this system could readily be technically applied to
images other than those in iCloud. The major difference is that with iCloud, Apple actually has a
copy of the original image. However, based on the description
they have provided, they don&#39;t need the original image because they
review the visual derivative. If Apple wanted to (for instance)
scan every image in Photos rather than just the ones that were
uploaded to iCloud, this seems like it ought to be pretty
straightforward.&lt;/p&gt;
&lt;p&gt;A more interesting question is whether they could scan images in
third party programs. I&#39;d initially thought this would be fairly
challenging because they would have to scrape pixels off the
screen, but then I realized that Apple provides image rendering
APIs such as &lt;a href=&quot;https://developer.apple.com/documentation/coreimage&quot;&gt;CoreImage&lt;/a&gt;
and &lt;a href=&quot;https://developer.apple.com/documentation/uikit/uiimage&quot;&gt;UIImage&lt;/a&gt;.
Presumably lots of implementors use these, so in principle Apple
could modify them to upload a voucher each time an image is displayed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
This would obviously cost some bandwidth, but isn&#39;t necessarily
prohibitive. The situation is actually even easier for Web browsing
on iOS:
because Apple prohibits the use of any Web engine other than their
own, they already have access to any image which is being rendered.&lt;/p&gt;
&lt;p&gt;So, I don&#39;t really think this is that profound a misunderstanding.
While it&#39;s certainly true that Apple would need to rearchitect their
system somewhat in order to scan non-iCloud images, there doesn&#39;t seem
to be any in-principle reason why they couldn&#39;t do so; that&#39;s just
not how the system is currently built, and they&#39;ve already done the
hard part.&lt;/p&gt;
&lt;h2 id=&quot;list-verification&quot;&gt;List Verification &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#list-verification&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, there&#39;s the question of verifying the lists. Apple writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Since no remote updates of the database are possible, and since Apple
distributes the same signed operating system image to all users
worldwide, it is not possible – inadvertently or through coercion –
for Apple to provide targeted users with a different CSAM
database. This meets our database update transparency and database
universality requirements.&lt;/p&gt;
&lt;p&gt;Apple will publish a Knowledge Base article containing a root hash of
the encrypted CSAM hash database included with each version of every
Apple operating system that supports the feature. Additionally, users
will be able to inspect the root hash of the encrypted database
present on their device, and compare it to the expected root hash in
the Knowledge Base article. That the calculation of the root hash
shown to the user in Settings is accurate is subject to code
inspection by security researchers like all other iOS device-side
security claims.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The general reasoning here is sound:
(1) Auditors verify that the database is correctly constructed.
(2) Apple commits publicly to the
hash of the database&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; and so users
can verify that they have the same hash as the auditors looked
at which is the same as everyone else&#39;s.
(3) Researchers can verify that Apple&#39;s hash computation code
is accurate. However, in practice I don&#39;t think this provides
that high a level of assurance.&lt;/p&gt;
&lt;p&gt;First, as I said above, the database construction procedure --
including the auditing -- doesn&#39;t necessarily guarantee that there are
no non-CSAM images in the database, just that child safety organizations
in two countries are willing to put a given image in. Second, everybody having the same
database doesn&#39;t actually guarantee that there aren&#39;t country-specific
entries in the database. Apple could just put hashes for every
country&#39;s images into the database and then sort things out on the
server side. At minimum, they could just server-side filter the vouchers based on
matching the independent perceptual hash (see above) against a
country-specific database, but there might also be a way to arrange that
there is a separate voucher decryption key for each country so that
only the vouchers for a given country decrypt.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This brings us to the question of calculating the root hash for
the database on a given device. The problem here is that you&#39;re
trusting the phone to tell you the hash of the database. Apple&#39;s
response to this is that security researchers are able to
check that the hash computation code is correct, but that
just tells you that the code they reviewed is correct, not
that the code on an individual phone is correct. In order to
verify that you need to examine the individual phone, not
just look at the code Apple is distributing. Importantly,
you can&#39;t trust what the phone tells you about what code is
running on it because the phone itself could be compromised.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
It&#39;s not just a matter of verifying that the database is correct
but also that the NeuralHash algorithm behaves as expected.
For instance, it could read different parts of the database
depending on which geography the device was in. At the
end of the day you need to be able to study and verify
the whole system.&lt;/p&gt;
&lt;p&gt;Finally, all of this depends on researchers being able to inspect
iOS code, but of course most of the code in iOS isn&#39;t open source
so you have to reverse engineer it and Apple isn&#39;t always
&lt;a href=&quot;https://www.wired.com/story/apple-platform-security-guide-researchers/&quot;&gt;that forthcoming&lt;/a&gt;
with details of how things work (to just take an example from
this case, they haven&#39;t published the details of NeuralHash,
even though, as noted above, that&#39;s required to verify that
the system behaves as claimed).
Moreover, Apple historically hasn&#39;t been &lt;a href=&quot;https://www.cpomagazine.com/cyber-security/in-a-major-victory-for-security-researchers-federal-court-rules-that-virtual-ios-devices-are-not-a-copyright-violation/&quot;&gt;that enthusiastic&lt;/a&gt; about security researchers studying the iOS software.&lt;/p&gt;
&lt;p&gt;I understand Apple&#39;s desire to assert that the whole system
is independently verifiable, but I think that&#39;s a bit of
a category error here. At the end of the day, neither Apple hardware nor Apple
software is an open system and if you&#39;re going to buy an Apple device
you&#39;re at some level trusting Apple with your data.
Obviously it would be better if that weren&#39;t the case, but
as long as it is, it&#39;s not clear to me how useful it
is to have just this piece be verifiable.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As a side note, this allows us to solve for
the expected size of the library, but I&#39;m too lazy to do it. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Apple never gets the images, so the child safety organization already computes the NeuralHash
values for the images. They would need to either compute the
visual derivative and send it to Apple or compute the visual
derivative and then the independent hash and send it to Apple. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Probably with some kind of local cache to prevent multiple
uploads or maybe a prefilter to remove anything that clearly
isn&#39;t CSAM. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that this is a different kind of hash
than the NeuralHash, and detects any change. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s obviously the case you could do this without the
independent auditing stage that Apple proposes, just
by using a per-country blinding key. I&#39;m not sure if
it&#39;s possible with the auditing, however. I suspect
it depends on the details of how that is done. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Actually verifying the software running on a given device
is quite a challenging problem because you need some way
to examine the code on the device which isn&#39;t mediated
by that same code. For instance, you could read the data
off the disk (these days a flash drive) but the disk itself
isn&#39;t just dumb storage, it&#39;s got a processor in it that
controls reading off the disk and runs the interface to the
computer, and the device might be able to rewrite
the firmware on that processor. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-more/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Overview of Apple&#39;s Client-side CSAM Scanning</title>
		<link href="https://educatedguesswork.org/posts/apple-csam-intro/"/>
		<updated>2021-08-09T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/apple-csam-intro/</id>
		<content type="html">&lt;p&gt;Last week Apple announced a new &lt;a href=&quot;https://www.apple.com/child-safety/&quot;&gt;function&lt;/a&gt; in iOS that will scan photos
in order to detect images containing &lt;em&gt;Child Sexual Abuse Material&lt;/em&gt; (CSAM).
This post attempts to provide an overview of the functionality Apple
has built and answer some questions about what it can and cannot do.&lt;/p&gt;
&lt;h2 id=&quot;overview&quot;&gt;Overview &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#overview&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The basic idea behind the system is to detect images on the device that
match &lt;em&gt;known&lt;/em&gt; CSAM images. What this means is that Apple has a
database of hashes of CSAM images
[Update: this originally said images, but actually Apple only
needs the hashes and their writeup suggests they don&#39;t have
the images.]
provided by the National Center for Missing
and Exploited Children (NCMEC, pronounced &amp;quot;nick-meck&amp;quot;) and is trying
to detect whether the images on the device are in that
database. Although the system uses machine learning (ML), it is
being used to account for small changes in the images (e.g., were they
cropped, or compressed slightly differently, etc.) not to generically
detect whether an unknown image is CSAM. If, for instance, someone
sends or receives a CSAM image that is not already known to NCMEC,
then this system will not detect it.&lt;/p&gt;
&lt;p&gt;Although the scanning happens on the device, it is currently limited
to photos which are being uploaded to iCloud. This is actually
a little puzzling because, as EFF &lt;a href=&quot;https://www.eff.org/deeplinks/2021/08/apples-plan-think-different-about-encryption-opens-backdoor-your-private-life&quot;&gt;notes&lt;/a&gt;,
photos on iCloud are &lt;a href=&quot;https://support.apple.com/en-us/HT202303&quot;&gt;not end-to-end encrypted&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and therefore Apple could scan the photos on the
server side, though it apparently does not do so. One possible
&lt;a href=&quot;https://twitter.com/alexstamos/status/1424054578438307840&quot;&gt;theory&lt;/a&gt;
here is that Apple is intending to introduce end-to-end encryption
of iCloud data and wants to have an answer to how they are
going to address CSAM for that data. It&#39;s important to realize
that there&#39;s nothing in the system that prevents Apple from
scanning photos that never leave the device; they&#39;ve just
chosen not to do so (see &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#what-happens-if-people-disable-icloud%3F&quot;&gt;below&lt;/a&gt; for details). However, you don&#39;t have to be &lt;em&gt;sharing&lt;/em&gt; photos with anyone;
just backing up photos to iCloud is enough to initiate the scanning
process.&lt;/p&gt;
&lt;h2 id=&quot;design-objectives&quot;&gt;Design Objectives &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#design-objectives&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here is what Apple describes as the privacy and security
guarantees of the system (I&#39;ve added some numbers to make it easier to follow).&lt;/p&gt;
&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Apple does not learn anything about images that do not match the known CSAM database.&lt;/li&gt;
&lt;li&gt;Apple can’t access metadata or visual derivatives for matched CSAM images until a threshold of matches is exceeded for an iCloud Photos account.&lt;/li&gt;
&lt;li&gt;The risk of the system incorrectly flagging an account is extremely low. In addition, Apple manually reviews all reports made to NCMEC to ensure reporting accuracy.&lt;/li&gt;
&lt;li&gt;Users can’t access or view the database of known CSAM images.&lt;/li&gt;
&lt;li&gt;Users can’t identify which images were flagged as CSAM by the system.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;These all seem pretty obvious, especially (1) and (3): nobody wants
Apple learning about all the images on your phone or to
get inaccurately accused of having CSAM.&lt;/p&gt;
&lt;p&gt;(4) and (5) deserve a closer look. Obviously Apple doesn&#39;t want to actually
send a database of CSAM images to the client, but the database actually
contains image hashes which probably don&#39;t
really let you reconstruct the image.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
Rather, the main reason for
(4) and (5) is to prevent people from learning which hashes are in the
database because then they could avoid sharing those images or potentially
perturb the images until they had a different hash. It would also allow
an attacker to cause trouble for others by sending them innocuous images
that match the hash, thus causing false positives that get them
investigated.&lt;/p&gt;
&lt;h2 id=&quot;system-description&quot;&gt;System Description &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#system-description&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The system uses some quite &lt;a href=&quot;https://www.apple.com/child-safety/pdf/Apple_PSI_System_Security_Protocol_and_Analysis.pdf&quot;&gt;fancy cryptography&lt;/a&gt;
but I&#39;ll attempt to provide an overview that doesn&#39;t require that much
cryptographic knowledge. As a disclaimer, I&#39;ve read the paper and
think I mostly understand it, but I haven&#39;t studied the proofs
and even though it was designed by some well-known people,
the system was just released and thus hasn&#39;t been widely analyzed,
so there&#39;s of course some chance there&#39;s a mistake.&lt;/p&gt;
&lt;p&gt;At a high level, the system works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Apple builds an encrypted database of the hashes for each image and sends it to each device.&lt;/li&gt;
&lt;li&gt;The device hashes each image and uses the encrypted database to generate a &amp;quot;voucher&amp;quot; which gets sent to Apple. At this point, the device does not know which images matched.&lt;/li&gt;
&lt;li&gt;On the server side, Apple decrypts the vouchers, but is only able to do so for the matching images (hashes). The decrypted vouchers have another layer of encryption so aren&#39;t useful just yet.&lt;/li&gt;
&lt;li&gt;Once Apple has decrypted enough vouchers for a given device, they are able to put them together and remove the inner layer of encryption. This allows them to determine which images actually matched.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;the-image-database&quot;&gt;The Image Database &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#the-image-database&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The input to the system is a labeled database of images which
are to be detected. Although the system is described as scanning
for CSAM, from the perspective of the system they&#39;re just images.
For instance, if Apple wanted to detect everyone who had made
a copy of Beeple&#39;s &lt;a href=&quot;https://cdn.vox-cdn.com/thumbor/ff0-Hpbfu6PV8wsdP509CL8DS_U=/0x0:3000x3000/920x613/filters:focal(1260x1260:1740x1740):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/68948366/2021_NYR_20447_0001_001_beeple_everydays_the_first_5000_days034733_.0.jpg&quot;&gt;Everydays&lt;/a&gt;
(enforcing his &lt;a href=&quot;https://www.theverge.com/2021/3/11/22325054/beeple-christies-nft-sale-cost-everydays-69-million&quot;&gt;NFT&lt;/a&gt;!),
they could just insert that into the database.&lt;/p&gt;
&lt;p&gt;The database is then processed with a &amp;quot;&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=Perceptual_hashing&amp;amp;id=1032723944&amp;amp;wpFormIdentifier=titleform&quot;&gt;perceptual hashing&lt;/a&gt;&amp;quot; system called NeuralHash.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
NeuralHash takes an image
and produces a short value (I believe on the order of 256 bits)
which is characteristic of the image. The idea is supposed to be that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If two images look &amp;quot;the same&amp;quot; then they will have the same
hash, even if they are slightly different. For instance,
Apple gives the example of a color and black-and-white version
of the same image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If two images are &amp;quot;different&amp;quot; then they will have different
hashes with very high probability.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The rest of the system is all based on these hashes and is designed
to detect if the images on a given device have a hash that is in the
database. Importantly, if two images do -- due to bad luck or attack --
happen to have the same hash, then this behaves as if the images
were the same.&lt;/p&gt;
&lt;p&gt;Each image is run through NeuralHash to produce its corresponding
hash value. Apple then
takes each of those hash values and &lt;em&gt;blinds&lt;/em&gt; it using a secret
key known only to Apple, producing a blinded hash. These values are
then stored in a table, each at a deterministic location
derived from the original hash. The figure below shows a trivial version of this
in which we just use the last digit of the hash as the position in
the table (remember, computer science people count from 0).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/csam-table.png&quot; alt=&quot;Example of hashing process&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There are two subtleties in building the table. First, two hash values
might happen to correspond to the same position in the table (in
the example above, they might have the same last digit). This isn&#39;t
that likely and can be dealt with in a number of ways that are outside
the scope of this article: the easiest is just to make the table
somewhat larger than the total number of hashes (so this doesn&#39;t
happen often) and keep only the first or last matching hash (thus
tolerating a small number of images not getting reported).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Second, not every entry in the table will be filled with a value:
if we just sent an empty value, this would tell the client some
of the hashes that were &amp;quot;safe&amp;quot; (i.e., had no corresponding images).
Instead, Apple just fills these with random values (shown in
red in the diagram above).&lt;/p&gt;
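&lt;p&gt;Here is a rough sketch of the table construction in Python. This is not Apple&#39;s construction: the real system blinds with elliptic curve operations, whereas this toy uses modular exponentiation, and the modulus, the helper names, and the sample hash values are all assumptions chosen to mirror the last-digit example above.&lt;/p&gt;

```python
import hashlib
import secrets

P = 2**127 - 1      # toy prime modulus; far too small for real use
TABLE_SIZE = 10     # one row per possible last digit


def to_group(image_hash):
    # Map a hash to a nonzero element mod P (a stand-in for hash-to-curve).
    digest = hashlib.sha256(str(image_hash).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(image_hash, blinding_key):
    # Blinding: raise the group element to the secret exponent.
    return pow(to_group(image_hash), blinding_key, P)


def build_table(image_hashes, blinding_key):
    table = [None] * TABLE_SIZE
    for h in image_hashes:
        # Position is the last decimal digit; on a collision the last entry wins.
        table[h % TABLE_SIZE] = blind(h, blinding_key)
    # Fill unused rows with random values so empty rows are not visible.
    return [secrets.randbelow(P - 1) + 1 if v is None else v for v in table]


blinding_key = secrets.randbelow(P - 2) + 1      # known only to the server
table = build_table([1234, 5678, 9013], blinding_key)
print(len(table))
```

&lt;p&gt;Without the blinding key, a device holding this table has no way to tell the three real rows from the seven random ones.&lt;/p&gt;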
&lt;p&gt;Once the table is built, Apple then sends a copy of the database to
the device. Note that although each device has a copy of the
database, it is unable to determine the set of hashes because it
does not have the blinding key (objective #4).&lt;/p&gt;
&lt;h3 id=&quot;the-scanning-process&quot;&gt;The Scanning Process &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#the-scanning-process&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In the scanning process, the device goes through its images one at
a time. It computes a hash for each image and then looks up
the corresponding table entry in the database at the
position indicated by the hash (again, in our example, using the last
digit of the hash).
As noted above, there are three possibilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The images are the same and so have the same hash&lt;/li&gt;
&lt;li&gt;The images are different and so have different hashes, but are
just at the same position in the table (in my example, because
they have the same last digit).&lt;/li&gt;
&lt;li&gt;There is no image at this position in the database, in which
case the value in the table is just random.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Importantly, the device can&#39;t distinguish these cases (objective #5), so it does the same thing in every case: it uses its own hash and the blinded database
entry to compute an encryption key and a value &lt;em&gt;Q&lt;/em&gt; (which we&#39;ll
use in a moment).&lt;/p&gt;
&lt;img src=&quot;https://educatedguesswork.org/img/csam-scan.png&quot; width=&quot;500&quot; /&gt;
&lt;p&gt;It then creates a voucher which contains the pair of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Q&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Some &amp;quot;associated data&amp;quot; encrypted with the encryption key (we&#39;ll get to what
that means shortly), but &lt;em&gt;not&lt;/em&gt; the key itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It then sends the voucher up to the server along
with the image. (Again, the server could have computed all this stuff
itself from the image, but if the image were end-to-end
encrypted then that would not be possible.)&lt;/p&gt;
&lt;h3 id=&quot;server-side-processing&quot;&gt;Server-Side Processing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#server-side-processing&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When the server receives the voucher, it combines its blinding key
with &lt;em&gt;Q&lt;/em&gt; to compute its own encryption key and uses that key to decrypt
the associated data. If the images were the same (and hence
have the same hash) then the keys that the client and the server
computed will be the same and the associated data will decrypt correctly. If the
hashes are not the same or the database contained a random
value at this position, then the key the server computes will be a random value
and the associated data will not be decryptable.&lt;/p&gt;
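&lt;p&gt;The key agreement above can be sketched with a toy Diffie-Hellman-style computation. Again, this is a simplification under my own assumptions rather than Apple&#39;s actual protocol (which uses elliptic curves and a somewhat more involved formula for &lt;em&gt;Q&lt;/em&gt;); the names and the tiny modulus are hypothetical.&lt;/p&gt;

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; not secure


def to_group(image_hash):
    # Map a hash to a nonzero element mod P (a stand-in for hash-to-curve).
    digest = hashlib.sha256(str(image_hash).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def derive_key(element):
    # Turn a group element into a symmetric key.
    return hashlib.sha256(str(element).encode()).digest()


def device_voucher(image_hash, table_entry):
    # The device picks a fresh random exponent r for every image.
    r = secrets.randbelow(P - 2) + 1
    q = pow(to_group(image_hash), r, P)        # Q, sent to the server
    key = derive_key(pow(table_entry, r, P))   # encryption key, never sent
    return q, key


def server_key(q, blinding_key):
    # The server combines Q with its blinding key.
    return derive_key(pow(q, blinding_key, P))


alpha = secrets.randbelow(P - 2) + 1           # the server's blinding key
entry = pow(to_group(1234), alpha, P)          # table row built from hash 1234

q, key = device_voucher(1234, entry)           # device has the same image
print(server_key(q, alpha) == key)
q, key = device_voucher(5554, entry)           # different image, same row
print(server_key(q, alpha) == key)
```

&lt;p&gt;The first comparison succeeds because both sides end up raising the same element to the same pair of exponents; the second fails (except with negligible probability) because the device started from a different hash, so the server&#39;s key is useless for decrypting the associated data.&lt;/p&gt;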
&lt;p&gt;The result here is that the server immediately learns which
vouchers correspond to matching images and which do not,
but learns nothing additional about non-matching images (objective #1).
Moreover, it learns the &amp;quot;associated data&amp;quot; for those images.
This would allow the server to immediately determine how
many matching images the device has. In order to prevent
this, the device also sends some &amp;quot;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#synthetic-match-vouchers&quot;&gt;synthetic match vouchers&lt;/a&gt;&amp;quot;
which always decrypt. I&#39;ll describe this feature below.&lt;/p&gt;
&lt;h3 id=&quot;threshold-decryption&quot;&gt;Threshold Decryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#threshold-decryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One of the design goals of the system is that the server not learn
the details of single image matches (objective #2). Instead, the server only gets to
learn which images matched once it has seen a certain number
of matches. This is known as a threshold system.&lt;/p&gt;
&lt;p&gt;In order to do this, Apple makes use of a technique called
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Shamir%27s_Secret_Sharing&amp;amp;oldid=1035454802&quot;&gt;Shamir secret sharing&lt;/a&gt;.
Briefly, a secret sharing scheme is a system in which you take
some secret &lt;em&gt;S&lt;/em&gt; and break it up into a bunch of &lt;em&gt;shares&lt;/em&gt;
in such a way that you need a minimum number of shares (the threshold, &lt;em&gt;t&lt;/em&gt;)
in order to reconstruct &lt;em&gt;S&lt;/em&gt;. You can have an unlimited
number of shares but any subset of size &lt;em&gt;t&lt;/em&gt; will let you reconstruct
the secret (this is called M out of N sharing, with M = &lt;em&gt;t&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;The associated data consists of two values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An encrypted value that contains information about the
image: the hash and a &amp;quot;visual derivative&amp;quot; (more on
this later). This value is encrypted with a device-specific
key that is the same for every encrypted value from that device.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A single share for that device-specific key.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The net result is that the server collects all the vouchers
and discards all the undecryptable ones. It waits until it
has enough shares for a given device, then reconstructs the device-specific key.
It can then decrypt
the encrypted values for &lt;em&gt;every&lt;/em&gt; matching voucher it has received,
as well as for any voucher it receives in the future. However,
until it has received &lt;em&gt;t&lt;/em&gt; shares, it is not able to decrypt any
of these encrypted values.&lt;/p&gt;
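&lt;p&gt;For readers who want to see the mechanics, here is a minimal sketch of Shamir secret sharing over a prime field. The prime, the threshold of 3, and the function names are arbitrary choices for illustration, not values from Apple&#39;s design.&lt;/p&gt;

```python
import secrets

PRIME = 2**127 - 1  # field modulus for the toy example


def make_shares(secret, threshold, count):
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


device_key = secrets.randbelow(PRIME)          # the device-specific key
shares = make_shares(device_key, threshold=3, count=6)
print(reconstruct(shares[:3]) == device_key)   # any three shares suffice
print(reconstruct(shares[2:5]) == device_key)
```

&lt;p&gt;In the setting described here, each matching voucher would carry one such share, so the server can only run the reconstruction once it holds &lt;em&gt;t&lt;/em&gt; of them.&lt;/p&gt;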
&lt;h3 id=&quot;synthetic-match-vouchers&quot;&gt;Synthetic Match Vouchers &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#synthetic-match-vouchers&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Even with threshold decryption, the server still learns the total
number of single image matches. In order to prevent this, each
client replaces some of its vouchers with &amp;quot;synthetic match vouchers&amp;quot;.
These vouchers are designed to decrypt properly but contain
an invalid secret share value. Due to some crypto magic
I won&#39;t get into, it is not possible for the server to distinguish
synthetic match vouchers from real vouchers until it already has
a threshold number of real vouchers. The result is that the
server cannot precisely determine the number of actual matches before
the threshold is reached.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This is not the same as the server having no information about the
number of matches. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The server knows that there are no more matches than the total
number of decryptable vouchers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the server knows the rate at which synthetic vouchers are
introduced, it can estimate the total number of actual matches
for a given user using &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Bayes%27_theorem&amp;amp;oldid=1036931539&quot;&gt;Bayes&#39;s Theorem&lt;/a&gt;. This also allows the server to get a fairly precise estimate
of the overall rate of CSAM (see: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Randomized_response&amp;amp;oldid=1024956231&quot;&gt;Randomized Response Technique&lt;/a&gt;) even for
sub-threshold clients.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
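&lt;p&gt;As a back-of-the-envelope illustration of the second point, suppose (this is my assumption, not something Apple specifies in this form) that each non-matching voucher is independently replaced by a synthetic one with probability &lt;em&gt;s&lt;/em&gt;. Then the expected number of decryptable vouchers is m + s(N - m), where m is the number of real matches and N the total number of vouchers, and you can solve for m. This is a simple method-of-moments estimate rather than a full Bayesian calculation, and the numbers below are invented.&lt;/p&gt;

```python
def estimate_matches(total_vouchers, decryptable, synthetic_rate):
    # Solve E[decryptable] = m + synthetic_rate * (total - m) for m.
    raw = (decryptable - synthetic_rate * total_vouchers) / (1 - synthetic_rate)
    # Clamp to the range the server already knows is possible.
    return min(max(raw, 0.0), float(decryptable))


# Hypothetical account: 10,000 vouchers, 105 decryptable, 1% synthetic rate.
print(estimate_matches(10000, 105, 0.01))
```

&lt;p&gt;For a single account the estimate is noisy, but summed over many accounts the noise averages out, which is why the population-level rate can be estimated fairly precisely.&lt;/p&gt;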
&lt;h3 id=&quot;handling-matches&quot;&gt;Handling Matches &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#handling-matches&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Once Apple has collected enough vouchers to cross the threshold
and decrypted the data, they get two pieces of information for
each image:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The hash&lt;/li&gt;
&lt;li&gt;A &amp;quot;visual derivative&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apple is a bit unclear on what happens next. Here&#39;s what their
&lt;a href=&quot;https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf&quot;&gt;white paper&lt;/a&gt; says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The threshold is selected to provide an extremely low (1 in 1
trillion) probability of incorrectly flagging a given account. This
is further mitigated by a manual review process wherein Apple
reviews each report to confirm there is a match, disables the user’s
account, and sends a report to NCMEC. If a user feels their account
has been mistakenly flagged they can file an appeal to have their
account reinstated.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We don&#39;t know what this manual review consists of, but there
are a number of possibilities.&lt;/p&gt;
&lt;p&gt;First, they could just check whether the reported hashes
really match hashes in the database. This is just a mechanical
check in case there is some sort of bug in the system and you
wouldn&#39;t expect to find much here. The main reason you would want
a manual review is to see whether, just by chance, an innocuous
image had gotten a hash value which matches a piece of CSAM
(i.e., a false positive), but this check won&#39;t detect that.&lt;/p&gt;
&lt;p&gt;Second, they could check the image itself to see if (1) it
looks like the corresponding image in the database or (2)
it looks like CSAM. As noted above, this would only be possible
because the images are being uploaded to iCloud and only if they
are not end-to-end encrypted. In a system
where the images just stayed on the client, this would obviously
not be possible.&lt;/p&gt;
&lt;p&gt;Finally, they could use the &amp;quot;visual derivative&amp;quot;. I can&#39;t find
a description of what this is (Apple: Call me!), so I&#39;m just speculating, but
one possibility is that it&#39;s some kind of thumbnail of the
image that would allow you to see what the contents were without
having to see the whole image. If so, then the Apple reviewers
could look at the visual derivative to see if it was as expected,
even if the whole image hadn&#39;t been uploaded.&lt;/p&gt;
&lt;h2 id=&quot;frequently-asked-questions&quot;&gt;Frequently Asked Questions &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#frequently-asked-questions&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id=&quot;can&#39;t-a-device-just-lie%3F&quot;&gt;Can&#39;t a device just lie? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#can&#39;t-a-device-just-lie%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Yes. The threat model here is a bit odd because usually
we &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=3552#section-3&quot;&gt;assume&lt;/a&gt;
endpoints are uncompromised, but in this case uncompromised
is kind of an ambiguous concept. In order for the system to work,
the device has to execute the protocol honestly, but
that&#39;s not necessarily what users want: presumably people who
are downloading CSAM images don&#39;t want a visit &lt;s&gt;from NCMEC, let
alone&lt;/s&gt; the police. [Update: NCMEC isn&#39;t a law enforcement agency, so
they&#39;re probably not going to pay people a visit.] So, a basic assumption here is that even
though the device is in the user&#39;s hands, it&#39;s actually doing
what Apple wants. This is made possible by the fact that
Apple controls what software is able to run on their
devices and unless you &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Jailbreaking_(iOS)&amp;amp;oldid=1037448828&quot;&gt;jailbreak&lt;/a&gt;
the device you can&#39;t change those behaviors.&lt;/p&gt;
&lt;h3 id=&quot;can-apple-read-other-images-on-my-device%3F&quot;&gt;Can Apple read other images on my device? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#can-apple-read-other-images-on-my-device%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Sort of? The basic design of the system is that only images
which have matching hashes get reported. Assuming that the hashing
algorithm is operating as designed, then the hash value should
be more or less evenly distributed across the output range.
So Apple shouldn&#39;t be able to send you a database that will
just let them read all your images. However, they can
certainly send a database that has non-CSAM images, as well
as fill in the empty rows in the table with real values
and just hope to get lucky. Presumably Apple has some
policy controls to prevent this, but that&#39;s not something
that is technically enforced or that is readily publicly
verifiable.&lt;/p&gt;
&lt;p&gt;But again, this is just assuming the threat model in which
the device is uncompromised (in this case from the user&#39;s
perspective).
In practice, Apple can just change the code on
your device and make it do anything they want, including
uploading copies of all your images whether you have iCloud
on or not. Note that if iOS were
an open source system (along with methods for people to verify
that the code that was on the system was really built from
the published source) then this threat would be significantly
diminished. However, even though pieces of iOS are open source,
the system as a whole is not.&lt;/p&gt;
&lt;h3 id=&quot;can&#39;t-people-change-images-to-evade-this-system%3F&quot;&gt;Can&#39;t people change images to evade this system? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#can&#39;t-people-change-images-to-evade-this-system%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Probably? Because of objective #4 people don&#39;t know whether a given
image hash is in the database, but the hash algorithm is known
(well, sort of: it&#39;s not been published but someone could reverse
engineer it out of the code), so it would be possible to take
an image and just change it enough so that the hash changes.
This would probably help evade the system, though of course
Apple could also seed hashes for perturbed images or adjust
the algorithm to make it insensitive to these particular
perturbations.&lt;/p&gt;
&lt;h3 id=&quot;what-about-end-to-end-encryption%3F&quot;&gt;What about end-to-end encryption? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#what-about-end-to-end-encryption%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If Apple were to change iOS to do end-to-end encryption for
photos, this would make things more complicated. They&#39;d
still learn about hashes but the process of manual review
would become harder. It&#39;s
possible they could try to use machine learning techniques
to &lt;a href=&quot;https://towardsdatascience.com/black-box-attacks-on-perceptual-image-hashes-with-gans-cc1be11f277&quot;&gt;reverse&lt;/a&gt; the hash, but given that the question
is precisely whether the hash is a false positive
(i.e., matches an innocuous image)
that&#39;s not that useful; you already know that it matches
some known image.
If the
&amp;quot;visual derivatives&amp;quot; are thumbnails or the like, then it
probably wouldn&#39;t make much of a difference because Apple
could still review them. If they&#39;re not, then Apple would
probably need to change the &amp;quot;additional data&amp;quot; to include
the encryption keys for the images, in which case they
could decrypt the image and review it directly.&lt;/p&gt;
&lt;h3 id=&quot;what-happens-if-people-disable-icloud%3F&quot;&gt;What happens if people disable iCloud? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#what-happens-if-people-disable-icloud%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For now, this means that their images don&#39;t get scanned, but
Apple could change that in the future. However, in the system
described above, they then wouldn&#39;t have a copy of the image
at all, even an encrypted one, so this complicates the review
process. Again, if the &amp;quot;visual derivative&amp;quot;
includes a thumbnail, then things probably still work. But if
&lt;em&gt;not&lt;/em&gt; then they would presumably need to change the system
to upload a thumbnail or the image itself.&lt;/p&gt;
&lt;h3 id=&quot;what-about-quantum-computers%3F&quot;&gt;What about Quantum Computers? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#what-about-quantum-computers%3F&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Readers of my &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/&quot;&gt;post&lt;/a&gt; on quantum computers
might wonder what the impact of quantum computers is on this system.
I&#39;m not entirely sure, but I suspect that it would allow anyone
to be able to extract Apple&#39;s blinding key and hence the original database.
It would probably also allow someone -- and especially Apple -- to decrypt every voucher,
not just matching ones. Neither of these seems great, but given
that the original data was probably protected with a vulnerable
algorithm, it&#39;s not clear exactly how much worse this would be in practice.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The table at this link indicates that photos are encrypted, but
it seems likely that this means just that they&#39;re encrypted
with keys known to Apple, which might protect you from
external attack, but not from Apple. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It seems like it&#39;s possible to &lt;a href=&quot;https://towardsdatascience.com/black-box-attacks-on-perceptual-image-hashes-with-gans-cc1be11f277&quot;&gt;synthesize&lt;/a&gt; something that might look
vaguely like the original image from a perceptual hash, but the
results probably are never going to be that accurate. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The ML in the system comes in during the training of the NeuralHash
algorithm. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The actual protocol seems to use a variant of
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Cuckoo_hashing&amp;amp;oldid=1028050593&quot;&gt;Cuckoo Hashing&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Assuming I understand the situation correctly, synthetic
matches also make the system slightly less sensitive
because there is some chance that a synthetic match will
overwrite a real match (recall that clients have no
information about whether a match is real or not).
However, unless the rate of synthetic matches is set very
high, this shouldn&#39;t have much of an impact, perhaps
effectively moving the (already arbitrary) threshold up by a match or two. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If the clients randomize the frequency at which they generate
synthetic matches, then this will significantly decrease
the information the server learns about an individual client,
while still allowing the server to estimate the overall
match rate. Thanks to Kevin Dick for this observation. &lt;a href=&quot;https://educatedguesswork.org/posts/apple-csam-intro/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Securing Cryptographic Protocols Against Quantum Computers</title>
		<link href="https://educatedguesswork.org/posts/pq-security/"/>
		<updated>2021-08-06T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pq-security/</id>
		<content type="html">&lt;p&gt;The security of the Internet depends critically on cryptography.
Whenever you log into Facebook or Gmail or buy something on Amazon,
you&#39;re counting on cryptography to protect you and your data.
Unfortunately for cryptography, there&#39;s currently a lot of work
on developing &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Quantum_computing&amp;amp;oldid=1036228251&quot;&gt;quantum computers&lt;/a&gt;,
which have the potential to break a lot of the cryptographic
algorithms that we use to secure our data. It&#39;s far from
clear if and when there will ever be workable quantum
computers (see this
&lt;a href=&quot;https://youtu.be/abmd1n5WUvc?t=1445&quot;&gt;talk&lt;/a&gt;
by cryptographer &lt;a href=&quot;https://inf.ethz.ch/people/person-detail.paterson.html&quot;&gt;Kenny Paterson&lt;/a&gt;, see
&lt;a href=&quot;https://datatracker.ietf.org/meeting/99/materials/slides-99-saag-post-quantum-cryptography&quot;&gt;slides&lt;/a&gt;
and also &lt;a href=&quot;https://haic.fi/wp-content/uploads/2019/11/HAIC-Talk-PQC.pdf&quot;&gt;here&lt;/a&gt;
for background), but if one does get built, the Internet as
we know it is in big trouble.&lt;/p&gt;
&lt;h2 id=&quot;a-(very)-brief-overview-of-quantum-computing-and-cryptography&quot;&gt;A (Very) Brief Overview of Quantum Computing and Cryptography &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#a-(very)-brief-overview-of-quantum-computing-and-cryptography&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In this section, I try to give the very barest overview of
what you need to know about quantum computing and its impact
on cryptography. This will really be inadequate for any
real understanding, but rather is just what you need
to know to follow the rest of this post.&lt;/p&gt;
&lt;p&gt;The first thing to know is that the security of real-world
cryptographic algorithms depends on &lt;em&gt;computational complexity&lt;/em&gt;.
For example,
a typical &amp;quot;symmetric&amp;quot; encryption algorithm such as the
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Advanced_Encryption_Standard&amp;amp;oldid=1031394894&quot;&gt;Advanced Encryption Standard (AES)&lt;/a&gt;
uses a &lt;em&gt;key&lt;/em&gt; to encrypt data. The number of possible keys
is very large (2&lt;sup&gt;128&lt;/sup&gt; for typical uses of AES)
but not infinite, so in principle you could just try
to decrypt the encrypted data with every key in sequence until
you get a result that looks sensible.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This is called &amp;quot;exhaustive search&amp;quot; or sometimes &amp;quot;brute force&amp;quot;.
However, 2&lt;sup&gt;128&lt;/sup&gt; is a very big number and it is not
practical to try all those keys with any normal computer,
even of unreasonable size.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
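&lt;p&gt;To make exhaustive search concrete, here is a minimal sketch against a toy cipher. Everything in it is an assumption for illustration: the 16-bit key, the hash-based keystream, and the helper names are all hypothetical, and none of this is how AES works. The only point is the shape of the attack: try every key and keep the ones whose output looks sensible.&lt;/p&gt;

```python
import hashlib

KEY_BITS = 16  # toy key size so the search finishes quickly; real AES keys are 128 bits or more


def keystream(key, length):
    # Hypothetical toy cipher: hash the key to get a keystream (NOT a real cipher design).
    return hashlib.sha256(key.to_bytes(2, "big")).digest()[:length]


def toy_encrypt(key, data):
    # XOR with the keystream; encryption and decryption are the same operation.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def looks_sensible(candidate):
    # Redundancy check: we assume the plaintext is printable ASCII.
    return candidate.isascii() and candidate.decode("ascii").isprintable()


def exhaustive_search(ciphertext):
    # Try every key in sequence and keep the ones whose output looks sensible.
    return [key for key in range(2 ** KEY_BITS) if looks_sensible(toy_encrypt(key, ciphertext))]


secret_key = 0xBEEF
ciphertext = toy_encrypt(secret_key, b"attack at dawn, bring snacks")
candidates = exhaustive_search(ciphertext)
print(len(candidates), secret_key in candidates)
```

&lt;p&gt;Each extra key bit doubles the work, so a 128-bit key means 2&lt;sup&gt;112&lt;/sup&gt; times as many trials as this 16-bit toy.&lt;/p&gt;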
&lt;p&gt;Quantum computers use quantum mechanical techniques to speed
up this process. One way to get some intuition for this is
to think of quantum computers as letting you try a lot of different
keys at once (think &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Quantum_superposition&amp;amp;oldid=1034455656&quot;&gt;superposition&lt;/a&gt;
and &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Schr%C3%B6dinger%27s_cat&amp;amp;oldid=1036080452&quot;&gt;Schrödinger&#39;s Cat&lt;/a&gt;)
and only get the answer for the one that was right. The result
is that a sufficiently powerful quantum computer can do some computations in practical time -- like
breaking an encryption algorithm -- that would otherwise not
be practical. This creates a problem for us in the real world.
There are some very difficult engineering problems in building
a large quantum computer, but we also don&#39;t know that they
are insurmountable.&lt;/p&gt;
&lt;h2 id=&quot;quantum-computers-and-communications-security-protocols&quot;&gt;Quantum Computers and Communications Security Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#quantum-computers-and-communications-security-protocols&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Like quantum computing, the design of communications security protocols is also
a very complicated topic, but again here is the barest overview of
what you need to follow along.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
If we look at a typical channel security protocol like &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Transport_Layer_Security&amp;amp;oldid=1036220831&quot;&gt;TLS&lt;/a&gt;,
we can see that there are three major cryptographic functions
being performed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Key Establishment.&lt;/em&gt; The peers negotiate a cryptographic key which
they can use to encrypt data. This key is authenticated in the next
step.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Authentication.&lt;/em&gt; The peers authenticate to each other. In the case of
TLS, this usually means that the server (e.g., Amazon)
proves its identity to the client (i.e., you).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Bulk Encryption.&lt;/em&gt; The peers use the key established in the previous
step to actually protect (encrypt and authenticate) the data they want to send (e.g., your
credit card number). All this data is tied back to the previous
authentication step because only the right peer will have the key.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The reason for this structure is that authentication and key establishment
usually make use of what&#39;s called &amp;quot;asymmetric&amp;quot; or &amp;quot;public key&amp;quot; algorithms,
which allow two people who don&#39;t share a secret to communicate.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; These
algorithms are powerful but slow. Once you have established a key,
you then use &amp;quot;symmetric&amp;quot; algorithms to actually protect the data.
These algorithms are much faster but require you to already share
a key. Most encryption systems, whether e-mail, voice encryption,
or instant messaging, share this basic structure, though for
non-interactive use cases like e-mail everything is bundled
into a single message. Systems that just provide authenticity
(e.g., certificates) obviously don&#39;t have key establishment
or bulk encryption.&lt;/p&gt;
&lt;p&gt;This brings us to quantum computers. The best known quantum computing
algorithms weaken
(see: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Grover%27s_algorithm&amp;amp;oldid=1027395496&quot;&gt;Grover&#39;s Algorithm&lt;/a&gt;)
the standard symmetric algorithms, which means that you can protect
yourself -- or so it seems -- by doubling the key size,
which is practical. However, they completely break
(see: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Shor%27s_algorithm&amp;amp;oldid=1034881514&quot;&gt;Shor&#39;s Algorithm&lt;/a&gt;)
all the standard asymmetric algorithms,
more or less at any key size. If the asymmetric
algorithms are broken, then the attacker can just recover the key
you are using and decrypt your traffic (or impersonate the peer)
without attacking the symmetric algorithm at all, which is
obviously catastrophic. In other words, we need some new asymmetric
algorithms which are secure even against quantum computers.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
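&lt;p&gt;The arithmetic behind doubling the key size is simple enough to write down. This is a sketch under an idealized assumption: Grover&#39;s algorithm gives a quadratic speedup, so searching 2&lt;sup&gt;n&lt;/sup&gt; keys takes on the order of 2&lt;sup&gt;n/2&lt;/sup&gt; quantum evaluations, and it ignores all the practical overhead of actually running a quantum computer. The function names are hypothetical.&lt;/p&gt;

```python
def classical_work_bits(key_bits):
    # Exhaustive search tries on the order of 2**key_bits keys.
    return key_bits


def grover_work_bits(key_bits):
    # Idealized Grover search needs on the order of 2**(key_bits / 2) evaluations.
    return key_bits // 2


for bits in (128, 256):
    print(f"{bits}-bit key: classical ~2^{classical_work_bits(bits)}, Grover ~2^{grover_work_bits(bits)}")
```

&lt;p&gt;Under this model a 256-bit key against a quantum attacker costs about what a 128-bit key costs against a classical one, which is why doubling is enough.&lt;/p&gt;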
&lt;h2 id=&quot;post-quantum-algorithms-for-protocols&quot;&gt;Post-Quantum Algorithms for Protocols &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#post-quantum-algorithms-for-protocols&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The good news is that there are new &amp;quot;post-quantum&amp;quot; (PQ) cryptographic algorithms
which are not currently known to be breakable with existing quantum algorithms.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
(The current algorithms are usually called &lt;em&gt;classical&lt;/em&gt; algorithms by
analogy to &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Classical_mechanics&amp;amp;oldid=1035418713&quot;&gt;classical mechanics&lt;/a&gt;.)
Since 2017 NIST has been running a &lt;a href=&quot;https://csrc.nist.gov/projects/post-quantum-cryptography&quot;&gt;competition&lt;/a&gt;
to select a set of post-quantum algorithms, with a target of picking something
in the 2023-ish time frame. The bad news is that the algorithms are, well, not
that great: typically they either are slower, involve sending/receiving more
data, or both.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; Anyway, people
have been looking at how to integrate these new algorithms with existing
protocols, though mostly in preparation for when NIST finally declares a
winner, which finally brings me to the point of this post, which is how
one does that.&lt;/p&gt;
&lt;h3 id=&quot;bulk-encryption&quot;&gt;Bulk Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#bulk-encryption&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Obviously, the first thing you want to do is to double the key size of
your symmetric algorithms. That doesn&#39;t require any fancy new crypto,
but you still need to do it. Once that&#39;s done, the situation gets
a bit more complicated.&lt;/p&gt;
&lt;h3 id=&quot;key-establishment&quot;&gt;Key Establishment &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#key-establishment&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order of priority the next thing to do is to address key establishment.
The reason for this is that even if a quantum computer doesn&#39;t exist today
an attacker can record all of your traffic in the hope that eventually
a quantum computer will exist and they&#39;ll be able to decrypt it. Thus,
it&#39;s helpful to use post-quantum key establishment now.&lt;/p&gt;
&lt;p&gt;Instead of just swapping existing key establishment mechanisms for
PQ mechanisms, what&#39;s actually being proposed is to use them together
in what&#39;s called a &amp;quot;hybrid&amp;quot; mode. This just means that you do both
regular and PQ key establishment in parallel and then mix the results
together. This obviously has worse performance than doing either alone
but the truth is that people aren&#39;t really that confident about the
security of PQ algorithms and so this allows you to get a measure of
security against quantum computers without worrying that the PQ
algorithm will get broken: even if that happens you&#39;ll still have
as much security as you have now.&lt;/p&gt;
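&lt;p&gt;As a rough illustration of what mixing the results together can look like, here is a minimal sketch that concatenates the two shared secrets and runs them through an HKDF-style extract-and-expand step built from HMAC-SHA256. This is one plausible construction, not a description of what any particular protocol does; the function name, the context label, and the random stand-in secrets are all assumptions.&lt;/p&gt;

```python
import hashlib
import hmac
import os


def combine_secrets(classical_secret, pq_secret, context=b"hybrid-demo"):
    # Concatenate the two shared secrets. This is unambiguous only if each
    # secret has a fixed length, which we assume here.
    ikm = classical_secret + pq_secret
    # HKDF-style extract with an all-zero salt.
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    # HKDF-style expand for a single 32-byte output block.
    return hmac.new(prk, context + b"\x01", hashlib.sha256).digest()


# Stand-ins for the outputs of a classical and a PQ key establishment.
classical = os.urandom(32)
post_quantum = os.urandom(32)
session_key = combine_secrets(classical, post_quantum)
print(session_key.hex())
```

&lt;p&gt;The property you want is that the output depends on both inputs, so an attacker who can recover only one of the two secrets still cannot compute the session key.&lt;/p&gt;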
&lt;p&gt;This process is farthest along in TLS, where this kind of thing drops
in very easily and there have been &lt;a href=&quot;https://blog.cloudflare.com/the-tls-post-quantum-experiment/&quot;&gt;several&lt;/a&gt;
&lt;a href=&quot;https://security.googleblog.com/2016/07/experimenting-with-post-quantum.html&quot;&gt;trials&lt;/a&gt;
of hybrid algorithms.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
The results are fairly positive: some algorithms have fairly
comparable performance to existing public key algorithms, as
shown in the graphs below:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.cloudflare.com/content/images/2019/10/Screen-Shot-2019-10-29-at-2.04.13-PM.png&quot; alt=&quot;Cloudflare&#39;s comparisons of post-quantum algorithms&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Other channel security protocols like TLS (SSH, IPsec, etc.) should
be fairly easy to adapt in similar ways, as should secure e-mail
protocols like PGP, S/MIME, etc. The situation for instant messaging
applications is a bit more complicated because the PQ
algorithms aren&#39;t complete drop-in replacements for the existing
algorithms (in particular, they mostly don&#39;t look exactly like Diffie-Hellman),
so it has to be taken on a case-by-case basis.&lt;/p&gt;
&lt;h3 id=&quot;authentication&quot;&gt;Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#authentication&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Finally, we need to address authentication. This is lower priority
because for a quantum computer to be useful for attacking authentication, the
attacker needs to have it before the relying party verifies that
authentication. For instance, if we are authenticating a TLS connection
then we are primarily concerned with an attacker who is able to
impersonate the peer at the time the association is set up. As an example,
imagine you are making a TLS connection to Amazon, then the
attacker has to already have a quantum computer so that it can break
Amazon&#39;s key. If it breaks Amazon&#39;s key a week later, that doesn&#39;t
allow it to go back in time to impersonate Amazon to you.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;That doesn&#39;t mean that post-quantum authentication isn&#39;t important:
it&#39;s going to take a very long time to roll out and so if a quantum
computer &lt;em&gt;is&lt;/em&gt; developed we&#39;re going to want to have all the post-quantum
credentials pretty much ready to go. Actually doing this turns
out to be a little complicated because you need to operate both
classical and post-quantum algorithms in parallel. To go back to
our TLS example, suppose a server gets a post-quantum certificate
(for the sake of this discussion, assume it&#39;s both signed with
a post-quantum algorithm &lt;em&gt;and&lt;/em&gt; contains a post-quantum key, because
analyzing the mixed case is more difficult). Not every browser will
accept PQ algorithms, so servers will also need to have a classical
certificate, probably for years to come.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
Then when a client connects to the server, they negotiate
algorithms and the server provides the appropriate certificate.
For non-interactive situations like e-mail, the sender will
want to sign with both certificates in parallel.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
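&lt;p&gt;The negotiation step can be sketched as a simple lookup: the client advertises the signature algorithms it supports and the server hands back the first certificate it has that matches. The algorithm labels and the certificate strings below are hypothetical placeholders, and real TLS negotiation has more moving parts than this.&lt;/p&gt;

```python
def select_certificate(client_algorithms, server_certificates):
    # server_certificates is a list of (algorithm, certificate) pairs
    # in the server's order of preference.
    for algorithm, certificate in server_certificates:
        if algorithm in client_algorithms:
            return certificate
    return None  # no algorithm in common, so the handshake would fail


# Hypothetical server configuration: prefer PQ, fall back to classical.
SERVER_CERTS = [
    ("pq-sig", "example.com certificate (post-quantum)"),
    ("ecdsa", "example.com certificate (classical)"),
]

print(select_certificate({"ecdsa", "pq-sig"}, SERVER_CERTS))
print(select_certificate({"ecdsa"}, SERVER_CERTS))
```

&lt;p&gt;An updated client gets the PQ certificate, while an older client that only knows classical algorithms still gets something it can verify.&lt;/p&gt;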
&lt;p&gt;It&#39;s important to recognize that in many cases it doesn&#39;t actually
improve the authenticating party&#39;s security to have a PQ
certificate. The reason is that if the relying party is willing to
accept a broken classical algorithm then the attacker can use their
quantum computer to forge a classical certificate (or, if the
authenticating party has a certificate with a classical algorithm,
forge a signature on the certificate) and impersonate the authenticating party directly.
In order to have protection against a quantum computer, you need
relying parties to refuse to accept the (now) broken algorithms.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;
This means that authenticating parties (e.g., TLS servers) don&#39;t
have a huge incentive to roll out PQ certificates quickly.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The main reason to do so is to enable a quick switch to PQ algorithms
if a quantum computer gets built and then clients rapidly move
to deprecate classical algorithms; even then, you just need
to be ready to switch quickly enough that the clients won&#39;t
get ahead of you (or be big enough that they can&#39;t afford to).
It&#39;s not clear to me how much that&#39;s going to happen here: because
the PQ authentication algorithms aren&#39;t great, there&#39;s not a lot of incentive
to move to them, especially before NIST has identified the
winners. After that happens, we&#39;ll probably see some initial
deployment of PQ certificates, starting with client support and
then a few servers experimentally adding it. I doubt we&#39;ll
see really wide deployment unless there is some serious pressure,
e.g., from real progress in building a quantum computer.
One bright spot here is the development of automatic certificate
issuance systems like &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=8555&quot;&gt;ACME&lt;/a&gt;
and of automatic CAs like &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let&#39;s Encrypt&lt;/a&gt;.
It&#39;s not out of the question that we could issue new certificates
to the entire Web in a matter of weeks to months (see
LE&#39;s &lt;a href=&quot;https://letsencrypt.org/2021/02/10/200m-certs-24hrs.html&quot;&gt;plan&lt;/a&gt; for
this), assuming the
groundwork was already in place.&lt;/p&gt;
&lt;p&gt;The situation is a little different for the non-interactive cases, like DNSSEC
or e-mail:
if the signer signs with both a classical and PQ algorithm,
then even if the verifier initially doesn&#39;t support the PQ
algorithm, they can later go back and verify the PQ signature
if support is added. This means there&#39;s more value in
doing both types of signature. Note that if you receive
a message which was signed only with a classical algorithm,
then it&#39;s still safe to verify it even after a quantum
computer exists, as long as you&#39;re confident
that you received it before the computer was built and it
has been kept unchanged (e.g., on your disk). It&#39;s only
messages which the attacker could have tampered with that
are at risk.&lt;/p&gt;
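&lt;p&gt;The reasoning above amounts to a small decision rule for a verifier looking at a stored message. Here is a minimal sketch of it, with hypothetical names and with the (big) assumption that the PQ algorithm itself has not been broken.&lt;/p&gt;

```python
def can_trust(valid_signatures, quantum_computer_exists, held_unchanged_since_before):
    # valid_signatures: the set of algorithm families ("classical", "pq") for
    # which this verifier supports the algorithm AND the signature verified.
    if "pq" in valid_signatures:
        # Assumes the PQ signature algorithm is still considered secure.
        return True
    if "classical" in valid_signatures:
        # A classical signature is fine until a quantum computer exists, and
        # remains fine afterwards for messages we already held, unmodified.
        return (not quantum_computer_exists) or held_unchanged_since_before
    return False


# A dual-signed message checked later by a verifier that has added PQ support.
print(can_trust({"classical", "pq"}, True, False))
# A classical-only message that arrived after a quantum computer exists.
print(can_trust({"classical"}, True, False))
```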
&lt;p&gt;One of the most attractive cases for PQ signatures is software
distribution, for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Software security is a prerequisite for basically every other kind of crypto.
If you can&#39;t trust your software, you can&#39;t trust it to verify anything
else. However, once you have secure software, it can be readily securely
updated, if, for instance, a quantum computer suddenly appears.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You want the signatures on software to be valid for a long time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Software distributions tend to be big, so the size of the signature
isn&#39;t as important by contrast. Similarly, signing and verification time aren&#39;t that important
because you only need to sign the software at release time (which is
a slow process anyway) and the verifier only needs to verify at download.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The post-quantum signature algorithms that we have the most confidence
in (hash signatures) have some annoying operational properties
when used at high signing rates, but these aren&#39;t really much
of an issue for signing software.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For these reasons, it seems likely we&#39;ll see post-quantum
signing for software fairly early.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pq-security/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As should be clear from the above, we&#39;re nowhere near ready for a
post-quantum future: effectively all Internet communications security
depends on algorithms that are susceptible to quantum computers.  If a
practical quantum computer that could break common asymmetric
cryptography were released today, it would be extremely bad; exactly
how bad would depend on how many people could get one. We in principle
have the tools to rebuild our protocols using post-quantum algorithms,
albeit at some pretty serious cost, but we&#39;re years away from doing
so, and even on an emergency basis it would take quite some time
to make a switch. On the other hand, it&#39;s also possible that we won&#39;t
get practical quantum computers for years to come -- if ever -- and
that we&#39;ll have good post-quantum algorithms ready for deployment
or even deployed long before that.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The general intuition here is that your &lt;em&gt;plaintext&lt;/em&gt;
(i.e., the data that was encrypted) has a lot of redundancy,
for instance, it might be ASCII text. If you just try random keys,
you&#39;ll mostly get junk, but when you get the right key, you&#39;ll
get something which is ASCII. Of course, if you have
a very short piece of encrypted data, some keys will
just give you things that look right by random chance,
but the more data you have, the less likely that is. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As opposed to information theoretic security, in
which even an attacker with an infinitely powerful
computer is not
able to break your encryption. There are information
theoretically secure algorithms such as the &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=One-time_pad&amp;amp;oldid=1031770915&quot;&gt;one-time pad&lt;/a&gt;
but they are not practical for real-world use because
you need a key of comparable size to the data you
are encrypting. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Arguably, knowing too much gets in the way of thinking about
this at the right level, actually. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If you do have a shared secret, you can use that to authenticate
but you still want to do key establishment in order to create
a fresh key. See &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Forward_secrecy&amp;amp;oldid=1035724794&quot;&gt;forward secrecy&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Well, maybe. People were building large-ish scale cryptographic
systems before public key cryptography, but they&#39;re pretty
hard to manage. I don&#39;t think anyone wants to go back to what
I&#39;ve been calling &amp;quot;intergalactic &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Kerberos_(protocol)&amp;amp;oldid=1033949819&quot;&gt;Kerberos&lt;/a&gt;&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Yeah, I know that this &amp;quot;not currently known&amp;quot; phrasing isn&#39;t really that
encouraging. Bear in mind that all the algorithms we are running now have
a decade if not more of analysis, and these PQC algorithms often
do not. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At some level this isn&#39;t surprising: if these algorithms
performed better we&#39;d probably be trying to use them already. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: the way this is done is just by pretending that
each combination of PQ/EC algorithm is its own EC group, which
fits nicely into the TLS framework. The Cloudflare/Google
experiment was HRSS/X25519 and SIKE/X25519. See also the
IETF &lt;a href=&quot;https://www.ietf.org/archive/id/draft-ietf-tls-hybrid-design-03.html&quot;&gt;spec&lt;/a&gt; for
hybrid encryption. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Given that you&#39;re obviously doing more crypto, this
seems to tell us that network latency is more
important than CPU cost in many cases. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Technical
note: if you are using static RSA cipher suites, then breaking
Amazon&#39;s key also breaks the key establishment and then it can
impersonate Amazon if somehow the connection is still up. But
of course you &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-aviram-tls-deprecate-obsolete-kex/&quot;&gt;shouldn&#39;t be using&lt;/a&gt; static RSA cipher suites. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
People have also looked at having certificates which contain
both kinds of keys, but IMO this is probably worse, in part
because it makes certificates bigger. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Aside: the semantics of multiple signatures are famously
unclear, so we&#39;re going to get to enjoy that, though
the special case where they are nominally the same
person might be easier. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One interesting special case is when the relying party knows
that &lt;em&gt;this&lt;/em&gt; authenticating party is using a post-quantum
algorithm. For instance, with SSH the client is configured
with the key of the server and so the client could accept
a classical algorithm for one server but know to expect
a post-quantum algorithm for another server. &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
&lt;a href=&quot;https://certificate.transparency.dev/&quot;&gt;Certificate Transparency (CT)&lt;/a&gt;
could change the cost/benefit analysis here.
CT is a system in which every certificate is publicly
logged. This allows a site (say &lt;code&gt;example.com&lt;/code&gt;) to see every
valid certificate for their domain and report incorrectly
issued ones. Clients can then check that the certificate
appears on the log and reject it if it does not.
If &lt;code&gt;example.com&lt;/code&gt; only has a PQ certificate,
then even a client which would accept classical
algorithms would still reject a forged certificate
for &lt;code&gt;example.com&lt;/code&gt;.  This brings
us back to the question of how you securely get
the logs, which are (of course) authenticated
with a signature. However, the logs could
switch to PQ signatures fairly quickly
and if clients just rejected classical signatures
from the &lt;em&gt;logs&lt;/em&gt; then they could have confidence
in their correctness and transitively use that
to provide security for the set of valid certificates.
 &lt;a href=&quot;https://educatedguesswork.org/posts/pq-security/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What&#39;s wrong with QR code menus?</title>
		<link href="https://educatedguesswork.org/posts/qr-code-menus/"/>
		<updated>2021-07-26T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/qr-code-menus/</id>
		<content type="html">&lt;p&gt;TL;DR. Open your restaurant menu QR codes in private browsing mode.&lt;/p&gt;
&lt;p&gt;Today&#39;s NYT has an &lt;a href=&quot;https://www.nytimes.com/2021/07/26/technology/qr-codes-tracking.html&quot;&gt;article&lt;/a&gt; about the popularity of QR code menus at restaurants
instead of paper menus and how they enable tracking:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But the spread of the codes has also let businesses integrate more
tools for tracking, targeting and analytics, raising red flags for
privacy experts. That’s because QR codes can store digital
information such as when, where and how often a scan occurs. They
can also open an app or a website that then tracks people’s personal
information or requires them to input it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The use of QR code menus &lt;em&gt;does&lt;/em&gt; enable tracking, but importantly,
this is not how they work; understanding how they &lt;em&gt;do&lt;/em&gt; work is
key to understanding what&#39;s going on and how to protect yourself.
At a high level, a &lt;a href=&quot;https://en.wikipedia.org/wiki/QR_code&quot;&gt;QR code&lt;/a&gt;
is just a way of encoding digital information, in this case the address
of the Web site (the technical term here is a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=URL&amp;amp;oldid=1015459310&quot;&gt;URL&lt;/a&gt;), in a convenient machine-readable form that can then be read by
your phone. So the way that this works is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The URL is encoded into the QR code.&lt;/li&gt;
&lt;li&gt;You point your phone at the code and it detects that it&#39;s a URL&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;The phone -- at least mine -- reads the QR code, detects that
it&#39;s a URL, and asks if you want to go to the site.&lt;/li&gt;
&lt;li&gt;You agree and your browser navigates to the site.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the end of the day, then, this is just a convenient way for
the restaurant to get you to navigate to a URL. They could
instead have printed the URL itself on the table, but for obvious
reasons people would find that to be a pain to type in.&lt;/p&gt;
&lt;p&gt;It&#39;s certainly true that the QR code can contain more or less
arbitrary information&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
and can encode it in the URL so that it gets conveyed to the
Web site. For instance, you can have a link that goes not
just to the menu but to the ordering system and include
your table number so that your order is sent to your table
directly. However, because they&#39;re printed on a piece of paper
-- at least in the case we are talking about here --
they are inherently &lt;em&gt;static&lt;/em&gt;, which means that if I scan the
QR code at time &lt;em&gt;A&lt;/em&gt; and you at time &lt;em&gt;B&lt;/em&gt;, we get the same thing.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
The point here is that the QR code itself cannot store
&amp;quot;when, where, and how often a scan occurs&amp;quot;, because the
QR code doesn&#39;t change.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
As I said
above, the QR code is just taking you to a Web site and
it&#39;s the &lt;em&gt;Web&lt;/em&gt; that&#39;s the problem, not the QR code.&lt;/p&gt;
&lt;p&gt;What is actually happening is that the Web is full of tracking
mechanisms, mostly in the form of what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&amp;amp;page=HTTP_cookie&amp;amp;id=1032666011&amp;amp;wpFormIdentifier=titleform&quot;&gt;&amp;quot;cookie&amp;quot;&lt;/a&gt;.
Many EG readers probably know what a cookie is, but in an effort
to keep things broadly accessible: a cookie is a piece of digital
data that a Web site can store on your computer and that your browser
then sends back to that site when you visit it again. Cookies can contain
basically any information the site wants and allow the site to
connect multiple visits by the same person at different times.
This is, for instance, how Amazon maintains your shopping cart
and Facebook keeps you logged in. They&#39;re a basic part of Web
functionality. Importantly, any site can send you a cookie and
your browser will just send it back, so cookies can be -- &lt;em&gt;and are&lt;/em&gt; --
used to track your behavior even in contexts when there is no
obvious user-visible state like shopping carts, etc.&lt;/p&gt;
&lt;p&gt;It&#39;s worth walking through how tracking works in a situation like
this. Suppose you go to Example Restaurant
and scan the QR code, which tells you to go to &lt;code&gt;https://example.com/&lt;/code&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.
The first time you do that, the restaurant hasn&#39;t stored a cookie,
so it just knows you&#39;re a new person and stores a cookie. But the
next time you come back, it can read that cookie and see you
are a repeat customer. This isn&#39;t that useful in itself, because lots of
customers probably scanned this QR code, but if the
URL encodes the table number
(e.g., &lt;code&gt;https://example.com/?table=123&lt;/code&gt;) or the link goes
to an ordering system rather than a menu, then the site can remember
what you ordered and adjust its behavior accordingly (&amp;quot;Hi Eric,
last time you ordered the Pizza Margherita. Would you
like that and maybe some garlic bread?&amp;quot;).
It isn&#39;t necessarily just this one restaurant either. Depending
on how the system is put together, your behavior might be tracked
across multiple restaurants -- via technical mechanisms that are
quite straightforward but out of scope for this post --
to build up a picture of your eating behavior.&lt;/p&gt;
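&lt;p&gt;To make the walkthrough above concrete, here is a minimal sketch of the server side in Python. Everything in it is an assumption for illustration: the &lt;code&gt;visitor&lt;/code&gt; cookie name, the &lt;code&gt;handle_request&lt;/code&gt; function, and the in-memory &lt;code&gt;visits&lt;/code&gt; table are hypothetical, and a real ordering system would sit behind an actual Web framework.&lt;/p&gt;

```python
import uuid
from http.cookies import SimpleCookie
from urllib.parse import urlparse, parse_qs

# Server-side state (hypothetical): visitor id mapped to the tables they scanned at.
visits = {}


def handle_request(url, cookie_header=""):
    """Return (visitor_id, is_repeat, set_cookie_value_or_None)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    # The static QR code only contributes what is in the URL, e.g. ?table=123.
    table = parse_qs(urlparse(url).query).get("table", [None])[0]

    if "visitor" in cookie and cookie["visitor"].value in visits:
        # Repeat customer: the browser sent back the cookie we set earlier.
        visitor_id = cookie["visitor"].value
        visits[visitor_id].append(table)
        return visitor_id, True, None

    # New person: mint an identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    visits[visitor_id] = [table]
    out = SimpleCookie()
    out["visitor"] = visitor_id
    return visitor_id, False, out["visitor"].OutputString()


# First scan: no cookie yet. Second scan: the browser returns the cookie.
first_id, first_repeat, set_cookie = handle_request("https://example.com/?table=123")
second_id, second_repeat, _ = handle_request("https://example.com/?table=7", set_cookie)
```

&lt;p&gt;Note that the QR code contributes only the static URL; all of the per-person state lives in the cookie and in the server&#39;s table.&lt;/p&gt;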
&lt;p&gt;The thing to recognize is that there&#39;s nothing special about
QR codes: this is just the normal (terrible!) level of tracking
that already exists on the Web. The article quotes Jay Stanley
from the ACLU on this point:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“People don’t understand that when you use a QR code, it inserts the
entire apparatus of online tracking between you and your meal,” said
Jay Stanley, a senior policy analyst at the American Civil Liberties
Union. “Suddenly your offline activity of sitting down for a meal
has become part of the online advertising empire.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I half agree here: it&#39;s true that this kind of QR code menu
pulls you into the Web tracking ecosystem and it&#39;s likely
that many people don&#39;t understand that. However, it&#39;s also
the case that many people don&#39;t understand how much their
behavior is already tracked on the Web even in cases where
QR codes aren&#39;t involved (which is why it&#39;s so important for
Web browsers to build in anti-tracking features such as
Firefox &lt;a href=&quot;https://support.mozilla.org/en-US/kb/enhanced-tracking-protection-firefox-desktop&quot;&gt;Enhanced Tracking Protection&lt;/a&gt;
and Safari &lt;a href=&quot;https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/&quot;&gt;Intelligent Tracking Prevention&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;).&lt;/p&gt;
&lt;p&gt;In this particular case, however, these mechanisms aren&#39;t
as effective as you would like. The reason is that they are
designed to prevent you from being tracked across sites, but
(1) we are concerned about repeat visits to the same site and
(2) multiple restaurants might use the same Web site, or
at least bounce the user through it (with a URL like
&lt;code&gt;https://example.com/?restaurant=pizza-palace&lt;/code&gt;).
In either case, dining history leaks even if the
default anti-tracking mechanisms are on.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;
Probably the best thing would be if devices were to open
QR codes in a new browsing context with new
cookie state. Some limited testing with my iPhone suggests
that it opens up URLs from QR codes in whatever mode you
are currently using Safari in: if you are currently using Safari in Private mode,
it will open up URLs from QR codes in Private mode,
which seems to do the right thing,
but if -- as is more likely -- you are using Safari
in regular mode, then it will open up URLs in regular
mode, which allows you to be tracked.&lt;/p&gt;
&lt;p&gt;Of course, there is a tradeoff here: if URLs were opened in
private mode by default, then people who want their state to be maintained
(for instance, if they have an account with the restaurant
that lets them order without entering new payment information,
or if they are part of a loyalty program) would be inconvenienced.
This is probably a situation where the browser could help
(&amp;quot;I see you have logged in here, do you want to let this
site remember you for future visits?&amp;quot;). In my experience, however,
most QR codes don&#39;t go to sites that actually need to track
you, so it seems like there is an opportunity for better defaults
here.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Interestingly, there &lt;a href=&quot;https://github.com/zxing/zxing/wiki/Barcode-Contents&quot;&gt;doesn&#39;t seem&lt;/a&gt; to be any meta-information telling
you that it&#39;s a URL, rather it&#39;s just that it looks like one
because it has &lt;code&gt;http://&lt;/code&gt; or &lt;code&gt;https://&lt;/code&gt; in front of it, though
see &lt;a href=&quot;https://web.archive.org/web/20160213153725/https://www.nttdocomo.co.jp/english/service/developer/make/content/barcode/function/application/bookmark/&quot;&gt;here&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Not truly arbitrary because they&#39;re not infinite sized
so only somewhere in the 100-1000 character range, but
for our purposes, plenty. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As mentioned previously, this is actually an issue
for some applications, like vaccine passports, where it
would be &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/&quot;&gt;convenient&lt;/a&gt;
to be able to change the code later. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As a real aside here, this non-changing property of
paper stuff is why paper-based elections such as
&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/&quot;&gt;optical scan&lt;/a&gt;
ballots are so popular with election security people. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Think of all the business that restaurant must get! &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Sorry
for the technical link there; this is what I could find. If someone sends me a more general
Safari ITP link, I can update. &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I should mention at this point that if you&#39;re
paying with a credit card, the privacy story
is also &lt;a href=&quot;https://www.fastcompany.com/90490923/credit-card-companies-are-tracking-shoppers-like-never-before-inside-the-next-phase-of-surveillance-capitalism&quot;&gt;quite bad&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/qr-code-menus/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at the EU vaccine passport</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-eu/"/>
		<updated>2021-07-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-eu/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://dennis-jackson.uk/&quot;&gt;Dennis Jackson&lt;/a&gt; pointed me at the documents for the
EU&#39;s &lt;a href=&quot;https://ec.europa.eu/commission/presscorner/detail/en/qanda_21_1187&quot;&gt;Digital Green Certificate&lt;/a&gt; (DGC) vaccine passport system. At a high level, this is pretty similar
to the Excelsior Pass and Vaccine Credentials Initiative systems I
wrote about earlier (&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc&quot;&gt;NYC&lt;/a&gt;, &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca&quot;&gt;VCI&lt;/a&gt;),
except with some slightly different data formats
(&lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=8152&quot;&gt;COSE&lt;/a&gt; instead of &lt;a href=&quot;https://datatracker.ietf.org/doc/rfc7515/&quot;&gt;JOSE/JWS&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;,
a &lt;a href=&quot;https://ec.europa.eu/health/sites/default/files/ehealth/docs/covid-certificate_json_specification_en.pdf&quot;&gt;new JSON structure&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
for the vaccine certificate data itself rather than reusing
an existing one, etc.) In themselves, these seem like sane choices, though
it&#39;s a little silly that we have multiple groups independently
creating pretty isomorphic though slightly different formats to do more
or less the same thing. That&#39;s the way things go sometimes, but still
not great. Moreover, this system does have a number of somewhat
odd features, as detailed below.&lt;/p&gt;
&lt;h2 id=&quot;trust-structure&quot;&gt;Trust Structure &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#trust-structure&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/&quot;&gt;covered previously&lt;/a&gt;, a
signature-based credential system needs some mechanism for the
verifying app to know which signers are valid. You could in principle
just bake all of the valid signers into the app, but that&#39;s not very
flexible (what happens if you want to add or remove a signer?), so
instead what you typically do is bake in some set of entities that you
trust and allow those entities to update the list of valid signers
in some fashion.
For instance, in the WebPKI the way you do this is
to have a set of &amp;quot;trust anchors&amp;quot;, i.e., entities who are authorized
to delegate the right to other entities to sign the credential.
When an end-entity (e.g., a Web server) wants to authenticate it
presents both its own certificate and a &lt;em&gt;chain&lt;/em&gt; of certificates
that goes back to one of the trust anchors. This allows the relying
party to transitively validate the end-entity by verifying that
the end-entity certificate was signed by a certificate that was
signed by a certificate and so on until you get back to a trust
anchor.&lt;/p&gt;
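&lt;p&gt;The transitive validation described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the signatures are textbook RSA over small Mersenne primes (not remotely secure), and &lt;code&gt;Cert&lt;/code&gt;, &lt;code&gt;issue&lt;/code&gt;, and &lt;code&gt;validate_chain&lt;/code&gt; are hypothetical names, not any real library&#39;s API. Real WebPKI validation also checks things like expiry and revocation, which are omitted here.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key


def make_key(p, q):
    """Build a toy textbook-RSA key (modulus n, private exponent d). NOT secure."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def digest(data, n):
    # Hash the data and reduce it into the range of the modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, n, d):
    return pow(digest(data, n), d, n)


def verify(data, sig, n):
    return pow(sig, E, n) == digest(data, n)


@dataclass
class Cert:
    subject: str
    subject_n: int  # the subject's public key (modulus)
    issuer: str
    sig: int = 0

    def body(self):
        return f"{self.subject}|{self.subject_n}|{self.issuer}".encode()


def issue(subject, subject_n, issuer, issuer_n, issuer_d):
    cert = Cert(subject, subject_n, issuer)
    cert.sig = sign(cert.body(), issuer_n, issuer_d)
    return cert


def validate_chain(chain, trust_anchors):
    """chain[0] is the end-entity; trust_anchors maps anchor name to modulus."""
    if not chain:
        return False
    # Each certificate must be signed by the key in the next one up.
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
        if not verify(cert.body(), cert.sig, parent.subject_n):
            return False
    # The last certificate must be signed by a baked-in trust anchor.
    top = chain[-1]
    anchor_n = trust_anchors.get(top.issuer)
    return anchor_n is not None and verify(top.body(), top.sig, anchor_n)


# Three toy keys built from Mersenne primes.
root_n, root_d = make_key(2**31 - 1, 2**61 - 1)
inter_n, inter_d = make_key(2**89 - 1, 2**107 - 1)
leaf_n, _ = make_key(2**127 - 1, 2**521 - 1)

anchors = {"Root CA": root_n}
inter = issue("Intermediate CA", inter_n, "Root CA", root_n, root_d)
leaf = issue("www.example.com", leaf_n, "Intermediate CA", inter_n, inter_d)
```

&lt;p&gt;The relying party only needs &lt;code&gt;anchors&lt;/code&gt; baked in; the end-entity supplies the rest of the chain (&lt;code&gt;[leaf, inter]&lt;/code&gt;) itself.&lt;/p&gt;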
&lt;p&gt;This is not how these vaccine passport systems seem to be designed,
however, though it&#39;s actually not clear to me why. It&#39;s possible that
the designers are trying to keep the credentials small enough
to easily fit in a bar code, but you should be able to fit things
in fairly easily: the VCI uses V22 QR codes which can have
1195 characters. Even without getting fancy&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
you should be able to put together a custom ECDSA certificate format
in about 128-160 bytes. Unlike VCI certs, the Digital Green Certificate spec allows
for RSA-2048, which has quite large signatures, so this may be
the reason, though of course allowing RSA is itself a design choice (probably the
wrong one).&lt;/p&gt;
&lt;p&gt;In any case, for both VCI and DGC, the credential itself just contains a reference to
the key which (allegedly) signed it (in what&#39;s called a &lt;code&gt;kid&lt;/code&gt;, or key ID),
and the verifying app has to obtain the key itself in some way. For instance,
in the California vaccine credential I &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/&quot;&gt;looked at&lt;/a&gt;,
there was a link to a Web site containing the key and (hopefully) the verifier
app would be provisioned with a list of all of those valid URLs.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
DGC doesn&#39;t say how verifier apps get the keys at all. It merely
assumes that they have a list of all the &lt;em&gt;Document Signing Certificates&lt;/em&gt; (DSC),
obtained in some unspecified fashion, presumably arranged for by the app author
(which DGC seems to assume is the national government of wherever you are).&lt;/p&gt;
&lt;h2 id=&quot;signing-certificate-distribution&quot;&gt;Signing Certificate Distribution &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#signing-certificate-distribution&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;More interesting, perhaps, is how the app author learns the list of DSCs.
The situation here is quite complicated because the DGC system assumes
that both credential issuance and credential verification will be
organized along national lines, but that you also want interoperability,
so that, for instance, someone with the French verifier app will be
able to verify credentials issued in Germany to German nationals, with
each government more or less having its own policies and just telling
other governments how to verify their credentials (i.e., their list of
DSCs). The resulting design is... complicated:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/dgc-overall.png&quot; alt=&quot;DGC Overall Diagram&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Image source: &lt;a href=&quot;https://ec.europa.eu/health/sites/default/files/ehealth/docs/digital-green-certificates_v5_en.pdf&quot;&gt;Technical Specifications for Digital Green Certificates Volume 5&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The general idea here is that each country operates their own
infrastructure, complete with a verifier app, signing keys, etc.  This
is self-contained in the sense that if you didn&#39;t care about other
countries it would all work on its own. Then there is a centralized
&lt;em&gt;Digital Green Certificate Gateway&lt;/em&gt; (DGCG) that is responsible for
interchanging countries&#39; signing keys so that each country has every
other country&#39;s keys.&lt;/p&gt;
&lt;p&gt;This is all fairly reasonable -- though, as I noted in the previous
section, kind of unnecessary if you&#39;re just willing to have the
credentials carry their own certificate chain. The actual details are
a bit odd, however. First, each country has their own &lt;em&gt;Country Signing
Certificate Authority&lt;/em&gt; (CSCA).  They use their &lt;em&gt;CSCA&lt;/em&gt; to sign
&lt;em&gt;Document Signing Certificates&lt;/em&gt; (DSCs) which are then used to sign end
entity credentials (i.e., vaccine passports). So far, this is a
conventional PKI.&lt;/p&gt;
&lt;p&gt;Countries are required to upload their DSCs to the DGCG.
This upload is authenticated in two separate ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The national backend authenticates with TLS authentication
with one key (&lt;em&gt;NB_TLS&lt;/em&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The package containing the DSCs is signed with a separate key
(&lt;em&gt;NB_UP&lt;/em&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each national backend downloads the uploaded DSC packages from the
DGCG. The DGCG also publishes a list of the NB_UP and NB_CSCA keys
signed with its own key. These can be used by country B to verify the
DSC package and the DSCs published by country A into the DGCG.&lt;/p&gt;
&lt;p&gt;This all seems extremely complicated, with a number of seemingly
redundant authentication mechanisms. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The DSCs are signed by both the CSCA and the NB_UP key. The
receiving national back-end has both public keys, so why
isn&#39;t one signature good enough?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The uploaded package is authenticated with TLS but also signed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
Why isn&#39;t the signature enough?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Moreover, all of this signing obscures the trust relationships,
which seem to ultimately go back to trusting the DGCG. The reason
for this is that a receiving country obtains the list of now
current NB_CSCA and NB_UP keys from the DGCG (signed by some
offline DGCG key). This means that if the DGCG is compromised,
it can just replace those keys with keys of its choice and
thus impersonate any other country.
There are a number of designs
which seem like they would be a lot simpler and provide similar
security properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Have the DGCG directly sign the DSCs, with the national
backends acting as &amp;quot;registration authorities&amp;quot; for the
DGCG (though it&#39;s possible this is undesirable for
political reasons).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have the DGCG just sign the CSCA certificates and then
the national backends can upload new DSCs as they
are minted (note that the package need not be signed
because the DSCs themselves are signed.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you didn&#39;t want to trust the DGCG you&#39;d need
some other structure. For instance, countries could
get each other&#39;s CSCA keys directly, or at least have
some &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Certificate_Transparency&amp;amp;oldid=1034052630&quot;&gt;Certificate Transparency&lt;/a&gt;-type system to detect DGCG misbehavior.&lt;/p&gt;
&lt;p&gt;Note that we (maybe) still need some way to deal with revocation,
but I don&#39;t think that this system makes that dramatically easier.&lt;/p&gt;
&lt;h2 id=&quot;revocation&quot;&gt;Revocation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#revocation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One topic that often comes up in these designs is &amp;quot;revocation&amp;quot;, i.e., signaling that a
given certificate should not be trusted. This is a &lt;a href=&quot;https://unmitigatedrisk.com/?p=583&quot;&gt;whole
topic&lt;/a&gt; for WebPKI, but of course
the relevance depends on the setting.
I don&#39;t think it&#39;s that useful to individually revoke people&#39;s
vaccine credentials on a small scale (e.g., because you discover
that they didn&#39;t have an immune response or something). We all
know that vaccination is imperfect, and so a bit of error
here isn&#39;t the end of the world. The cases that seem more
interesting are ones where we believe that a large number
of credentials might have been misissued. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;There is a compromise of one of the DSC keys.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We discover that a given vaccine site has been selling fake
credentials (e.g., reporting that someone was vaccinated but
not actually vaccinating them).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s not entirely clear to me how important these cases are.
As above, we know that even if vaccination information is perfectly
accurate, some people won&#39;t be protected, so it&#39;s possible that
some level of fraud is tolerable. But if we&#39;re &lt;em&gt;not&lt;/em&gt; willing
to tolerate it and we do want to revoke the credentials we know
were issued incorrectly, things get more complicated.&lt;/p&gt;
&lt;p&gt;The basic issue here is the number of credentials you need to
revoke. If we know that &lt;em&gt;every&lt;/em&gt; credential issued by a given DSC
is fraudulent, then we can just publish that that DSC is not to
be trusted (never mind how we do that). But what if only &lt;em&gt;some&lt;/em&gt;
of those credentials are fraudulent, for instance if a lot
of credentials were issued before the DSC key was compromised
or a DSC was serving two sites, with only one of them committing
fraud? On the Web this is sometimes handled by just revoking
the CA and forcing all its customers to get new certificates,
but that&#39;s not going to work well here because the vaccine
credentials are static data (often printed on paper!) and so
there&#39;s no real way to update them, and so invalidating
a DSC also invalidates a lot of valid credentials.
We quickly get to the point where you
need to publish a list of all the invalid credentials
(this assumes we can in fact identify them), potentially in some
compressed form. In the WebPKI, this is actually somewhat
challenging because there is a &lt;em&gt;lot&lt;/em&gt; of revocation and so the
size of the revocation list can get quite large. My best guess
is that that won&#39;t happen here, but if it does you would presumably
need to figure something out. Note that the DGC system seems to only
allow for revoking DSCs, which doesn&#39;t really solve the problem
for the reasons above.&lt;/p&gt;
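&lt;p&gt;The difference between revoking a whole DSC and revoking individual credentials can be illustrated with a small Python sketch. The credential fields (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;kid&lt;/code&gt;, &lt;code&gt;site&lt;/code&gt;), the &lt;code&gt;accepted&lt;/code&gt; function, and the sample data are all made up for illustration; as noted above, DGC itself only seems to offer the DSC-level option.&lt;/p&gt;

```python
def accepted(credential, revoked_dscs, revoked_credentials=frozenset()):
    """Return True if a verifier should still accept this credential."""
    if credential["kid"] in revoked_dscs:
        return False  # the whole signer is distrusted
    return credential["id"] not in revoked_credentials


# One DSC that served two sites; only site B issued fraudulent credentials.
credentials = [
    {"id": "c1", "kid": "dsc-1", "site": "A"},
    {"id": "c2", "kid": "dsc-1", "site": "A"},
    {"id": "c3", "kid": "dsc-1", "site": "B"},
]

# Option 1: revoke the DSC. Every credential dies, including the valid ones.
after_dsc_revocation = [c["id"] for c in credentials if accepted(c, {"dsc-1"})]

# Option 2: publish a list of the bad credentials. Only the fraud is rejected.
after_list_revocation = [
    c["id"] for c in credentials if accepted(c, set(), {"c3"})
]
```

&lt;p&gt;The second option keeps the valid credentials alive, at the cost of a list that grows with the number of bad credentials.&lt;/p&gt;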
&lt;h2 id=&quot;credential-loading&quot;&gt;Credential Loading &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#credential-loading&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unlike the credentials issued by VCI, you can&#39;t just load the
digital green certificate onto your phone. Instead, there is
a &amp;quot;2FA&amp;quot; process involving a special code called a TAN (it&#39;s not
clear to me what this is an acronym for). The idea here is
that the credential is provided in a printed-out QR code along with the
TAN (provided via SMS or e-mail or something)
and that (1) you need the TAN in order to load the credential
onto the phone and (2) the TAN is invalidated once it&#39;s used
in order to prevent the credential from being loaded onto
two separate phones.&lt;/p&gt;
&lt;p&gt;Here&#39;s what the &lt;a href=&quot;https://ec.europa.eu/health/sites/default/files/ehealth/docs/digital-green-certificates_v4_en.pdf&quot;&gt;spec&lt;/a&gt;
says (Section 5.2):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TAN validation is an easy matter—upon scanning a DGC, the wallet app
creates a cryptographic key pair. Then, the TAN and the DGCI are
signed with the newly created private key and uploaded together with
the corresponding public key. The certificate backend checks the
signature and verifies whether&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;The DGCI exists&lt;/li&gt;
&lt;li&gt;There haven’t been more than the specified number of TAN validation requests&lt;/li&gt;
&lt;li&gt;The submitted TAN corresponds to the TAN stored together with the DGCI&lt;/li&gt;
&lt;li&gt;The stored TAN is not expired&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;If all these points are positively answered, TAN validation has been
successful, the user’s public key is stored together with the DGCI,
and the corresponding DGC is marked as “registered” (meaning it
can’t be registered again—a digitized Green Certificate can’t be
digitized by any other wallet app). Otherwise, an appropriate error
code is returned.&lt;/p&gt;
&lt;/blockquote&gt;
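&lt;p&gt;Read literally, the backend side of that passage amounts to something like the
following sketch. Everything named here is an assumption made for illustration
and does not come from the spec: the record layout (&lt;code&gt;DgcRecord&lt;/code&gt;),
the attempt limit, the error strings, the explicit &quot;already registered&quot; check,
and the &lt;code&gt;verify_signature&lt;/code&gt; callable, which stands in for whatever
signature scheme the wallet and backend actually agree on.&lt;/p&gt;

```python
import hmac
from dataclasses import dataclass
from typing import Optional

MAX_TAN_ATTEMPTS = 3  # assumed value; the spec only says "the specified number"


@dataclass
class DgcRecord:
    """Hypothetical backend row for one issued certificate."""
    tan: str
    tan_expires_at: float              # seconds since the epoch
    attempts: int = 0
    registered_key: Optional[bytes] = None


def validate_tan(db, dgci, tan, public_key, signature, verify_signature, now):
    """Return "ok" or an (invented) error code, following the quoted checks."""
    # The wallet signs the TAN and the DGCI with its freshly generated key.
    if not verify_signature(public_key, (tan + dgci).encode(), signature):
        return "bad_signature"
    record = db.get(dgci)
    if record is None:                           # 1. the DGCI exists
        return "unknown_dgci"
    if record.registered_key is not None:        # already bound to some wallet
        return "already_registered"
    if record.attempts >= MAX_TAN_ATTEMPTS:      # 2. attempt limit not exceeded
        return "too_many_attempts"
    record.attempts += 1
    if not hmac.compare_digest(tan.encode(), record.tan.encode()):
        return "wrong_tan"                       # 3. TAN matches the stored one
    if now >= record.tan_expires_at:             # 4. stored TAN is not expired
        return "expired_tan"
    record.registered_key = public_key           # mark the DGC as "registered"
    return "ok"
```

&lt;p&gt;Note that everything in this sketch happens between the wallet and the backend.
Nothing in it involves the party who later scans the QR code.&lt;/p&gt;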
&lt;p&gt;It&#39;s pretty hard to understand what&#39;s going on here. Superficially,
the idea seems to be to ensure that the DGC is bound to the phone
of the user (bootstrapped using the TAN) and that it can&#39;t be replayed
by another user, by registering the public key generated on the
initial import, but in order to make that work, you would need credential
&lt;em&gt;verifiers&lt;/em&gt; to actually verify that the person in front of them
had the corresponding private key, and that&#39;s not what happens.
Instead, the official wallet just refuses to load the
credential if the TAN doesn&#39;t match. However, nothing stops me
from writing an unofficial wallet which loads any credential
it sees (e.g., because it&#39;s also a verifier app). If you actually
wanted to prevent this kind of replay, you would need the verifier
to have the public key that the user registered and then force
the user&#39;s app to prove knowledge of the corresponding private
key, for instance by signing something,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
but this is inconsistent with having a paper-based
credential. And of course as soon as you allow paper-based
credentials -- which can be copied indefinitely -- there isn&#39;t
much point in restricting digital copying.&lt;/p&gt;
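&lt;p&gt;To make the &quot;signing something&quot; idea concrete, here is a minimal
challenge-response sketch. It is not what the DGC system does, and the signature
scheme is a toy Schnorr-style construction over integers mod a prime, chosen only
so that the example runs with the Python standard library. It is not secure. All
of the function names are made up for illustration.&lt;/p&gt;

```python
import hashlib
import secrets

# Toy Schnorr-style signatures. This is a stand-in for a real signature
# scheme; it is NOT secure and NOT taken from the DGC spec.
P = 2**255 - 19   # a known prime, used here only as a toy modulus
G = 2
Q = P - 1         # exponents are reduced mod P - 1 (Fermat's little theorem)


def keygen():
    private = secrets.randbelow(Q - 1) + 1
    return private, pow(G, private, P)


def _hash_to_int(commitment, message):
    data = commitment.to_bytes(32, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(private, message):
    k = secrets.randbelow(Q - 1) + 1
    commitment = pow(G, k, P)
    e = _hash_to_int(commitment, message)
    return commitment, (k + e * private) % Q


def verify(public, message, signature):
    commitment, s = signature
    e = _hash_to_int(commitment, message)
    return pow(G, s, P) == (commitment * pow(public, e, P)) % P


def verifier_check(registered_public, wallet_sign):
    """Verifier side: send a fresh nonce and check the wallet's signature."""
    nonce = secrets.token_bytes(16)   # fresh per presentation, so no replay
    return verify(registered_public, nonce, wallet_sign(nonce))


if __name__ == "__main__":
    owner_private, owner_public = keygen()
    copier_private, _ = keygen()
    # Wallet that registered the key versus a wallet holding only a copy
    # of the credential.
    print(verifier_check(owner_public, lambda n: sign(owner_private, n)))
    print(verifier_check(owner_public, lambda n: sign(copier_private, n)))
```

&lt;p&gt;The point of the sketch is that the check only works if the verifier takes part
in it and if the wallet can respond interactively, which a piece of paper cannot.&lt;/p&gt;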
&lt;p&gt;Moreover, none of this is necessary because the credentials are
tied to a user&#39;s identity and you have to present some sort of
biometric ID to prove it&#39;s really you. This means that it&#39;s not
a problem to allow copying of the credential, which really
only exists to prove that someone with your name was vaccinated
(see &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/&quot;&gt;here&lt;/a&gt; for more on this).&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;None of the stuff I&#39;ve listed above is really fatal, and as
described the system should probably work OK. It&#39;s just a bunch
of extra complexity that doesn&#39;t seem to do much in particular.
As I said at the beginning, it&#39;s kind of unfortunate that we
have all these independent groups building these systems:
this tends to lead to a bunch of somewhat similar and yet
different designs that each have their own idiosyncrasies
and none of which has gotten the scrutiny it really needs.
Maybe eventually we&#39;ll see an attempt at a common protocol,
though of course by then it&#39;s likely that people will be
attached to those idiosyncrasies and so we&#39;ll get a system
that&#39;s more like a merger of all the designs than a single
design that picks the best features of each.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Acknowledgement&lt;/em&gt;: Thanks to Dennis Jackson for pointing out some of the issues
here, especially the ones about loading the passport onto the phone.
Mistakes are mine.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For the uninitiated, there are at least four standard object security
mechanisms. The general situation here is that every few years a new generic
format for serializing structured data comes along (e.g., ASN.1/BER, XML, etc.)
and naturally people want to send around data that&#39;s been encoded in that format.
But they also want to sign and encrypt that data, and that signing and encryption
requires its own formatting to carry metadata like key identifiers, signatures,
wrapped keys, and the like. Naturally, people don&#39;t want to lug around &lt;em&gt;two&lt;/em&gt; serialization
formats and so now there&#39;s a need to invent a new secure object format
that uses the new serialization format. Hence, we have
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8933&quot;&gt;CMS&lt;/a&gt; (in ASN.1/BER),
&lt;a href=&quot;https://www.w3.org/TR/xmldsig-core/&quot;&gt;XMLDSIG&lt;/a&gt; (in XML, a W3C spec this time),
&lt;a href=&quot;https://datatracker.ietf.org/doc/rfc7515/&quot;&gt;JOSE&lt;/a&gt; (in JSON),
and &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=8152&quot;&gt;COSE&lt;/a&gt; (in CBOR),
plus the ones that are tied to some conceptual application
like OpenPGP and HTTP object encryption. Of course, all of these
are different, reflecting the prevailing design sensibilities of the
time and the hope that this time we&#39;d get it right or at least not bungle
it so badly (full disclosure: I was part of the early JOSE effort,
but checked out later on).
Mercifully, COSE was defined
shortly after JOSE and so is mostly a port of the JOSE structures -- for
good or ill -- into CBOR. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Remember what I said a minute ago about not wanting to use two serialization
formats together? Well, that&#39;s what we have here. I have no idea why
the EU decided to use COSE/CWT instead of JOSE/JWT. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In this case, &amp;quot;fancy&amp;quot; would mean something like
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=BLS_digital_signature&amp;amp;oldid=1023559556&quot;&gt;BLS&lt;/a&gt;
which is both smaller and allows for aggregated signatures
in which you can compress multiple signatures into the
size of one. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that this implicitly trusts the WebPKI because anyone who
can impersonate the Web site can just substitute their own
key. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Incidentally, this uses CMS, so here we have a system that has
three separate serialization formats: ASN.1/BER, CBOR, and JSON. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that you wouldn&#39;t require that the TAN be single use;
you just need to ensure that only the rightful
user could register, not that they can&#39;t register multiple
times. And of course because the user has to be assumed
to control their app, they can just copy their private
key around. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-eu/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Bigfoot 73 Race Report</title>
		<link href="https://educatedguesswork.org/posts/bigfoot73/"/>
		<updated>2021-07-14T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/bigfoot73/</id>
		<content type="html">&lt;p&gt;Last weekend I ran &lt;a href=&quot;https://www.bigfoot200.com/bigfoot-73-mile.html&quot;&gt;Bigfoot 73 miler&lt;/a&gt;
up in Washington around Mt. St. Helens. I didn&#39;t go into this season planning to race
Bigfoot but then &lt;a href=&quot;http://sandiego100.com/&quot;&gt;San Diego 100&lt;/a&gt; was canceled
thanks to COVID-19, so I had to find something else and Bigfoot
looked interesting.&lt;/p&gt;
&lt;p&gt;As advertised, this was hard, but
overall it went quite well. The course was extremely technical with
many steep climbs (see the altitude profile below) and a few sections that require
some scrambling and
the like, as well as two long boulder fields that you really had to
pick your way through, one of which was in the dark.
An extra challenge here is that because the course is so remote
the aid stations are very far apart, with the longest stretch
between aid stations being about 18 miles. It was not, however, 73 miles but rather about 66.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/bigfoot-profile.png&quot; alt=&quot;Profile&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;start-to-blue-lake&quot;&gt;Start to Blue Lake &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#start-to-blue-lake&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I had to get up about 2:30 to get to the race on time for the
5:30 start. The first 4 miles or so are a 2000 ft climb and consistent with my
strategy I took it out pretty hard, power hiking pretty much the whole
way but doing it at the top of the hiking range, using poles
(&lt;a href=&quot;https://www.blackdiamondequipment.com/en_US/product/distance-carbon-z-trekking-running-poles/&quot;&gt;Black Diamond Carbon Z&lt;/a&gt;) with the usual
&lt;a href=&quot;https://youtu.be/OB0LABCYlto&quot;&gt;double pole technique&lt;/a&gt;.
At this point I should mention that many ultras, especially more
mountainous ones, involve quite a bit of hiking rather than running.
Once things get steep, it&#39;s far more efficient to hike than it
is to run and ultras are all about conserving energy.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
I started out somewhat towards the front
and passed a number of people on the climb. Once you get to the top
there is an extended boulder field of about a mile, with the boulders
being maybe 1-2m wide. I had some trouble with this section, partly
because the poles don&#39;t really help that much in this setting and partly because I
had trouble getting them back into the &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/custom-quiver.html&quot;&gt;Salomon quiver&lt;/a&gt;, which actually
fell off (I ended up stuffing it in my pack eventually). I mostly
just kept the poles in my hand and made my way through, but lost some
time.&lt;/p&gt;
&lt;p&gt;After the boulders, there&#39;s an extended descent down to Blue Lake,
over quite runnable single track. I caught up to a number of people on
that section and mostly didn&#39;t see them again. I wasn&#39;t feeling that
great during this section and it seemed very long. I also tripped
quite a few times but didn&#39;t go down, which is usually a sign of
fatigue for me, which is unusual this early in an event.
I was surprised that I came in at 2:37, which was ahead
of schedule.  This was billed as a 12 mile stretch but my GPS said 11
(the aid station volunteer billed it as 13) and others
reported the same.&lt;/p&gt;
&lt;p&gt;You&#39;ll notice at this point that this isn&#39;t that fast: about 14 minutes/mile.
Partly this is because &lt;em&gt;I&#39;m&lt;/em&gt; not incredibly fast, but in general
mountain ultra-trail races are slow. For instance, Jim Walmsley&#39;s
phenomenal 14:09 Western States &lt;a href=&quot;https://www.wser.org/records/&quot;&gt;record&lt;/a&gt;
is somewhere over 8:00/mile. This is due to the length of the
course as well as how much slower people run when climbing.&lt;/p&gt;
&lt;h2 id=&quot;blue-lake-to-windy-ridge&quot;&gt;Blue Lake to Windy Ridge &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#blue-lake-to-windy-ridge&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The next stretch to Windy Ridge was long and exposed, but without as
many really long climbs. Mostly, it was just a long section of
modestly rolling terrain crossing stream beds, so you would have to
keep going down into the stream bed and then climb back out.  I ran
the flats and descents where I could (a lot of them were quite rocky)
and then hiked the climbs. I used the poles almost the whole time
here, which mostly worked well except for one quite difficult section
which involved a rope descent followed by a rope climb out of the same
gully, which was hard to do with poles in hand.&lt;/p&gt;
&lt;p&gt;Because this section is so long, it&#39;s not really practical to
carry enough fluid: at 15 minutes/mile 15 miles is almost 4 hours
and in the heat you&#39;d like to do 500-1000ml/hr. I was carrying
2l but that&#39;s obviously not enough -- and you don&#39;t want to
carry more because water is heavy -- so you need to drink
water from streams and the like. However, these sources
are sometimes contaminated with &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Giardia&amp;amp;oldid=1029925134&quot;&gt;giardia&lt;/a&gt; or the like, so you want to treat the
water. I use a Salomon &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/xa-filter-cap-42.html#color=45979&quot;&gt;XA filter&lt;/a&gt; that is just a cap that fits on your
water bottle, so you can just dip the bottle into the stream
and drink directly from it. Definitely was doing some of this
on this section.&lt;/p&gt;
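&lt;p&gt;For what it&#39;s worth, the arithmetic behind that looks roughly like this. The
function name is made up, and the inputs are just the round numbers from the
paragraph above rather than anything measured.&lt;/p&gt;

```python
def fluid_needed_litres(distance_miles, pace_min_per_mile, ml_per_hour):
    """Fluid needed for a stretch, given pace and an hourly drinking rate."""
    hours = distance_miles * pace_min_per_mile / 60
    return hours * ml_per_hour / 1000


# 15 miles at 15 minutes/mile, drinking 500-1000 ml/hr.
low = fluid_needed_litres(15, 15, 500)
high = fluid_needed_litres(15, 15, 1000)
carried = 2.0  # litres

print(f"need {low:.2f} to {high:.2f} l, carrying {carried:.1f} l")
print(f"shortfall at the high end: {high - carried:.2f} l")
```

&lt;p&gt;So 2l only covers the very bottom of the range, and in the heat you are
going to be well above the bottom.&lt;/p&gt;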
&lt;p&gt;Finally, there&#39;s a
long out and back to Windy Ridge on a dirt road, with about a mile
climb and then a mile descent. Was able to really push here.&lt;/p&gt;
&lt;h2 id=&quot;windy-ride-to-norway-pass&quot;&gt;Windy Ridge to Norway Pass &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#windy-ride-to-norway-pass&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Leaving Windy Ridge, I noticed I was getting some pain in my left
foot. I thought it might be a blister&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
but when I took my sock off it
was more like a wrinkle in the foot that had gotten swollen because of
moisture, so I put some lube on the hot spot and kept moving.&lt;/p&gt;
&lt;p&gt;The Windy Ridge to Norway Pass section is the longest (billed at 20
miles but actually 18) but it&#39;s also the middle of the day and hyper
exposed. The first tranche of this is mostly across the same kind of
lowland river terrain, so there was a fair amount of bushwhacking,
followed by a long climb into the highlands and then a lot of
traverse across snow fields and the like. Did a lot of this with
Jennifer Schweiger, the eventual female &lt;s&gt;winner&lt;/s&gt; second-place finisher before she dropped
me. It was getting quite hot by this point and I was starting to run
out of water and I got a bit fooled by how much fluid there was in the
lowlands, but by the time we got to the highlands there were
snow fields but not much water and I was down to like .75l with 7+
miles to go. I initially stuffed snow into my filter but before it
could melt we found some actual runoff, so that was easier.&lt;/p&gt;
&lt;p&gt;At this point I was starting to feel kind of nauseated (probably due
to not drinking enough) and it was a struggle to get in fluid and
calories. In
particular, because I was mostly drinking water I wasn&#39;t getting that
many calories and the PowerGel I had brought started to taste kind of
gross (surprisingly, Spring Energy was better even though I&#39;m usually not a fan).
At this point Jennifer
started to pull away, somewhat on the climbs but mostly on the
snow fields where I&#39;m not that good and also on the flats where running
was starting to feel pretty difficult and eventually she gapped
me and I didn&#39;t see her until the end. There&#39;s a long descent into Norway that&#39;s mostly single track and
that felt comfortable, though I was working a bit to keep up with
someone else I ran into.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;norway-pass-to-windy-ridge&quot;&gt;Norway Pass to Windy Ridge &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#norway-pass-to-windy-ridge&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You just turn around and climb out of Norway. I felt good here.  The
backtracking part isn&#39;t that steep but then when it diverges from the
trail in it&#39;s some real bushwhacking, which was pretty hard. Next it
opens out onto the highway and you have to do a long 2-3 mile stretch on
the road, which I was able to do very quickly. Then there&#39;s a traverse
and a dirt and stairs descent into Windy Ridge.&lt;/p&gt;
&lt;p&gt;First Coke here.&lt;/p&gt;
&lt;h2 id=&quot;windy-ridge-to-finish&quot;&gt;Windy Ridge to Finish &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#windy-ridge-to-finish&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I had a spare pair of socks at Windy Ridge so I changed them and lubed
up my feet and headed back out the dirt road for the last 13.5 miles.
This last segment comes in five main pieces: (1) a steep climb (2) a
long up/down section of stream gullies and big rocks (3) some nice
dirt single track (4) about a mile long boulder field (5) a dirt
road descent. The river gully section seemed very technical though
this may have just been in the dark; you would go down this rock and
scree slope, cross the bed (sometimes dry, sometimes not) and then
come back up.&lt;/p&gt;
&lt;p&gt;First caffeine capsule around here.&lt;/p&gt;
&lt;p&gt;At this point it was starting to get dark, so I had to do this by
headlamp (&lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/ACTIK-CORE&quot;&gt;Petzl Actik Core&lt;/a&gt;). I&#39;d been doing pretty well in terms of
stability but I stumbled a bunch of times in the stream gullies. I had
two notable falls, the first where I stepped off the path with one
foot and almost slid down a long sandy slope. I ended up with just my
arms and head on the path and had to grab a rock and pull myself up.
In the second, I tripped on a rock and nearly landed face first. I got
some bruises
from that but was able to walk it off.&lt;/p&gt;
&lt;p&gt;Second caffeine capsule around here.&lt;/p&gt;
&lt;p&gt;The dirt section after that was OK, and I thought I was getting close
to the end but then ran into the second boulder field.  This part was
especially hard, objectively probably no worse than the first one, but
in the dark it was hard to wayfind and to stabilize. Finally, it
opened up onto the dirt road back to the start.&lt;/p&gt;
&lt;p&gt;Battery change here: note that it&#39;s hard to change the battery,
especially to disposables, without light. Fortunately someone had one,
though I could have used my phone.&lt;/p&gt;
&lt;p&gt;At this point I was with two other guys, Tim and Saul. Saul took off
fast and I followed a bit behind. I was watching my GPS and it looked
like I was off course and so I backtracked back to the intersection, where
Tim and I looked at the GPS track which showed another trail, but the
markings clearly showed the trail we were on, so I turned around and
took the trail I was originally on. This probably cost about 5
minutes. From there it was easy downhill till the end.  I was feeling
good at this point, and was limited by the footing, and wouldn&#39;t have
had trouble going longer, or harder with smoother trail.&lt;/p&gt;
&lt;h2 id=&quot;retrospective&quot;&gt;Retrospective &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#retrospective&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Generally, I think I paced this pretty well. I do think I had some
spare aerobic capacity and might have been able to run a bit more of
the runnable trail bits towards the end, but in many cases things were
just unrunnable and even hard to hike, so I would actually probably
have benefited as much training wise from really steep hiking.&lt;/p&gt;
&lt;p&gt;I think my heat training was also a success. I had spent a lot of time
running in the heat and sitting in my car with the heat on in
order to adapt. It was warm but I never felt
that overheated, though of course I was thirsty.&lt;/p&gt;
&lt;p&gt;I felt kind of bad at the beginning and was kind of tempted to pack it
in at Windy 1 (where you could drop down to 40), but that eventually
faded. Not quite sure the mechanism of that. I think honestly I might
have just been feeling the weight of a really long day once I realized
that there were going to be a lot of physically demanding pieces as
opposed to just running/hiking.&lt;/p&gt;
&lt;p&gt;A few things could have gone better here. Probably the biggest is
nutrition. Because of the long segments here, I never had enough
&lt;a href=&quot;https://www.tailwindnutrition.com/shop/&quot;&gt;tailwind&lt;/a&gt; sports drink
to go the whole way between aid stations -- and with the
Salomon filter it&#39;s difficult to just filter directly into a bottle --
so I probably was only getting about 1/2 as many calories as I do in
training, and I quickly lost my taste for gels and even M&amp;amp;Ms. I also
probably wasn&#39;t drinking enough in general.
I did try to aggressively drink at aid stations and I
think that helped some. This would be easier at a more conventional
race because you wouldn&#39;t run out of sports drink as much, but still I
think I would do better with more of a mix of food and less of a
reliance on gels and sweet stuff.&lt;/p&gt;
&lt;p&gt;I need to figure out some better story for pole storage. I did want them
most of the time, but it was enough of a hassle to stow them that I
kept them in hand in places where it might have helped earlier.  The
Salomon quiver was kind of a fail but also bungee-ing them to the back
doesn&#39;t work well for me because they&#39;re hard to get on and off.&lt;/p&gt;
&lt;p&gt;I had balance/tripping problems in two sections: at the beginning
where I tripped a lot but never went down and then at the end when I
actually fell twice. I&#39;m not sure about the beginning, it just felt
like I wasn&#39;t warming up that well and that got better as the day went
on. At the end of the day I had a lot of balance problems on the
difficult terrain. Some of this is expected and I of course saw other
people fall too, but there&#39;s room for improvement. In particular, I
had more trouble on the boulder sections than others I think. Some of
that may be down to poles in hand (see above) but some of it could
benefit from balance work.&lt;/p&gt;
&lt;p&gt;I wish I&#39;d dealt with the bottom of my foot earlier. It never got so
bad that I couldn&#39;t run but it kept threatening to and now there is a
blister about 1x2cm. I had spare socks and should have swapped at
Windy both times. With that said, my feet kept getting wet and dirty
and at the end of the day it was fine. The &lt;a href=&quot;https://www.salomon.com/en-us/shop/product/sense-4-pro.html#color=40225&quot;&gt;Salomon Sense 4 Pros&lt;/a&gt;
handled this all pretty well, though the laces do come out of the
pocket if you&#39;re not careful.&lt;/p&gt;
&lt;p&gt;Finally, it was a mistake to trust the GPS track over the course
markings at the end. If I hadn&#39;t backtracked, I think I &lt;s&gt;would&lt;/s&gt; might have
been one place up. Serves me right for being obsessive and
completionist about the right course.&lt;/p&gt;
&lt;h2 id=&quot;results-summary&quot;&gt;Results Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/bigfoot73/#results-summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Time&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;19:25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Actual distance&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;66.4 miles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Place&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;&lt;s&gt;12th overall, 10th male (?)&lt;/s&gt; 15th overall, 12th male&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;s&gt;Note: I am still waiting for the results, this
is from the tracker, so I might be one place back.&lt;/s&gt;&lt;/p&gt;
&lt;p&gt;Updated: 2021-07-16 now that the &lt;a href=&quot;https://ultrasignup.com/results_event.aspx?did=82991#id14363&quot;&gt;results&lt;/a&gt;
are up. To be honest, the situation is a little confusing, as these
results do not agree with what I heard right after the race,
with what appears in the &lt;a href=&quot;https://trackleaders.com/bigfoot73-21&quot;&gt;tracker&lt;/a&gt;,
or what was originally posted yesterday, in that there are several new
names in the top 10, so I&#39;m not quite sure what happened.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Segment&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Distance&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Elevation&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Time&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Pace&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;GAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Start to Blue Lake&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;11.01&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+2631/-2113&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:37:46&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;12:14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Blue Lake Aid&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8:01&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Blue Lake to Windy Ridge&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15.71&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+3189/-2326&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4:16:22&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;16:19&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;14:10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Windy Ridge&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;10:36&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Windy Ridge to Norway Pass&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;18.06&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+3356/-3701&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5:13:12&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;17:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15:09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Norway Pass&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;18:07&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Norway Pass to Windy Ridge&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7.49&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1598/-1253&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;2:03:20&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;16:28&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;13:57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Windy Ridge&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;8:38&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Windy Ridge to Finish&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;19:39&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+1916/-3261&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4:32:14&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;19:33&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;17:50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Overall&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;66.43&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;+12674/-12651&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;19:24:59&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;17:32&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;15:27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an example, a few weeks ago I did much of the
&lt;a href=&quot;https://www.nps.gov/places/000/bright-angel-trail.htm&quot;&gt;Bright Angel trail&lt;/a&gt;
in the Grand Canyon twice a few days apart, once
hiking and once running. I was only about two minutes/mile slower
hiking and it was much easier. &lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It&#39;s very important to address
incipient blisters early, because once a blister gets big
it can make it very hard to move fast. Irunfar has a good &lt;a href=&quot;https://www.irunfar.com/trail-first-aid-blister-prevention-and-care&quot;&gt;guide&lt;/a&gt;
to blister treatment. &lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This dynamic is just how ultras go; you meet
up with someone who is about your pace and you go together for
a while. It&#39;s good to have someone to talk to and it helps
keep you moving. People will sometimes adjust their pace a bit to
let someone else keep up, but at the end of the day it&#39;s a race, so if
you&#39;re too different, then you end up splitting up. &lt;a href=&quot;https://educatedguesswork.org/posts/bigfoot73/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What the heck is going on in New York&#39;s election?</title>
		<link href="https://educatedguesswork.org/posts/rcv-nyc/"/>
		<updated>2021-07-01T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/rcv-nyc/</id>
		<content type="html">&lt;p&gt;If you&#39;ve been following the already bizarre NYC mayoral election,
you&#39;ve no doubt heard that the NY Board Of Elections (BOE)
has had to &lt;a href=&quot;https://www.nytimes.com/2021/06/29/nyregion/adams-garcia-wiley-mayor-ranked-choice.html?action=click&amp;amp;module=Spotlight&amp;amp;pgtype=Homepage&quot;&gt;withdraw&lt;/a&gt; their partial tallies because they
accidentally counted some test ballots.
The root of this problem seems to just be simple human error,
but the situation is vastly complicated by NY&#39;s use of what&#39;s
called Ranked Choice Voting (RCV), also called Instant Runoff Voting (IRV).&lt;/p&gt;
&lt;h2 id=&quot;how-it-usually-works%3A-first-past-the-post-and-runoffs&quot;&gt;How it Usually Works: First Past the Post and Runoffs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#how-it-usually-works%3A-first-past-the-post-and-runoffs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Many people tend to think of voting as simple: you vote for your
preferred candidate and whoever gets the most votes wins. This
model, usually called &amp;quot;first past the post&amp;quot;, is certainly common
but by no means universal, and has some obvious problems which
emerge if there are more than two candidates. Consider the case
where we have three candidates, Alice, Bob, and Charlie and 12 voters.
We run the election with the following results:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Alice&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Bob&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Charlie&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;4&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So, Alice wins, right? But here&#39;s the thing: what if everyone
who preferred Charlie actually preferred Bob to Alice? This
system just ignores that fact and hands the election to Alice.
But if Charlie had dropped out, then Bob would have gotten those
votes and would have won instead of Alice, with a comfortable
margin like so.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Alice&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Bob&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;5&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This situation strikes many people as fundamentally unfair:
If you support a candidate with a low chance of winning
but you also have a preference between the leading candidates,
you have to decide between voting for that candidate or
actually influencing the outcome of the election in the
direction you prefer. It also means that third party candidates
can potentially change the outcome by being in the race
(the disparaging term here is &amp;quot;spoiler&amp;quot;).
It&#39;s not like this can&#39;t happen in the real world, either: in several recent US presidential
elections (1992, 2000, 2016) third party candidates have received
enough votes that it could in principle have changed the outcome.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
In any case, having people who have no chance
of winning not affect the election seems like a desirable
property.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;One way to address this is by having what&#39;s called a
&amp;quot;runoff&amp;quot; election. The general way that a runoff works is
that if no candidate gets more than a given threshold
percentage of the vote then you run a new election with
some of the lower-ranked candidates omitted. A particularly
consequential example of this is that of the Georgia 2020 senate races,
in which you had to get 50% of the vote in order to win.
However, in both the regular election (for a full term)
and the special election&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
(for the remainder of an unexpired term), no
candidate got over 50%, so a runoff election was run
two months later with just the top two candidates.
In one of those races (the special election) the first-ranked
candidate in November (Raphael Warnock) eventually won,
but in the other, David Perdue had the most votes
in November but eventually lost to Jon Ossoff in January,
giving the Democrats a 50-50 Senate with VP Kamala Harris
as the tie breaker.&lt;/p&gt;
&lt;h2 id=&quot;instant-runoff-voting-(aka-ranked-choice-voting)&quot;&gt;Instant Runoff Voting (aka Ranked-Choice Voting) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#instant-runoff-voting-(aka-ranked-choice-voting)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Runoff elections have an obvious appeal in that the eventual winner
actually receives a majority of the vote, not just a plurality,
and you can be confident that they actually were the preferred
choice between the two candidates. However, they also have a number
of undesirable properties. First, it&#39;s expensive and inconvenient
to run another election months after the first one. Moreover,
that election is run under different conditions than the first,
so there is time for politicking and you don&#39;t know you&#39;re
getting the same outcome you would have gotten from a runoff
done on election night.&lt;/p&gt;
&lt;p&gt;It&#39;s possible to avoid those costs using Instant Runoff/Ranked Choice
Voting (RCV). The idea behind RCV is to simulate a series of runoff
elections without actually having to run them. To make this work,
instead of listing only their top candidate, voters &lt;em&gt;rank&lt;/em&gt;
the candidates on the ballot. A typical version of the election decision procedure
works like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Count up the votes for everyone&#39;s top choice.&lt;/li&gt;
&lt;li&gt;Eliminate the candidate with the lowest number of votes.&lt;/li&gt;
&lt;li&gt;If only one candidate is left, they are the winner, otherwise go to 1.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For instance, suppose we have the following ballots with three candidates.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Voter&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;First&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Second&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Third&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Dave&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Ellen&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Fran&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Greg&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Harold&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In round 1, we count up all the first choices (Alice: 2, Bob: 2,
Charlie: 1).  So, we have a tie between Alice and Bob with Charlie as
the last place candidate.  We remove Charlie from the election,
changing Harold&#39;s ballot to be &amp;quot;Alice, Bob&amp;quot;, making it a vote for
Alice and giving her the win, 3 to 2. In this particular case, the first round
had a tie, but RCV can also change the results. Consider what would
have happened if there were 49 ballots for Alice, 50 for Bob,
and 2 for Charlie and then Alice. In a first-past-the-post system,
Bob would have won 50 to 49, but in an RCV system, Charlie is eliminated,
his 2 ballots transfer to Alice, and Alice wins 51 to 50.&lt;/p&gt;
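&lt;p&gt;The three-step procedure above can be sketched in a few lines of Python.
This is only an illustration, not how any election authority actually tabulates:
the function name &lt;code&gt;instant_runoff&lt;/code&gt; is made up for this example, and
ties for last place are broken alphabetically purely to keep the sketch deterministic.
The ballots are the five from the table above.&lt;/p&gt;

```python
from collections import Counter


def instant_runoff(ballots):
    """Return (winner, rounds) for a list of ranked ballots.

    Each ballot is a list of candidate names, most preferred first.
    Ties for last place are broken alphabetically, an arbitrary choice
    made only for this sketch.
    """
    remaining = {name for ballot in ballots for name in ballot}
    rounds = []
    while True:
        # Step 1: count each ballot for its highest-ranked remaining candidate.
        tally = Counter({name: 0 for name in remaining})
        for ballot in ballots:
            top = next((name for name in ballot if name in remaining), None)
            if top is not None:
                tally[top] += 1
        rounds.append(dict(tally))
        # Step 3: stop once only one candidate is left.
        if len(remaining) == 1:
            return next(iter(remaining)), rounds
        # Step 2: eliminate the candidate with the fewest votes.
        loser = min(sorted(remaining), key=lambda name: tally[name])
        remaining.remove(loser)


ballots = [
    ["Alice", "Bob", "Charlie"],   # Dave
    ["Alice", "Bob", "Charlie"],   # Ellen
    ["Bob", "Charlie", "Alice"],   # Fran
    ["Bob", "Alice", "Charlie"],   # Greg
    ["Charlie", "Alice", "Bob"],   # Harold
]
winner, rounds = instant_runoff(ballots)
print(winner)
for number, tally in enumerate(rounds, start=1):
    print(number, tally)
```

&lt;p&gt;The first round comes out as Alice 2, Bob 2, Charlie 1; after Charlie is
eliminated the second round is Alice 3, Bob 2, matching the walk-through above.&lt;/p&gt;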
&lt;p&gt;I just want to note for the moment that there is a lot of debate&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
about whether RCV is actually a good voting system from a political
perspective (i.e., does it produce the &amp;quot;right&amp;quot; outputs?). I&#39;d
just like to bracket that discussion for now, and talk about the
logistical properties in the context of what we&#39;re seeing in New York.&lt;/p&gt;
&lt;h2 id=&quot;rcv-logistics-in-practice&quot;&gt;RCV Logistics in Practice &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#rcv-logistics-in-practice&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The core thing to recognize about RCV is that unlike
first-past-the-post systems the running tallies of the &amp;quot;first choice&amp;quot;
don&#39;t capture the entire state of the tally, and in many cases
don&#39;t do a very good job at all. Consider the case where
even though there are three candidates, voters only have
three sets of preferences (this is unrealistic, but just
convenient for analysis):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;ballots&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;First&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Second&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Third&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;29&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;31&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you just look at the running tallies before the RCV elimination
round, it looks like Alice is way in the lead, but actually
most voters prefer either Charlie or Bob to Alice, so once
you&#39;ve eliminated Bob (who has the fewest first choices), Charlie is going to win with 60% of the
votes.&lt;/p&gt;
&lt;p&gt;A related problem is that relatively small numbers of
ballots can change the eventual winner even if the gaps
between the leaders are quite large. Consider the election
directly above, but with the people who prefer Bob preferring
Alice to Charlie rather than Charlie to Alice (I&#39;ve bolded
the changed preferences).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;ballots&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;First&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Second&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Third&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;29&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;strong&gt;Alice&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;strong&gt;Charlie&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;31&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So, in this current election, Bob is eliminated first, his
votes go to Alice and she wins 69-31. But suppose we shift 2%
of the votes (2 ballots) from the third row to the second row, giving us:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:right&quot;&gt;ballots&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;First&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Second&lt;/th&gt;
&lt;th style=&quot;text-align:right&quot;&gt;Third&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;40&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;strong&gt;31&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right&quot;&gt;&lt;strong&gt;29&lt;/strong&gt;&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Charlie&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Bob&lt;/td&gt;
&lt;td style=&quot;text-align:right&quot;&gt;Alice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this case, Charlie is eliminated, his votes go to Bob,
and Bob wins 60-40. So, just by moving 2% of votes (2 votes) from
one candidate to another we&#39;ve changed a landslide win
for Alice to a landslide win for Bob.&lt;/p&gt;
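&lt;p&gt;To check the arithmetic, here is a rough Python sketch that runs the three
tables above through the elimination procedure. The helper name
&lt;code&gt;irv_final_round&lt;/code&gt; and the (count, ranking) input format are
assumptions made for this illustration, and ties are not handled because none
of these tables produces one.&lt;/p&gt;

```python
def irv_final_round(profile):
    """Eliminate last-place candidates until at most two remain.

    profile is a list of (ballot_count, ranking) pairs. Returns the
    winner and the tally of the final round. Ties are not handled.
    """
    remaining = {name for _, ranking in profile for name in ranking}
    while True:
        tally = {name: 0 for name in remaining}
        for count, ranking in profile:
            # Each group of ballots counts for its top remaining choice.
            top = next((name for name in ranking if name in remaining), None)
            if top is not None:
                tally[top] += count
        if len(remaining) in (1, 2):
            return max(tally, key=tally.get), tally
        remaining.remove(min(tally, key=tally.get))


# First table: Bob voters rank Charlie second.
first = [
    (40, ["Alice", "Bob", "Charlie"]),
    (29, ["Bob", "Charlie", "Alice"]),
    (31, ["Charlie", "Bob", "Alice"]),
]
# Second table: Bob voters rank Alice second instead.
second = [
    (40, ["Alice", "Bob", "Charlie"]),
    (29, ["Bob", "Alice", "Charlie"]),
    (31, ["Charlie", "Bob", "Alice"]),
]
# Third table: two ballots move from the Charlie row to the Bob row.
third = [
    (40, ["Alice", "Bob", "Charlie"]),
    (31, ["Bob", "Alice", "Charlie"]),
    (29, ["Charlie", "Bob", "Alice"]),
]

for name, profile in [("first", first), ("second", second), ("third", third)]:
    print(name, irv_final_round(profile))
```

&lt;p&gt;The first-choice totals for Alice never change across the three tables,
yet the final rounds come out as Charlie 60-40, Alice 69-31, and Bob 60-40.&lt;/p&gt;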
&lt;p&gt;The key point here is that in an RCV election just looking at the
top-line numbers is super misleading. Instead, you need to think
of the election as consisting of a bunch of different possibilities
depending on who gets eliminated and when. In order to do this,
you need not just the raw tallies for every candidate in each
position, but actually the number of ballots with each possible
ranking of candidates.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
This can be quite a bit of data: as I understand it, the New York City
election has 13 candidates and you get to rank up to 5, so that
means that you have over 100,000 different potential slates
that people could have voted for, and you need to see how many
votes each of those got in order to understand the state of the
election.&lt;/p&gt;
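&lt;p&gt;The &amp;quot;over 100,000&amp;quot; figure is easy to sanity-check. The sketch below
takes the 13 candidates and 5 ranking slots quoted above as given and counts
ordered selections; it assumes every ranking is a sequence of distinct
candidates and ignores write-ins and spoiled ballots.&lt;/p&gt;

```python
import math

CANDIDATES = 13   # figure quoted in the text
MAX_RANKS = 5     # figure quoted in the text

# Ordered selections of exactly k distinct candidates out of 13.
by_length = {k: math.perm(CANDIDATES, k) for k in range(1, MAX_RANKS + 1)}

full_ballots = by_length[MAX_RANKS]   # ballots that rank exactly five
all_ballots = sum(by_length.values()) # also counting shorter ballots

print(full_ballots)
print(all_ballots)
```

&lt;p&gt;That gives 154,440 rankings that use all five slots, or 173,485 if you also
count ballots that rank fewer than five candidates.&lt;/p&gt;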
&lt;p&gt;So, part of what&#39;s confusing in New York is that you&#39;re seeing
the top-line numbers of how many votes each candidate has
based on the current ballots that have been counted, but there
are a lot of absentee ballots (~125000) that haven&#39;t been counted yet,
and there are still at least three viable candidates
(Adams, Garcia, and Wiley). The gaps between them are very
small: ~15000 between Adams and Garcia after all the elimination
rounds, but only 350 between Garcia and Wiley, so you need to do
a bunch of what-ifs based on what the contents of those absentee
ballots might be &lt;em&gt;and&lt;/em&gt; based on the precise composition of the already
counted ballots. (See this &lt;a href=&quot;https://www.nytimes.com/2021/06/30/nyregion/mayoral-results-vote-count.html?action=click&amp;amp;module=Top%20Stories&amp;amp;pgtype=Homepage&quot;&gt;NYT&lt;/a&gt;
article for more on this). So, it&#39;s not just a simple matter of
saying that Garcia needs 15000 more votes than Adams in order
to win. What if, for instance, Garcia got 15000 more votes than
Adams but Wiley got 500 votes more than Garcia?
In principle, there may even be enough absentee ballots to put
Yang back in the race because he was about 80,000 ballots behind
Garcia!&lt;/p&gt;
&lt;p&gt;To make matters worse, NYC inadvertently posted ballot tallies
that included a number of test ballots. Those tallies were
quickly taken down, but it&#39;s obviously another source of confusion.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;take-home&quot;&gt;Take home &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#take-home&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I do want to emphasize at this point that it&#39;s quite possible to run
RCV-based elections efficiently. In fact, Australia routinely runs
a similar system called &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Single_transferable_vote&amp;amp;oldid=1031079755&quot;&gt;single transferable vote&lt;/a&gt;.
It&#39;s a little more mathematically complicated to do a risk-limiting audit with
IRV but there&#39;s now some exciting work showing how to do it &lt;a href=&quot;https://arxiv.org/pdf/2004.00235.pdf&quot;&gt;efficiently in practice&lt;/a&gt;.
What we&#39;re seeing here is the result of
a combination of a particularly
contested election, a large number of absentee ballots, the desire to post preliminary
results, and a pretty serious ballot handling error with
the test ballots.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is debate about the impact of third party
candidates in the 1992 and 2016 elections, but
this was certainly something people were worried about at the
time. In 2000, it seems fairly clear that poor ballot design
caused a number of people in Florida to inadvertently vote for Buchanan rather
than Gore, in numbers large enough to have shifted the election
to Bush. See &lt;a href=&quot;http://sekhon.berkeley.edu/elections/election2000/butterfly.review.pdf&quot;&gt;Wand et al.&lt;/a&gt;
for more. Thanks to Joseph Lorenzo Hall for this reference. &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In the literature, this is known as &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Independence_of_irrelevant_alternatives&amp;amp;oldid=1031396655&quot;&gt;Independence of Irrelevant Alternatives&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Wikipedia has the &lt;a href=&quot;https://en.wikipedia.org/wiki/2020%E2%80%9321_United_States_Senate_special_election_in_Georgia&amp;amp;oldid=1029629045&quot;&gt;background&lt;/a&gt;
here, but briefly: usually US Senate terms run
6 years and the elections are staggered, but
in this case the sitting senator resigned and
so they had to run a special election to fill
the rest of the term. &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Keywords for voting nerds:
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Approval_voting&amp;amp;oldid=1030792897&quot;&gt;approval voting&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Tactical_voting&amp;amp;oldid=1030789244&quot;&gt;strategic voting&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Arrow%27s_impossibility_theorem&amp;amp;oldid=1025676548&quot;&gt;Arrow&#39;s theorem&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;People often say that you need a list of
all the ballots, but that&#39;s not actually required. &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
As an aside, I feel compelled to point out that there is a simpler
way: &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Approval_voting&amp;amp;oldid=1030792897&quot;&gt;approval voting&lt;/a&gt;
is a simple modification of first-past-the-post in which you
are allowed to vote for multiple candidates and whichever candidate
has the most votes in total wins. This is much simpler to reason
about but at the cost of not letting voters differentiate between
candidates other than between &amp;quot;acceptable&amp;quot; and &amp;quot;not acceptable&amp;quot;.
The debates about approval versus RCV are heated and technical
(see &lt;a href=&quot;https://josephhall.org/misc/yee-approval.pdf&quot;&gt;here&lt;/a&gt; for
an overview), and I won&#39;t get into them here. &lt;a href=&quot;https://educatedguesswork.org/posts/rcv-nyc/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Science&#39;s broken publishing model</title>
		<link href="https://educatedguesswork.org/posts/science-publishing/"/>
		<updated>2021-06-30T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/science-publishing/</id>
		<content type="html">&lt;p&gt;Matt Ridley has an &lt;a href=&quot;https://capx.co/science-journals-wuhan-and-a-truly-bizarre-twitter-episode/&quot;&gt;article&lt;/a&gt;
over at CAPX about how science journals -- in this case
Nature -- are modifying their coverage to
avoid antagonizing China. Most of the story is about some reporting
by Amy Maxmen on the &amp;quot;lab leak hypothesis&amp;quot; but Ridley also writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One of the subtexts of the debate over the origin of the pandemic
concerns the role of the scientific journals. The magazines that
publish scientific papers have become increasingly dependent on the
fees that Chinese scientists pay to publish in them, plus
advertisements from Chinese firms and subscriptions from Chinese
institutions. In recent years observers have noticed that the news
coverage of China in these magazines has begun to look a little less
objective than it once did.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&#39;m not that interested in the details of Nature&#39;s behavior in this
case, but what Ridley is bringing up goes to some fairly fundamental
issues in scientific publishing.&lt;/p&gt;
&lt;h2 id=&quot;what-are-scientific-journals&quot;&gt;What are Scientific Journals &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-publishing/#what-are-scientific-journals&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For those of you who aren&#39;t familiar with scientific publishing,
many of the prestige publication venues are journals.
These range from relatively niche publications you&#39;ve probably
never heard of such as &lt;a href=&quot;https://www.icmp.lviv.ua/journal/about.html&quot;&gt;Condensed Matter Physics&lt;/a&gt;
to top field-specific publications like the &lt;a href=&quot;https://www.nejm.org/&quot;&gt;New England Journal of Medicine&lt;/a&gt;
to top general scientific publications like &lt;a href=&quot;https://www.nature.com/&quot;&gt;Nature&lt;/a&gt; and
&lt;a href=&quot;https://www.sciencemag.org/&quot;&gt;Science&lt;/a&gt;. The top publications like
Science and Nature are really two magazines in one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A general interest science magazine with articles written by
professional science journalists and targeted for a scientifically
trained but non-specialist audience -- kind of like a high end
version of Scientific American.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A collection of actual scientific papers that are deemed to be
particularly important/worthy/impactful.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Like any magazine, these journals have subscriptions and advertising.
The subscriptions can be quite expensive. For instance an individual
Nature subscription is $199/year but an institutional subscription
(e.g., for a university) is
over &lt;a href=&quot;https://support.nature.com/en/support/solutions/articles/6000211101-institutional-print-subscription-pricing&quot;&gt;$10,000/year&lt;/a&gt;.
Many journals also have what&#39;s called a &amp;quot;page fee&amp;quot; or an &amp;quot;article processing charge&amp;quot;,
where authors pay to publish. An interesting wrinkle here is that
some journals charge more to make your article &amp;quot;open access&amp;quot;. The
way this works is that ordinarily upon submitting to a journal
they would require you to assign your copyright, so that they
are the only ones who can distribute it. However, if you pay
extra (&lt;a href=&quot;https://www.nature.com/nature-portfolio/open-access&quot;&gt;€9500 for Nature&lt;/a&gt;)
you can retain the copyright and publish your paper &amp;quot;open access&amp;quot; in which case it
will be freely redistributable under a generous license (Nature uses the &lt;a href=&quot;https://opendefinition.org/licenses/cc-by/&quot;&gt;CC-BY&lt;/a&gt; license).&lt;/p&gt;
&lt;p&gt;At this point you should be thinking &amp;quot;this all sounds pretty expensive&amp;quot;,
and you&#39;d certainly be right. On the other hand, it&#39;s also quite
prestigious to appear in Science or Nature along with all that other
great research, so maybe it&#39;s worth it. Here&#39;s the thing, though,
what you&#39;re paying for is primarily the right to put &amp;quot;Science&amp;quot;
on your CV. To see why, we&#39;ll need to take a bit of a detour into
how scientific publishing works.&lt;/p&gt;
&lt;h2 id=&quot;the-publication-process&quot;&gt;The publication process &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-publishing/#the-publication-process&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Publication processes vary dramatically between fields, but at a high
level, here&#39;s how journal publication works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You submit your paper.&lt;/li&gt;
&lt;li&gt;The editor sends it out for review to some reviewers in your field&lt;/li&gt;
&lt;li&gt;Time passes&lt;/li&gt;
&lt;li&gt;The reviewers eventually send back their reviews&lt;/li&gt;
&lt;li&gt;On the basis of those reviews, you are either accepted, rejected, or told to
revise.
&lt;ul&gt;
&lt;li&gt;If you are rejected, you take it somewhere else&lt;/li&gt;
&lt;li&gt;If you are accepted, go to step 6.&lt;/li&gt;
&lt;li&gt;If you are told to revise you go back to step 1.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Usually there is some back and forth with the editor about what exactly you have to change
and then you submit a new manuscript&lt;/li&gt;
&lt;li&gt;More time passes while the journal copy edits your paper, typesets it in their
particular format, etc.&lt;/li&gt;
&lt;li&gt;Eventually they send you page proofs.&lt;/li&gt;
&lt;li&gt;You approve/revise the page proofs and send them back&lt;/li&gt;
&lt;li&gt;The paper is published.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are two important things to note here. First, this all takes a
fantastically long time. I&#39;ve only published in CS journals, but I
remember it being on the order of a year or so. During all this
time, your paper is just kind of sitting there in some liminal state.
These days what really happens is that it&#39;s usually circulating
as a &amp;quot;preprint&amp;quot;. It used to be that people posted these on their
Web sites and tweeted them or whatever,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; but now there are a number
of &amp;quot;preprint&amp;quot; sites such as &lt;a href=&quot;https://arxiv.org/&quot;&gt;arXiv&lt;/a&gt; or
&lt;a href=&quot;https://eprint.iacr.org/&quot;&gt;ePrint&lt;/a&gt; which will just let you distribute
your paper as long as it meets some minimal criteria like being
apparently topical, non-libelous, etc.&lt;/p&gt;
&lt;p&gt;Second, most of the work of reviewing the content of the paper
is being done by the reviewers, i.e., your peers
(hence peer review) who are generally anonymous and uncompensated&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, although
the journal&#39;s editor might be paid (I believe this varies).
But at the end of all this, it&#39;s the &lt;em&gt;journal&lt;/em&gt; who gets
paid. So, what exactly is it that they are being paid for? I&#39;ll
get to that in a moment but first I want to get to an even more
egregious case, which is computer science conferences.&lt;/p&gt;
&lt;h2 id=&quot;cs-publication&quot;&gt;CS Publication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-publishing/#cs-publication&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Computer Science -- and especially security and networking, which is
my area of focus -- does much of its publication at &lt;em&gt;conferences&lt;/em&gt;,
with journals being seen as where you send your &amp;quot;expanded&amp;quot; paper
that was too long to fit into the conference proceedings&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
rather than a primary publication venue.&lt;/p&gt;
&lt;p&gt;Historically the way that conferences work is that there is
a &amp;quot;program committee&amp;quot; chaired by a &amp;quot;program chair&amp;quot; (again, all these
people are unpaid faculty members, researchers, and the like;
being on the PC of a good conference looks good on your CV).
They issue a call for submissions with a deadline at some
point in the future.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Once all the submissions are in, each PC member is then assigned
some subset of them to review and on the basis of those reviews
and further discussion (traditionally at a PC meeting, but now
often online) the PC accepts some papers and rejects the rest.
If you&#39;re accepted, you get to present your paper at the conference&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;;
if you&#39;re not, you go submit&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
it somewhere else.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;We&#39;re getting really off topic here, but the dynamics of a PC
meeting are worth spending a little time on to see how
the sausage is made. The conference has
a roughly fixed number of agenda slots
and that tells you how many papers you can have. So, then the
PC needs to pick out the top 20 papers or so out of a stack of
say 150. There are a lot of ways to do this, of course, each
of which has its own problems. A common practice would be to
say &amp;quot;OK, we&#39;re going to reject anything below a given threshold
unless someone wants to advocate for it&amp;quot;. That might get you
down to 2 or 3 times what you need. Then you just have to go
through the papers one at a time, which can be pretty entertaining.&lt;/p&gt;
&lt;p&gt;Say, for instance, you go top down, discussing the highest rated
papers first. In theory these are all easy accepts. At this point,
people are pretty fresh and often want to show how smart they are, so
it&#39;s not too uncommon for one or more of these papers to get torn apart
and if not rejected, then put into the &amp;quot;if space&amp;quot; pile (more on
this in a bit). It&#39;s pretty easy to get fairly far down into the
pile with only a few outright accepts, at which point people start
to notice that at this rate you won&#39;t have enough papers and
might get a bit more generous. Lots of times, though, you
get through the whole pile and you still need a lot more papers,
so you start turning to the &amp;quot;if space&amp;quot; pile, which mostly
consists of adequate but imperfect papers (aren&#39;t they all) which someone
doesn&#39;t like for some reason. If there&#39;s room, they&#39;ll often
just get pulled in, and then you end up trying to pick a few
more papers which everyone knows aren&#39;t that great but seem like
the best of the rest. This isn&#39;t the only thing that happens: you can also
-- though rarely in my experience -- have more good papers than
you can accept at which point you have the even more unpleasant
task of rejecting some good papers.
There &lt;em&gt;is&lt;/em&gt; a little bit of slack in the system here, because conferences
often have &amp;quot;invited talks&amp;quot; so if you really need to add one more
paper you can say &amp;quot;we&#39;ll have one less IT&amp;quot; or if, conversely,
there just aren&#39;t any good ones left, you can have some extra ITs.&lt;/p&gt;
&lt;p&gt;Once you&#39;ve been accepted or rejected, you get some time to
submit your &amp;quot;camera ready&amp;quot; version, which is the version that&#39;s
actually going to be published in the &amp;quot;proceedings&amp;quot; (i.e., the
book of papers that the conference distributes, assuming they
distribute one), or just published on the conference Web site.
This is nominally supposed to take into account the reviewer
comments, but as a practical matter once you&#39;re accepted you
can mostly do whatever you want&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.
You may have noticed that the publication process is even
more self-serve than in the journal case: you don&#39;t get
any copy-editing or proof-reading and do all your own
typesetting&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.
Indeed, the term &amp;quot;camera ready&amp;quot; comes from the idea that
the proceedings would be produced by photographing your
final copy for reproduction, though of course this is all
now done with PDF. The only part of this process that is
compensated is that the staff who actually run the conference
(rent the hotels, register people, etc.) are paid. But
the program chair and the PC are all volunteers.
But don&#39;t think this necessarily stops the conference
from charging. For
instance, if you go to the proceedings for ACM CCS 2020, some
papers &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3372297.3417883&quot;&gt;can be downloaded&lt;/a&gt;
while others &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3372297.3417245&quot;&gt;cannot&lt;/a&gt;.
As far as I can tell, this comes down to whether the authors
paid ACM the article processing charge of $1000 or so to make them free.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;adding-value&quot;&gt;Adding value &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/science-publishing/#adding-value&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As you should have gathered from the above, the scientific work
is mostly done on an unpaid basis (of course, the researchers
and reviewers get paid by their institutions but not by the
journal) but it&#39;s the journal who collects the money and maybe
even charges the researchers to have &lt;em&gt;their own papers&lt;/em&gt; published.
This seems kind of backwards -- after all, book authors earn
royalties -- and it&#39;s not like the actual publication is expensive
because it just goes on a Web site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;.
So, why do authors put up with it?&lt;/p&gt;
&lt;p&gt;The simple reason is signaling: an enormous number of papers are
published every year and having one in one of the top venues is
prestigious. And what the publishers have is the name of the prestige
venue: go ahead and publish in a free journal if you want but if you
want to publish in Nature you need to deal with its publisher, Springer Nature. And
while we would collectively be better off with a completely open
access system that didn&#39;t shovel piles of money to the publishers,
individually people are a lot better off publishing in the best (i.e.,
most famous) venue they can get into because -- rightly or wrongly --
people use venue as a quick indicator of paper quality,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;
so we&#39;re kind of in an equilibrium that&#39;s hard to get out of
without a lot of collective action. For instance,
&lt;a href=&quot;https://www.cs.rice.edu/~dwallach/&quot;&gt;Dan Wallach&lt;/a&gt; has been talking
for years about &lt;a href=&quot;https://www.cs.rice.edu/~dwallach/pub/reboot-2010-06-14.pdf&quot;&gt;rebooting CS publication&lt;/a&gt; by replacing the whole system with one of open
publication and post-publication rankings. There are some
good ideas here, but they have yet to take off.&lt;/p&gt;
&lt;p&gt;I do see a few reasons for hope here. The first is that there
is increasing pressure for some form of open access within
the traditional publication structure. This comes in a number
of forms, ranging from funding requirements for open
access such as &lt;a href=&quot;https://en.wikipedia.org/wiki/Plan_S&quot;&gt;Plan S&lt;/a&gt;
(though this still allows for article publication fees)
to initiatives such as &lt;a href=&quot;https://www.researchwithoutwalls.org/&quot;&gt;Research Without Walls&lt;/a&gt; in which reviewers commit not to review for non open access venues.
Moreover, as more and more publication moves to the electronic
media, charging large amounts of money for access to those
publications becomes extremely hard to justify (though Nature tries &lt;a href=&quot;https://www.nature.com/articles/s41592-021-01073-y&quot;&gt;here&lt;/a&gt;)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The second is the rise of direct-to-Web publication either by preprint
services like arXiv, ePrint or &lt;a href=&quot;https://www.nber.org/&quot;&gt;NBER&lt;/a&gt; or just
by Twitter. The major rationale for this practice is getting
work out fast, but increasingly it&#39;s just how people disseminate
their work, with most of the impact happening before you even know
if the paper has been accepted anywhere. I don&#39;t see this trend
reversing, and once people have their work out there, charging
for the version that happened to be accepted at the conference looks
increasingly silly&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;, and the publishers will need to adapt somehow.
Of course, this all comes at a cost: these papers haven&#39;t been
peer reviewed and while one of the functions of the journal/conference
review process is to determine if the paper is one of the better
papers submitted, it also serves as a partial check on whether it&#39;s right&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;
(note that many papers are right but not exciting). We&#39;ll eventually
need some way to address that issue, but I expect we&#39;re
going to go a bit further down the path of a preprint free-for-all
before the situation becomes so untenable that that actually happens.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Or, if it was a really cool result, especially an attack on
something, you&#39;d invent a cutesy name and logo and have
your own Web site just for the result. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
So why do people do this work? It&#39;s seen as a service contribution,
but at least in some cases, the fact that someone is a reviewer
in general is public even if the papers they reviewed were not,
and so it&#39;s a career milestone to be invited to review. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Why too long, you say? Most conferences have page limits, even
when the proceedings are entirely online. Many are the hours
that my collaborators and I have spent messing with LaTeX
source to get our paper under the page limit. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
It&#39;s fairly traditional to extend the deadline if they
aren&#39;t getting enough submissions, people feel like
they&#39;re running late, or just because. It&#39;s a particular
mixture of relief and burning rage to have the submission
deadline extended by a week
48 hours before the original deadline. On the one hand,
you have more time; on the other, you&#39;ve spent the
past 72 hours cramming for no reason. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Some conferences have now gone to &amp;quot;rolling submission&amp;quot; where you
can submit multiple times a year, but everything is presented
at the end of the year. That at least lets you get an answer faster. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This is all a bit of a random process; I&#39;ve seen at least one
paper rejected at one conference and go on to win Best Paper at
another, equally good conference. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are 4-5 big computer security conferences a year:
&lt;a href=&quot;https://www.internetsociety.org/events/ndss/&quot;&gt;ISOC NDSS&lt;/a&gt;,
&lt;a href=&quot;https://www.usenix.org/conference/usenixsecurity21&quot;&gt;USENIX Security&lt;/a&gt;,
&lt;a href=&quot;https://dl.acm.org/conference/ccs&quot;&gt;ACM CCS&lt;/a&gt;,
&lt;a href=&quot;https://www.ieee-security.org/TC/SP2021/&quot;&gt;IEEE S&amp;amp;P&lt;/a&gt; (often
called &amp;quot;Oakland&amp;quot; because that&#39;s where it historically took
place, but now it is in San Jose) and arguably
&lt;a href=&quot;https://www.ieee-security.org/TC/EuroSP2021/&quot;&gt;Euro S&amp;amp;P&lt;/a&gt;
and then a giant pile of lower prestige conferences.
The usual practice is to submit to one of the big
conferences and if it gets rejected then you try
another, but eventually you give up and go to a smaller
conference, or a workshop. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The exception here is that sometimes papers will be
&amp;quot;shepherded&amp;quot; which means that the PC thought that the
paper was only acceptable with some specific changes
and has delegated a PC member to make sure you make
them. In this case, if you don&#39;t satisfy the shepherd
your paper won&#39;t appear. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;CS people pretty much all use &lt;a href=&quot;https://www.latex-project.org/&quot;&gt;LaTeX&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
ACM&#39;s system is particularly goofy here, because they
allow you to do &lt;a href=&quot;https://www.acm.org/publications/openaccess&quot;&gt;self-archiving&lt;/a&gt;
which means you post it on your Web site or on arXiv, but it&#39;s
not free on their site.
They also have some system in which your ACM conference
proceedings can link to the ACM&#39;s site and if people go
there from your site, then it will be free, but otherwise
they&#39;ll get charged.
True story: this works by using the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer&quot;&gt;Referer&lt;/a&gt;[sic] header, but
&lt;a href=&quot;https://developers.google.com/web/updates/2020/07/referrer-policy-new-chrome-default&quot;&gt;Firefox&lt;/a&gt; and
&lt;a href=&quot;https://developers.google.com/web/updates/2020/07/referrer-policy-new-chrome-default&quot;&gt;Chrome&lt;/a&gt; recently changed the default for the Referer header,
which caused this to break for some conferences, such as &lt;a href=&quot;https://irtf.org/anrw/2020/&quot;&gt;ANRW&lt;/a&gt;.   The whole system seems to be designed to be nominally
open access but in practice to make it harder for people to
actually find the free version. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In fact, much of the complexity of the site seems to go
to controlling access to the papers so you can charge for them. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And
of course because &amp;quot;selectivity&amp;quot; (the fraction of papers
that get accepted) is used as a proxy for conference quality,
this is a self-perpetuating process. &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See also, &lt;a href=&quot;https://en.wikipedia.org/wiki/Sci-Hub&quot;&gt;Sci-Hub&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And nobody, AFAIK, requires you to withdraw
your preprint &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Though see &lt;a href=&quot;https://en.wikipedia.org/wiki/Replication_crisis&quot;&gt;replication crisis&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/science-publishing/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What&#39;s in California&#39;s Vaccine Passport?</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-ca/"/>
		<updated>2021-06-23T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-ca/</id>
		<content type="html">&lt;p&gt;Last week, California rolled out their new
&lt;a href=&quot;https://myvaccinerecord.cdph.ca.gov/&quot;&gt;digital COVID Vaccine Record&lt;/a&gt;
(aka vaccine passport). This credential is based on the Vaccine
Credentials Initiative &lt;a href=&quot;https://vci.org/about#smart-health&quot;&gt;SMART Health Cards Framework&lt;/a&gt;.
They provide a fairly complete
&lt;a href=&quot;https://spec.smarthealth.cards/&quot;&gt;specification&lt;/a&gt; as well as &lt;a href=&quot;https://github.com/smart-on-fhir/health-cards/tree/main/generate-examples&quot;&gt;sample code&lt;/a&gt;,
so it&#39;s pretty easy to figure out what&#39;s in here.&lt;/p&gt;
&lt;p&gt;At a high level, the credential is a digitally signed value formatted
as a &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7519&quot;&gt;JSON Web Token&lt;/a&gt;
and then encoded into a QR code.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
A JWT consists of three pieces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A header value, containing some meta-information&lt;/li&gt;
&lt;li&gt;The payload to be signed&lt;/li&gt;
&lt;li&gt;The signature&lt;/li&gt;
&lt;/ul&gt;
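&lt;p&gt;As a rough illustration of that three-part structure, here is a minimal Python sketch that splits a compact token on its dots and base64url-decodes each piece. The helper names and the demo token are made up for this example, and the signature bytes are junk rather than a real signature.&lt;/p&gt;

```python
import base64
import json


def b64url_decode(segment):
    """Decode base64url text, restoring the padding that compact tokens omit."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data):
    """Encode bytes as unpadded base64url text (only used to build the demo)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def split_compact(token):
    """Split a compact token into header, payload, and signature bytes."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return tuple(
        b64url_decode(part) for part in (header_b64, payload_b64, signature_b64)
    )


# Made-up token for illustration only; nothing here is actually signed.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "ES256"}).encode("utf-8")),
    b64url_encode(b"payload-bytes"),
    b64url_encode(b"signature-bytes"),
])

header, payload, signature = split_compact(demo_token)
print(json.loads(header))
```

&lt;p&gt;Note that this only separates the pieces; it says nothing about whether the signature is valid.&lt;/p&gt;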
&lt;p&gt;We can mostly ignore the header and the signature, because what
matters here is the payload. I go through this in some detail below
but you don&#39;t need to wade through that
to get the high points. If you ignore the &lt;a href=&quot;https://www.w3.org/TR/vc-data-model/&quot;&gt;Verifiable Credentials&lt;/a&gt;
machinery, this credential
contains three major pieces of information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The patient&#39;s identity (name and date of birth)&lt;/li&gt;
&lt;li&gt;The various immunization events, consisting of
&lt;ul&gt;
&lt;li&gt;The vaccine type (I think, see below)&lt;/li&gt;
&lt;li&gt;The lot number&lt;/li&gt;
&lt;li&gt;Where it was performed&lt;/li&gt;
&lt;li&gt;The date of injection&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This seems fairly sensible and is really all you need in a system
like this. Arguably, it&#39;s more than you need: people don&#39;t
need to know where you were vaccinated, the lot number
or arguably even vaccine type in
order to know that you were vaccinated (given that some
vaccines appear to be more effective than others, I could
imagine in principle wanting to know the vaccine type).
Here, we have to distinguish between what you might want to have
for your own records and what others might be entitled to know about you.
For instance, if it turned out that there was a bad vaccine lot,
then you might want your health care provider to be able to
determine that and revaccinate you, but it&#39;s not necessary
for someone to know that to let you into a bar. I can imagine
a number of technical approaches to addressing the desire
for different levels of access, but realistically it&#39;s
not clear that any of this information is that sensitive either
(though you&#39;ll note that I&#39;ve redacted it below).&lt;/p&gt;
&lt;p&gt;This is all more or less as &lt;a href=&quot;https://www.sfgate.com/politics/article/Gavin-Newsom-vaccine-passports-California-COVID-19-16246673.php&quot;&gt;advertised by California
Governor Gavin Newsom&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;It’s not a passport, it&#39;s not a requirement, it&#39;s just the ability now to have an electronic version of that paper version, so you&#39;ll hear more about that in the next couple of days,&amp;quot; he said.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I sympathize with the desire not to call it a &amp;quot;passport&amp;quot; but this is effectively
what a &amp;quot;vaccine passport&amp;quot; has come to mean: a verifiable electronic record of vaccination.&lt;/p&gt;
&lt;p&gt;This brings me to the topic of &amp;quot;verifiable&amp;quot;. This credential is digitally
signed by a key which appears to belong to the State of California
Department of Public Health (by which I mean it&#39;s hosted on their
Web site). However, what I don&#39;t see is how you actually read &lt;em&gt;or&lt;/em&gt; verify
it. I just wrote some quick code to pull it apart (you don&#39;t even
need to verify it to do that, though of course in real life you have to)
but the idea here is that you&#39;re supposed to have some kind of mobile
app on your smartphone and you just point it at the credential and
it will tell you the contents and whether it&#39;s valid. However, after
some digging around, I didn&#39;t find a recommended app to use to validate
it, which seems to really diminish the usefulness of the
system.&lt;/p&gt;
&lt;p&gt;Obviously, it&#39;s possible to write your own app to verify these credentials
(the sample code provided by VCI gets you pretty close), but that&#39;s also
clearly unreasonable to ask people to do. Moreover, as I
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/&quot;&gt;mentioned earlier&lt;/a&gt;,
an important part of such an app is embedding the trusted credential
issuers, which, for obvious reasons, is information you need to get externally, not from
the credential itself.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Hard Hat Area Below&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The important part of this object is the payload, which is a JSON structure that has been compressed with zlib and then base64 encoded.
First we have the outer
wrapper:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;iss&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://myvaccinerecord.cdph.ca.gov/creds&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;nbf&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token string-property property&quot;&gt;&quot;vc&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string&quot;&gt;&quot;https://smarthealth.cards#health-card&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string&quot;&gt;&quot;https://smarthealth.cards#immunization&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string&quot;&gt;&quot;https://smarthealth.cards#covid19&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;  &lt;br /&gt;  &lt;span class=&quot;token 
punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This section contains three things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The &amp;quot;issuer&amp;quot; of the credential, in this case the California
Department of Public Health. This URL also tells you where
to get the public key to use to verify the credential
(it&#39;s at &lt;a href=&quot;https://myvaccinerecord.cdph.ca.gov/creds/.well-known/jwks.json&quot;&gt;https://myvaccinerecord.cdph.ca.gov/creds/.well-known/jwks.json&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &amp;quot;not before&amp;quot; date (&lt;code&gt;nbf&lt;/code&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &amp;quot;vc&amp;quot; structure which represents the rest of the data, which
in this case is a W3C &lt;a href=&quot;https://www.w3.org/TR/vc-data-model/&quot;&gt;Verifiable Credentials&lt;/a&gt;
object. The values inside tell us it&#39;s a COVID-19 immunization
record.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that there&#39;s already something a bit inconvenient here in that
you need the issuer URL in order to get its keys, but in order to get
that you need to (1) decompress and parse the payload without
verifying the signature (2) retrieve the keys (3) verify the
signature. This is a bit clunky but once you know that it&#39;s not that
bad.&lt;/p&gt;
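&lt;p&gt;The first two steps can be sketched in Python roughly as follows. This is a sketch under assumptions, not a reference implementation: it assumes the payload is raw DEFLATE data (which is how I read the SMART Health Cards specification), the function names are invented for this example, and the demo token is an unsigned stand-in built locally so the code runs offline. Step 3, the actual signature check, is deliberately left out and would need a real JOSE library.&lt;/p&gt;

```python
import base64
import json
import zlib


def b64url_decode(segment):
    """Decode base64url text, restoring the padding that compact tokens omit."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def peek_payload(token):
    """Step 1: parse the payload WITHOUT verifying the signature."""
    _header, payload_b64, _signature = token.split(".")
    compressed = b64url_decode(payload_b64)
    # wbits=-15 selects raw DEFLATE (no zlib header or checksum); an assumption.
    return json.loads(zlib.decompress(compressed, wbits=-15))


def jwks_url(issuer):
    """Step 2: work out where the issuer publishes its public keys."""
    return issuer.rstrip("/") + "/.well-known/jwks.json"


def make_demo_token(payload):
    """Build an UNSIGNED stand-in token (hypothetical helper, demo only)."""
    def b64url(data):
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

    deflater = zlib.compressobj(wbits=-15)
    body = deflater.compress(json.dumps(payload).encode("utf-8")) + deflater.flush()
    header = json.dumps({"alg": "ES256", "zip": "DEF"}).encode("utf-8")
    return ".".join([b64url(header), b64url(body), b64url(b"not-a-real-signature")])


demo = make_demo_token(
    {"iss": "https://myvaccinerecord.cdph.ca.gov/creds", "nbf": 0, "vc": {}}
)
claims = peek_payload(demo)
print(jwks_url(claims["iss"]))
```

&lt;p&gt;Nothing returned by &lt;code&gt;peek_payload&lt;/code&gt; should be trusted until the signature has been checked against a key from an issuer you already trust.&lt;/p&gt;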
&lt;p&gt;Inside that container, you have a bunch of other containers:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;    &lt;span class=&quot;token string-property property&quot;&gt;&quot;credentialSubject&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string-property property&quot;&gt;&quot;fhirVersion&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;4.0.1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token string-property property&quot;&gt;&quot;fhirBundle&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;resourceType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Bundle&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;collection&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token string-property property&quot;&gt;&quot;entry&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span 
class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What&#39;s going on here is that this credential is being built on two
frameworks,
&lt;a href=&quot;https://www.w3.org/TR/vc-data-model/&quot;&gt;Verifiable Credentials&lt;/a&gt;
and &lt;a href=&quot;https://www.fhir.org/&quot;&gt;Fast Healthcare Interoperability Resources&lt;/a&gt; (FHIR), each of which
is fairly generic, so you end up pulling in a lot of machinery that we&#39;re not really
making use of. Obviously, you could put a lot of different things in these containers,
but in this case all there is is this list of entries, which contains everything
else. First, we have:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;          &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;fullUrl&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;resource:0&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;resource&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;resourceType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Patient&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token string-property property&quot;&gt;&quot;family&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;RESCORLA&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token string-property property&quot;&gt;&quot;given&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token 
string&quot;&gt;&quot;ERIC&quot;&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;birthDate&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is obviously just my name and birthdate, the latter of which I&#39;ve removed,
replacing it with &lt;code&gt;...&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then we have two records, indicating that I&#39;ve been immunized, how, and where:&lt;/p&gt;
&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;          &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;fullUrl&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;resource:1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;resource&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;resourceType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Immunization&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;status&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;completed&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;vaccineCode&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token string-property property&quot;&gt;&quot;coding&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token 
string-property property&quot;&gt;&quot;system&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://hl7.org/fhir/sid/cvx&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;code&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;208&quot;&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;occurrenceDateTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;performer&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token string-property property&quot;&gt;&quot;actor&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;display&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;Santa Clara County Mass Vaccination Site4 (Levi S)&quot;&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;lotNumber&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;fullUrl&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;resource:2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token string-property property&quot;&gt;&quot;resource&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;resourceType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Immunization&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property 
property&quot;&gt;&quot;status&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;completed&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;vaccineCode&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token string-property property&quot;&gt;&quot;coding&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;system&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://hl7.org/fhir/sid/cvx&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;code&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;208&quot;&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;occurrenceDateTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token 
string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;performer&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token string-property property&quot;&gt;&quot;actor&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;br /&gt;                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;display&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Santa Clara County Mass Vaccination Site4 (Levi S)&quot;&lt;/span&gt;&lt;br /&gt;                  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;br /&gt;              &lt;span class=&quot;token string-property property&quot;&gt;&quot;lotNumber&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;...&quot;&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;br /&gt;          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Everything here is pretty straightforward, except for the &amp;quot;system&amp;quot; stuff, which
describes the actual vaccine I was given. You can find the table of what it means
&lt;a href=&quot;https://build.fhir.org/ig/dvci/vaccine-credential-ig/branches/main/ValueSet-vaccine-product-cvx.html&quot;&gt;here&lt;/a&gt;.
Code &lt;code&gt;208&lt;/code&gt; refers to &amp;quot;SARS-COV-2 (COVID-19) vaccine, mRNA, spike protein, LNP, preservative free, 30 mcg/0.3mL dose&amp;quot;.
You&#39;ll notice that it doesn&#39;t say Pfizer, but you can infer it from the type (mRNA) and dosage (.3mL) because
Moderna has a &lt;a href=&quot;https://www.fda.gov/media/144637/download&quot;&gt;.5mL dose&lt;/a&gt; whereas Pfizer is &lt;a href=&quot;https://www.fda.gov/media/144413/download&quot;&gt;.3mL&lt;/a&gt;.&lt;/p&gt;
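&lt;p&gt;As a rough illustration of how these records can be read programmatically, here is a minimal JavaScript sketch that picks the immunizations out of an array shaped like the entries above. The names &lt;code&gt;summarizeImmunizations&lt;/code&gt;, &lt;code&gt;CVX_DESCRIPTIONS&lt;/code&gt;, and &lt;code&gt;sampleEntries&lt;/code&gt; are made up for this example, the lookup table only contains the one code discussed here, and the elided values stay as &lt;code&gt;...&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical lookup table: only the single CVX code discussed in the text.
const CVX_DESCRIPTIONS = {
  "208": "SARS-COV-2 (COVID-19) vaccine, mRNA, spike protein, LNP, preservative free, 30 mcg/0.3mL dose",
};

// Summarize the Immunization resources in an array of entries, where each
// entry has the "fullUrl" and "resource" fields shown in the records above.
function summarizeImmunizations(entries) {
  return entries
    .filter(function (entry) {
      return entry.resource.resourceType === "Immunization";
    })
    .map(function (entry) {
      const resource = entry.resource;
      // Assumption: each record carries exactly one coding, as in the example.
      const coding = resource.vaccineCode.coding[0];
      return {
        code: coding.code,
        description: CVX_DESCRIPTIONS[coding.code] || "unknown CVX code",
        site: resource.performer[0].actor.display,
      };
    });
}

// Build an Immunization entry like the ones above; elided values stay "...".
function immunizationEntry(index) {
  return {
    fullUrl: "resource:" + index,
    resource: {
      resourceType: "Immunization",
      status: "completed",
      vaccineCode: {
        coding: [{ system: "http://hl7.org/fhir/sid/cvx", code: "208" }],
      },
      occurrenceDateTime: "...",
      performer: [
        { actor: { display: "Santa Clara County Mass Vaccination Site4 (Levi S)" } },
      ],
      lotNumber: "...",
    },
  };
}

const sampleEntries = [
  {
    fullUrl: "resource:0",
    resource: {
      resourceType: "Patient",
      name: [{ family: "RESCORLA", given: ["ERIC"] }],
      birthDate: "...",
    },
  },
  immunizationEntry(1),
  immunizationEntry(2),
];

console.log(summarizeImmunizations(sampleEntries));
```

&lt;p&gt;Run against the sample, this yields one summary per dose, each carrying code &lt;code&gt;208&lt;/code&gt;, its description, and the site name, while the patient record is skipped.&lt;/p&gt;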
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The QR encoding is kind of... interesting: there&#39;s a string
prefix starting with &lt;code&gt;shc:/&lt;/code&gt; that indicates how many
chunks there are followed by the actual
bytes encoded as two digit decimal numbers that represent
the byte value minus 45 (to get the entire range of values
into the range [0...99]). Of course, the JWT itself is
base64-encoded, which is a bit goofy. If you were starting
from scratch, you could obviously get a more compact encoding,
but that&#39;s what happens when you build on existing standards. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ca/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
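&lt;p&gt;The numeric QR encoding described in the footnote above can be sketched in JavaScript as follows. This is an illustration rather than a complete implementation: the function names are made up, and the code assumes that any chunk information sits between the &lt;code&gt;shc:/&lt;/code&gt; prefix and a final &lt;code&gt;/&lt;/code&gt;, with the digit pairs coming last.&lt;/p&gt;

```javascript
const SHC_PREFIX = "shc:/";
const OFFSET = 45; // each character is stored as (character code - 45)

// Encode a base64url-style string (such as a JWT) as pairs of decimal digits.
function encodeShcNumeric(text) {
  // These characters all have codes from 45 to 122, so every value fits in [0...99].
  if (!/^[A-Za-z0-9._=-]*$/.test(text)) {
    throw new Error("unexpected character in input");
  }
  let digits = "";
  for (const ch of text) {
    digits += String(ch.charCodeAt(0) - OFFSET).padStart(2, "0");
  }
  return SHC_PREFIX + digits;
}

// Decode an shc:/ string back into the original text.
function decodeShcNumeric(uri) {
  if (!uri.startsWith(SHC_PREFIX)) {
    throw new Error("missing shc:/ prefix");
  }
  // Assumption: chunk information, if present, precedes the final "/".
  const digits = uri.slice(SHC_PREFIX.length).split("/").pop();
  if (!/^(\d\d)*$/.test(digits)) {
    throw new Error("expected an even number of decimal digits");
  }
  const pairs = digits.match(/\d\d/g) || [];
  return pairs
    .map(function (pair) {
      return String.fromCharCode(Number(pair) + OFFSET);
    })
    .join("");
}

console.log(encodeShcNumeric("eyJ")); // "shc:/567629"
console.log(decodeShcNumeric("shc:/567629")); // "eyJ"
```

&lt;p&gt;Each character costs two digits here, which is the overhead the footnote is grumbling about.&lt;/p&gt;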
</content>
	</entry>
	
	<entry>
		<title>So you want to watch people run</title>
		<link href="https://educatedguesswork.org/posts/running-video/"/>
		<updated>2021-06-19T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/running-video/</id>
		<content type="html">&lt;p&gt;I&#39;ll be the first to admit it: running is boring, especially when it&#39;s
ultramarathons. What&#39;s more interesting, however, especially if you&#39;re
a runner, and maybe if you&#39;re not, is watching really good people run.
Thanks in part to GoPros and YouTube, there&#39;s now an enormous amount
of relatively high quality running film, ranging from just condensed
race footage to well-produced quasi-documentaries. The pleasure here
is mostly just watching amazing athletes doing their thing, but
filmmakers have sort of figured out how to capture the cool bits and
filter out the 5-15 hours of people just covering mile after mile. Most of
the stuff below is trail running which seems to translate better, but
there still is some great road running footage.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Warning: there are some spoilers here. Some of the fun is watching the
race for yourself, in which case, well, skip over the rest.&lt;/p&gt;
&lt;h2 id=&quot;unbreakable-(youtube)&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=zy1as6CTYXI&quot;&gt;Unbreakable&lt;/a&gt; (YouTube) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#unbreakable-(youtube)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Probably the best overall trail running film, documenting the 2010
&lt;a href=&quot;https://www.wser.org/&quot;&gt;Western States 100&lt;/a&gt;, by far the most
prestigious American ultra. That year had an incredibly stacked men&#39;s
field and Unbreakable focuses on
the top four contenders, returning champion Hal Koerner, Anton Krupicka, Geoff Roes, and a
22-year-old Kilian Jornet, before he became such a dominant figure in ultrarunning.
Unbreakable really sets the template for future ultra films, intercutting background
on the four, the history of Western States
(feel free to skip over the parts with
Gordy Ainsleigh&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;), footage of the race itself, and
post-race interviews with the athletes.&lt;/p&gt;
&lt;p&gt;Unbreakable is really a great example of how ultras are different
from shorter races. About halfway through Roes starts to fade in
the heat, Jornet and Krupicka taking the lead (Koerner had fallen
back earlier and eventually dropped out with an injury).
Jornet and Krupicka eventually put almost 20 minutes on Roes.
Jornet fades badly at mile 20, leaving only Krupicka and
Roes, with Roes eventually running Krupicka down around mile 90
and going on to finish in course record time. This is something
you don&#39;t see a lot in a road race, where it&#39;s much less common
to recover from a bad patch. I think part of it is just
that over a longer event anything can happen, but also the
greater variety in terrain and pace leaves a lot more room
to recover from a bad position -- or to fall apart.&lt;/p&gt;
&lt;p&gt;Don&#39;t miss: Kilian just tearing past Krupicka on a single
track descent early in the race; really gives you a sense
of how he would go on to become the best mountain runner
in the world.&lt;/p&gt;
&lt;h2 id=&quot;life-in-a-day-(youtube)&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=kYgcTJBLwsU&quot;&gt;Life in a Day&lt;/a&gt; (YouTube) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#life-in-a-day-(youtube)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The women&#39;s counterpart to Unbreakable, this time documenting the
2016 women&#39;s race, focusing on Magda Boulet, Ann Mae Flynn, Kaci Lickteig,
and Devon Yanko. Follows pretty much the same template of
racer bios mixed with race footage. At the time, Boulet, Lickteig,
and Yanko were known quantities but Flynn was a relative newcomer,
earning entry to Western by finishing third at the difficult
&lt;a href=&quot;https://www.lakesonoma50.com/history--results.html&quot;&gt;Lake Sonoma 50&lt;/a&gt;
(behind Kaci Lickteig and YiOu Wang). This is a solid film, with
a bunch of backstory on some great athletes, but
there&#39;s less drama because Lickteig takes the lead pretty early
and never gives it up.&lt;/p&gt;
&lt;p&gt;Don&#39;t miss: Jim Walmsley running by about 35 seconds in, en
route to his famous detour at mile 90, where he was so far
in the lead he went off course and ended up finishing 20th.&lt;/p&gt;
&lt;h2 id=&quot;miller-vs.-hawks%3A-tnf-endurance-challenge-50-2016-(youtube)&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=7DCR03UDggA&quot;&gt;Miller vs. Hawks: TNF Endurance Challenge 50 2016&lt;/a&gt; (YouTube) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#miller-vs.-hawks%3A-tnf-endurance-challenge-50-2016-(youtube)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What it says on the tin: covers the duel between returning
champion Zach Miller
and 50 mile first-timer Hayden Hawks at the now defunct North Face 50 miler
in the Marin Headlands. Notable principally for how amazingly
fast they&#39;re pushing from the gun. Of special interest here
for Norcal runners because these are trails you can run --
and race -- on regularly, and it&#39;s amazing to see how much
faster the pros can go. A lot of this is filmed on what
looks like a GoPro (by, I suspect, well-known ultrarunner
&lt;a href=&quot;https://twitter.com/JamilCoury&quot;&gt;Jamil Coury&lt;/a&gt;), so
you really get the runner&#39;s perspective. Both of these guys
are still racing hard today (Hawks just set the course
record at JFK 50 2020), so what you&#39;re seeing here is
two of the current top male stars.&lt;/p&gt;
&lt;p&gt;Don&#39;t miss: Miller huffing and puffing up Tennessee Valley at what
looks for all the world like half marathon effort.&lt;/p&gt;
&lt;h2 id=&quot;golden-trail-series-2019-and-2020-(youtube)&quot;&gt;Golden Trail Series 2019 and 2020 (YouTube) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#golden-trail-series-2019-and-2020-(youtube)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Golden Trail is a mostly European race series consisting of
comparatively shorter mountain runs like &lt;a href=&quot;https://www.sierre-zinal.com/en/homepage.html&quot;&gt;Sierre-Zinal&lt;/a&gt;,
&lt;a href=&quot;https://www.marathonmontblanc.fr/en/&quot;&gt;Marathon du Mont-Blanc&lt;/a&gt; and
&lt;a href=&quot;https://www.pikespeakmarathon.org/&quot;&gt;Pike&#39;s Peak Marathon&lt;/a&gt;. It draws
some of the best mountain runners in the world, including
Kilian Jornet, Maude Mathys, Remi Bonnet, etc.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Aside: people often lump all the longish distance trail races
into &amp;quot;Mountain Ultra Trail&amp;quot; and it&#39;s certainly a lot of
the same people, but just watching some European races
gives you a sense of the variation here. Most North
American races, especially on the West Coast&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; are on
comparatively easy terrain, whether single
track or fire roads, with a lot of the difficulty
coming from being really long with
large amounts of climbing, high elevation,
or both. European races are often shorter -- though there
are plenty of long distance races such as UTMB -- with much more difficult
footing (the term here is &amp;quot;technical&amp;quot;) due to rocks,
roots, etc. and frequently include really steep descents,
sections that are poorly marked, not really trail, etc.
This is an opportunity to see the best European runners
including a number you don&#39;t see in US ultras.&lt;/p&gt;
&lt;p&gt;In 2020, due to COVID, they turned Golden Trail into a
stage race, with four races, one each day. This is a totally
different challenge from one long day or a stage race
and you can really see the difficulty of trying
to race day after day on extraordinarily tricky terrain,
including a number of seriously muddy and steep descents.
The 2019 and 2020 races are both available on YouTube,
as well as the first race of 2021.&lt;/p&gt;
&lt;p&gt;Don&#39;t miss:
Tove Alexandersson just tearing down this near-vertical
mud slope in stage 2 at about 16:00. Also, the Salomon coach saying
&amp;quot;If you are all together at the bottom of the climb,
then Jim [Walmsley] will arrive alone at the top.&amp;quot;&lt;/p&gt;
&lt;h2 id=&quot;the-barkley-marathons%3A-the-race-that-eats-its-young-(amazon-prime)&quot;&gt;&lt;a href=&quot;https://www.amazon.com/Barkley-Marathons-Race-That-Young/dp/B017Y43P3S/ref=sr_1_1?dchild=1&amp;amp;keywords=barkley&amp;amp;qid=1613620903&amp;amp;sr=8-1&quot;&gt;The Barkley Marathons: The Race That Eats Its Young&lt;/a&gt; (Amazon Prime) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#the-barkley-marathons%3A-the-race-that-eats-its-young-(amazon-prime)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Perhaps the most accessible watching here is this documentary about
the notoriously difficult and inaccessible (not to mention, misnamed)
&amp;quot;Barkley Marathons&amp;quot;.
Designed by the Race Director, &amp;quot;Lazarus Lake&amp;quot; (real name, Gary Cantrell)&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
to be basically at the limit
of human endurance, Barkley is a 100+ mile &amp;quot;course&amp;quot; with a 60 hour limit that
still has approximately a 1% finish rate (this is partly due to Lake
regularly making it harder). It&#39;s barely a trail race:
unmarked with lots of difficult cross country travel and with GPS explicitly
forbidden.&lt;/p&gt;
&lt;p&gt;Unlike normal events, Barkley is full of small details designed to
mess with the runners: the start time isn&#39;t announced beyond a 12
hour window, with Lazarus just giving an hour warning; you demonstrate
that you&#39;ve run the course by taking pages out of books at various
checkpoints along the course (the pages being dictated by your race
number); runners don&#39;t even get maps, but instead are required to copy
the course off of Lake&#39;s master map, etc.&lt;/p&gt;
&lt;p&gt;Unlike much of the stuff here, this was clearly filmed for a general
audience and is less race footage -- though there&#39;s
plenty of that -- and more an exploration of what would make someone
want to do something like this. Kind of like Free Solo but without
the feeling that you&#39;re encouraging someone to risk their life.&lt;/p&gt;
&lt;p&gt;Don&#39;t miss: John Fegyveresi running under the prison.&lt;/p&gt;
&lt;h2 id=&quot;breaking-2-(disney%2B)&quot;&gt;Breaking 2 (Disney+) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-video/#breaking-2-(disney%2B)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the opposite end of the spectrum from the chaos of Barkley is
this slickly produced documentary about Nike&#39;s attempt on the sub two hour
marathon. Everything about this is carefully calibrated, from
the new Nike shoes (prototype Vaporflys) to the pacing and drafting
strategy, the fueling, etc. and it leans a bit hard on the idea that
what really matters is the sports science and the technology (after
all, it&#39;s basically a commercial for Nike) but at the end of
the day you still get a sense of what an amazing athlete Eliud
Kipchoge is, and even though he doesn&#39;t quite make it, finishing
in 2:00:25, it&#39;s an unbelievable performance. Two years later,
Kipchoge would in fact run 1:59:40.2 in Vienna.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
As I&#39;ve written about &lt;a href=&quot;https://educatedguesswork.org/educatedguesswork.org/posts/pacing/&quot;&gt;earlier&lt;/a&gt;,
this isn&#39;t a world record because of the pacing and fueling strategy,
but that doesn&#39;t diminish the experience of watching someone clock
out mile after mile at a pace most of us can barely do flat out.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Don&#39;t miss: Essentially every minute Kipchoge is on screen.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Even I won&#39;t watch triathlon, though. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The history here is actually quite cool. Western States is run on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Tevis_Cup&quot;&gt;Tevis Cup&lt;/a&gt;
horse race course, but one year Gordy Ainsleigh decided to run it on foot. I just didn&#39;t find
him telling the story that interesting. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
If you watch these races, it&#39;s truly amazing how many
of the best European mountain runners are sponsored
by Salomon. In the recent
&lt;a href=&quot;https://www.goldentrailseries.com/races/olla-de-nuria-v2/#&quot;&gt;Olla de Nuria&lt;/a&gt;,
the first three men and first two women were all Salomon
sponsored. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With
some notable exceptions like Barkley, see below &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Also responsible for the sadistic &amp;quot;Backyard Ultra&amp;quot;, a last-man
standing style event in which runners have to do a 4ish mile
loop every hour, with the runners all starting together at the
same time and the winner being the last person to give up
(who still has to run the last lap on their own). This format
guarantees that you can&#39;t get much rest: even if you run the
loop comparatively fast in, say, 30 minutes, you just have to
start again in 30. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You can find video of that &lt;a href=&quot;https://www.ineos159challenge.com/&quot;&gt;here&lt;/a&gt;;
while less accessible it really focuses on Kipchoge rather than
the shoes. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Seriously. Check out this &lt;a href=&quot;https://www.youtube.com/watch?v=SRYtn0j5ccA&quot;&gt;video&lt;/a&gt;
of people trying to run Kipchoge&#39;s world record pace of 2:01:39 on a treadmill
at the Chicago Marathon expo. &lt;a href=&quot;https://educatedguesswork.org/posts/running-video/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Notes on supershoes</title>
		<link href="https://educatedguesswork.org/posts/supershoes/"/>
		<updated>2021-06-12T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/supershoes/</id>
		<content type="html">&lt;p&gt;One of the attractive aspects of running as a sport is that it
seems fair: the fastest person wins, not the person with the
fastest shoes, the fastest car, or the best tennis racket.
Now, this was never entirely true as shoe weight absolutely
makes a difference and so runners have picked lightweight
shoes to race in for years, but there were lots of lightweight
race shoes and it probably didn&#39;t matter much which brand
you bought.&lt;/p&gt;
&lt;h2 id=&quot;enter-the-supershoe&quot;&gt;Enter the Supershoe &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supershoes/#enter-the-supershoe&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;That all changed in 2017 when Nike brought out a shoe called
the Vaporfly. The Vaporfly included a bunch of elements that
had appeared in previous shoes but put together in a new way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A full-length carbon-fiber plate in the midsole&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;A high energy return (i.e., bouncy) midsole made of
&lt;a href=&quot;https://www.pebaxpowered.com/en/pebax-technology/&quot;&gt;pebax foam&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A really thick midsole (the industry jargon term here is
&amp;quot;stack height&amp;quot;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As Alex Hutchinson points out in a worth-reading &lt;a href=&quot;https://www.outsideonline.com/2408971/nike-vaporfly-controversy&quot;&gt;Outside Online article&lt;/a&gt;,
these elements all preexisted, so you might have thought
it wouldn&#39;t matter. In particular, &lt;a href=&quot;https://www.hokaoneone.com/&quot;&gt;Hoka One One&lt;/a&gt;
famously make super high stack height shoes. Hutchinson imagines
the following conversation between Nike and the &lt;a href=&quot;https://www.worldathletics.org/&quot;&gt;International Association of Athletics Federations (IAAF)&lt;/a&gt; (now known as &amp;quot;World Athletics&amp;quot;), which makes
the rules about what equipment is legal (read the whole thing):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;IAAF: Okay, then I think we’re good. Your three “innovations” sound
like you’re just rehashing ideas that have been used in running shoes
without controversy for years, if not decades.&lt;/p&gt;
&lt;p&gt;Nike: But here’s the thing: we’ve got the mix just right. These shoes
are way better than any previous shoe. They’ll improve your running
economy by four percent. They’re going to annihilate every world
record in the book!&lt;/p&gt;
&lt;p&gt;IAAF: [sound of prolonged laughter in the background, then
speakerphone is switched off and the laughter is disguised as a cough]
Of course, of course. They sound wonderful. Put it all in the press
release, I’m sure it’ll be a big hit. No worries on our end.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But here&#39;s the thing: Nike was right. There have been a number of
studies here, but the TL;DR is that the Vaporfly works. For instance,
in 2019, &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/02640414.2019.1633837&quot;&gt;Hunter et al.&lt;/a&gt;
found that runners in the Vaporfly 4% used 2.8% and 1.9% less oxygen
than those in the Nike Zoom Streak and the Adidas Adios Boost
respectively.
Oxygen use isn&#39;t perfectly correlated with speed, but the NYT&#39;s
very extensive &lt;a href=&quot;https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html&quot;&gt;analysis&lt;/a&gt;
of race data from Strava found that the Vaporfly is about 4% faster than the average
shoe. It&#39;s hard to overstate what a big deal this is:
4% isn&#39;t going to
turn me into Kenenisa Bekele, but it&#39;s roughly the difference between
1st (Eliud Kipchoge) and 14th (Stephen Kiprotich) in the men&#39;s marathon at the Rio Olympics.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
To make matters worse, Nike has kept improving their shoes, following up with the
&lt;a href=&quot;https://www.nike.com/running/vaporfly&quot;&gt;Vaporfly Next%&lt;/a&gt;, and the ridiculous
looking but apparently quite effective &lt;a href=&quot;https://www.nike.com/running/alphafly&quot;&gt;Alphafly&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/si0Ui-3sjWu1NiSOhxNf7VviORecsxzn467w0iGzSLQxAv015mVOnzEb-ahz4L9ZQQzlJuGQIaME0sA3VChIuuxVlG-e4rRPJx67Khrb7bzLI6M2OvlUHYiTL2JMqGVyme8uCSLz&quot; alt=&quot;Nike Alphafly&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[Image from &lt;a href=&quot;https://www.roadtrailrun.com/2020/03/nike-zoom-alphafly-next-initial-review.html&quot;&gt;RoadTrailRun&#39;s review&lt;/a&gt;]&lt;/p&gt;
&lt;h2 id=&quot;the-iaaf%2Fwaf-response&quot;&gt;The IAAF/WAF Response &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supershoes/#the-iaaf%2Fwaf-response&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Pretty much as soon as it became clear that the Vaporflys really were better,
there started to be criticism (the pejorative phrase here is
&amp;quot;technological doping&amp;quot;). It seems to me that this kind
of misses the point. The rules about what you can and can&#39;t
do in a given sport are generally pretty arbitrary and
historically contingent. For instance,
if everyone just ran in rubber-soled shoes and someone
suggested you put little metal nails on the bottom of
your shoe to get better traction, that would sound
like cheating, but of course &lt;a href=&quot;https://en.wikipedia.org/wiki/Track_spikes&quot;&gt;spikes&lt;/a&gt;
are ubiquitous in track and cross-country.
Or take &lt;a href=&quot;https://en.wikipedia.org/wiki/Racewalking&quot;&gt;race walking&lt;/a&gt;,
which requires you to have one foot on the ground at all times.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
Obviously, if you just start running in such an event you&#39;re
cheating, but nobody thinks you&#39;re cheating if you run in a 10K.
For this reason, I don&#39;t think it&#39;s that helpful to frame this
question in moral terms; what&#39;s important
is having a common set of rules that allow for competition,
and the introduction of the Vaporfly disrupted that equilibrium.&lt;/p&gt;
&lt;p&gt;For a while Nike had an enormous lead here and if you wanted to be
fast you really wanted to wear the Vaporfly, but inevitably
other shoe companies started rolling out their own
carbon-plated high-stack supershoes, so now we&#39;ve got an arms race
on our hands. In an attempt to address this, WAF issued
&lt;a href=&quot;https://www.worldathletics.org/news/press-releases/modified-rules-shoes&quot;&gt;new rules&lt;/a&gt;
in April 2020, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A maximum stack height of 40mm&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Only a single plate&lt;/li&gt;
&lt;li&gt;The shoes must have been available on the open market for four months&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This last requirement was &lt;a href=&quot;https://www.worldathletics.org/news/press-releases/amendment-to-development-shoe-rules-in-international-competitions&quot;&gt;removed&lt;/a&gt;
in December.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; As far as
I can tell, at this point you can more or less wear whatever prototype
shoe you want as long as they meet the technical conformance requirements.&lt;/p&gt;
&lt;p&gt;While there is plenty of evidence that supershoes are faster,
it&#39;s actually &lt;a href=&quot;https://www.outsideonline.com/2367961/how-do-nikes-vaporfly-4-shoes-actually-work&quot;&gt;somewhat&lt;/a&gt; &lt;a href=&quot;https://link.springer.com/article/10.1007/s40279-020-01406-5&quot;&gt;unclear&lt;/a&gt; exactly why. There are a variety
of theories (it&#39;s the stack height, it&#39;s the new foam,
it&#39;s the &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/19424280.2020.1734870&quot;&gt;stiff carbon plate&lt;/a&gt;, &lt;a href=&quot;https://www.nature.com/articles/s41598-020-74097-7&quot;&gt;no&lt;/a&gt; &lt;a href=&quot;https://osf.io/preprints/sportrxiv/37uzr/&quot;&gt;it&#39;s not&lt;/a&gt;,
it&#39;s the thick foam but you need the
carbon plate to keep it stable). Regardless, the existence of shoes
which significantly improve performance raises some obvious
issues.&lt;/p&gt;
&lt;h2 id=&quot;fairness&quot;&gt;Fairness &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supershoes/#fairness&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For amateurs, the question of fairness is mostly
about whether they can buy the shoes (or how much they cost).
For professionals, however, fairness is less about access to shoes
than it is about sponsorship. One of the main ways in which
professional runners make money is by shoe sponsorships.
Understandably, the company wants their athletes to wear their
products. A good example here is
the 2020 US Olympic Marathon trials. When these were
held, the Vaporfly Next% was already widely available,
but this didn&#39;t help you if you ran for a brand that didn&#39;t
have a plated shoe yet, because understandably your sponsor
didn&#39;t want you wearing Nikes, even if that made you faster,
because having you win in the competitor&#39;s shoe doesn&#39;t help them
much.
&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
As described in this article in &lt;a href=&quot;https://www.runnersworld.com/gear/a31180532/olympic-marathon-trials-shoe-count/&quot;&gt;Runner&#39;s World&lt;/a&gt;, a number
of runners dealt with this by wearing Nikes in disguise,
as below (the point here isn&#39;t to fool anybody, but just
to avoid showing the competition&#39;s logo):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/black-vaporfly-paint-1583120074.jpg?crop=0.656xw:0.984xh;0.0691xw,0.0155xh&amp;amp;resize=768:*&quot; alt=&quot;Black Vaporfly&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the (now removed) four month availability rule doesn&#39;t
actually help here that much because the athletes are tied
to a single manufacturer. If athletes were free to choose their
own shoes, then they could just buy the best shoe on the
open market&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;,
but if they&#39;re required to wear their sponsor&#39;s
shoe&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;
and if one manufacturer (in this case Nike&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;)
has a
technological lead, then it doesn&#39;t help that their shoe
is widely available. On the other hand, it&#39;s not clear it
helps to have the rule removed either, because now everyone
is just wearing prototype shoes that are four months (or whatever)
newer.&lt;/p&gt;
&lt;p&gt;The hope seems to be that there are some natural limits to
how much better this type of shoe can get. If that&#39;s true, then
eventually the difference between successive generations
of shoes will become relatively narrow and it won&#39;t be
much of an advantage to have the absolutely latest shoe.
The restriction on the stack height is presumably intended
to help produce this result.&lt;/p&gt;
&lt;p&gt;One potential complication here is that there seems to
be a fair amount of variation in how much
advantage people get from the new shoes. For instance,
(at least according to the abstract, which is all I have)
&lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/19424280.2015.1130754&quot;&gt;Madden et al.&lt;/a&gt; report that some runners in a stiffer shoe
(&amp;quot;responders&amp;quot;) showed a 2.9% increase in running economy whereas
others (&amp;quot;non-responders&amp;quot;) showed a 1% decrease. This suggests
a potential new source of unfairness in which
some runners get the benefit of the new shoes and
others do not.
Even more interestingly, it seems like different runners
do better with &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/19424280.2020.1734870&quot;&gt;different levels of stiffness&lt;/a&gt;, so you might have
a situation in which two runners could in principle
benefit equally from plated shoes with appropriate stiffness,
but because of what&#39;s actually available one gets an advantage
over the other. This potentially gives an
advantage to really elite runners, whose sponsors will
make shoes more or less to their specifications.
This happens already to some extent -- for instance, Salomon S/Lab has built shoes &lt;a href=&quot;https://www.roadtrailrun.com/2018/02/inside-salomon-hq-and-slab-everything.html&quot;&gt;specifically&lt;/a&gt; for Kilian Jornet and Francois D&#39;Haene -- but
there&#39;s a difference between a shoe that fits your feet perfectly
and one that makes you 2% faster.&lt;/p&gt;
&lt;h2 id=&quot;what&#39;s-next%3F&quot;&gt;What&#39;s next? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supershoes/#what&#39;s-next%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With any luck, we&#39;ll find out that there&#39;s a practical limit on
how much better shoes can be made with this technology. If that&#39;s
true, then every manufacturer will start making one and it
won&#39;t matter what brand you&#39;re attached to. If not, though, we
may be in for a rough few years. Oh, yeah, before I forget:
while all the shoes I&#39;ve been talking about were for road running,
Nike has a new &lt;a href=&quot;https://news.nike.com/footwear/air-zoom-victory&quot;&gt;carbon-plated spike&lt;/a&gt;
and The North Face just recently rolled out a
&lt;a href=&quot;https://www.thenorthface.com/shop/mens-flight-vectiv-nf0a4t3l&quot;&gt;carbon-plated trail shoe&lt;/a&gt;,
so I doubt we&#39;re through with this just yet.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For those of you who aren&#39;t shoe nerds, your typical running
shoe has a &amp;quot;midsole&amp;quot; made of some sort of softish foam and
an &amp;quot;outsole&amp;quot; made of grippy, longer-wearing rubber. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Incidentally,
Jared Ward, who was a coauthor on the above paper, finished 6th in that race. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One of my favorite examples here is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Butterfly_stroke&quot;&gt;butterfly&lt;/a&gt;
swimming stroke, which was originally used (though with the breaststroke
kick) as a faster version of breaststroke, but is so much better
that they made it its own style. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Perhaps not coincidentally, the Alphafly is &lt;em&gt;just&lt;/em&gt; under 40mm. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This really only applies to elite/professional competition;
it&#39;s not like you&#39;re going to go to jail if you use them. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Nike actually gave out free Alphaflys to anyone
who had qualified. In a particularly gutsy move --
everyone knows you should never race in a shoe
you&#39;ve never trained in -- &lt;a href=&quot;https://en.wikipedia.org/wiki/Jake_Riley_(runner)&quot;&gt;Jake Riley&lt;/a&gt;
took a pair and ran his way into second place and onto the Olympic team. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
One could argue that the problem here is the sponsorship
model we use for compensating athletes, but that seems unlikely to change
any time soon. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Of course, in some cases companies may decide they&#39;d
rather have their athletes win in the wrong shoes than
lose in the right shoes.
For instance, per RW, Mizuno let Matt McDonald wear Nike
shoes and I seem to remember that On Running let their
athletes do so before they had a carbon-plated shoe. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Probably the closest analog here is Speedo&#39;s
&lt;a href=&quot;https://en.wikipedia.org/wiki/LZR_Racer&quot;&gt;LZR Racer suit&lt;/a&gt;
which was ultimately severely restricted. &lt;a href=&quot;https://educatedguesswork.org/posts/supershoes/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Some Confusion in New York&#39;s Vaccine Passport Rollout</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-nyc/"/>
		<updated>2021-06-03T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-nyc/</id>
		<content type="html">&lt;p&gt;June 1st&#39;s NYT has an &lt;a href=&quot;https://www.nytimes.com/2021/06/01/nyregion/excelsior-pass-vaccine.html&quot;&gt;article&lt;/a&gt;
about the state of New York&#39;s &lt;a href=&quot;https://covid19vaccine.health.ny.gov/excelsior-pass&quot;&gt;Excelsior Pass&lt;/a&gt; vaccine passport&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
which reveals that people have
some weird ideas about the system and
how it needs to be used. First, we have:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It took Albert Fox Cahn, executive director of the Surveillance
Technology Oversight Project, a nonprofit watchdog group, just &lt;a href=&quot;https://www.thedailybeast.com/i-forged-new-yorks-digital-vaccine-passport-in-11-minutes-flat&quot;&gt;11
minutes to download someone else’s Excelsior Pass&lt;/a&gt; using information
they had posted on social media and Google searches, he said. Many
people have posted pictures of their vaccination cards, which
include a person’s name, birthday, date of vaccination and type of
shot.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Cahn writes, in an article called &lt;a href=&quot;https://www.thedailybeast.com/i-forged-new-yorks-digital-vaccine-passport-in-11-minutes-flat&quot;&gt;&amp;quot;I Forged New York’s Digital Vaccine Passport in 11 Minutes Flat&amp;quot;&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But beyond the civil liberties and equity concerns, there’s a much
more fundamental critique: The technology doesn’t work. The entire
justification for an electronic vaccine tracker is that it’s
supposedly “secure.” But while the CDC’s flimsy “white cards”
provide few protections against forgery, are the high-tech apps much
better? That’s what I set out to find on Easter Sunday. I set aside
the entire day for the experiment, but I was done before
breakfast. After getting consent from an Excelsior Pass user, I
tried to download their pass, logging into their account using
nothing more than public information from social media. Eleven
minutes after he gave me the greenlight, I had a copy of his blue
Excelsior Pass in hand, valid for use until September.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now, this is not an ideal set of properties, but for &lt;em&gt;privacy&lt;/em&gt;,
not &lt;em&gt;security&lt;/em&gt; reasons. As I described &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/&quot;&gt;previously&lt;/a&gt;,
a vaccine credential system like this isn&#39;t a bearer token:
it&#39;s a signed assertion binding the user&#39;s identity to
a given vaccine status. That&#39;s why the user
also has to present some sort of biometric identification
like a driver&#39;s license to show that they&#39;re the person
described in the credential. This means that you having a copy
of my vaccine passport doesn&#39;t let you pretend to be
vaccinated unless you&#39;re also able to get some biometric
ID for yourself but with my name on it (or, I suppose, if we have the same name).
This is pretty much the way things have to work because
you&#39;ll be showing your credential to people all the time in
order to demonstrate that you&#39;re vaccinated; if they could
just make a copy and use that, the whole system would fall
apart the minute someone who wanted to cheat got a copy of
any valid credential, as they could distribute it all over
the Internet. So, it doesn&#39;t really make sense to say that this
is &amp;quot;forging&amp;quot; the credential.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
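&lt;p&gt;To make the distinction concrete, here&#39;s a minimal sketch of the check a verifier is supposed to perform. Everything in it is an assumption for illustration: the names (issue_credential, verify, ISSUER_KEY) are made up, and an HMAC from the Python standard library stands in for the real digital signature, so this is not a description of how Excelsior Pass is actually implemented.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Hypothetical issuer key. HMAC is only a stdlib stand-in for a real
# public-key signature; a real verifier would hold a public key, not a secret.
ISSUER_KEY = b"demo-issuer-key"


def issue_credential(name, status):
    """Return a credential binding a name to a vaccination status."""
    payload = json.dumps({"name": name, "status": status}, sort_keys=True)
    tag = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify(credential, name_on_photo_id):
    """Accept only if the tag is valid AND the name matches the ID shown."""
    expected = hmac.new(
        ISSUER_KEY, credential["payload"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, credential["tag"]):
        return False  # tampered with, or not issued by this issuer
    claims = json.loads(credential["payload"])
    return claims["status"] == "vaccinated" and claims["name"] == name_on_photo_id


alice_pass = issue_credential("Alice Example", "vaccinated")
print(verify(alice_pass, "Alice Example"))    # Alice presenting her own pass
print(verify(alice_pass, "Mallory Example"))  # Mallory presenting a copy of it
```

&lt;p&gt;The second call is the interesting one: Mallory holds a perfectly valid copy of the credential, but it fails as soon as the verifier compares the name against Mallory&#39;s own ID.&lt;/p&gt;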
&lt;p&gt;As a comparison point, consider a &amp;quot;vaccine passport&amp;quot; system
in which we just had a giant online database of who was vaccinated
and who wasn&#39;t (effectively, the purpose of the signature
on the credential is to passivate database entries so they
can be verified offline).
When someone wants to know if you&#39;re vaccinated,
you give them your name and they just look it up and check
against your ID. We wouldn&#39;t say that someone had &amp;quot;forged&amp;quot;
your vaccine passport in that case if they were able to retrieve
your record; that&#39;s just the system
working as designed.&lt;/p&gt;
&lt;p&gt;What we would say, however, is that this system has a privacy
problem: I
can also use that information to determine whether &lt;em&gt;anyone&lt;/em&gt; --
not just the person in front of me --
was vaccinated or not, and, depending on exactly what&#39;s
in the credential, when and with what vaccine.
However, if you can retrieve people&#39;s credentials
with public information, then even an offline credentials
system has the same problem, which seems to be the
situation here.
What you want is that only the vaccinated person can get
their own credential, though this may not be as
easy to implement as it sounds.
If you issue credentials at vaccination time, it&#39;s
pretty straightforward, but if you want to issue them
to people who have already been vaccinated, it&#39;s harder.
Here&#39;s what the article says is required:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I.B.M. recently added a phone number check to the identification
field of the app to make it easier to find someone’s
vaccination. Only four of the five fields — including first and last
name, date of birth and ZIP code — need to match for someone to get
a pass.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Unfortunately, nearly all of this information is semi-public.  There
are plenty of people for whom I know their full name, zip code, and
phone number; if that&#39;s all that&#39;s required, the privacy situation is
not good. Probably the best approach is to send patients a copy of the
credential -- or a code to retrieve it -- to the phone or email
address they used to register for their appointment (or even physically
mail them a piece of paper with the QR code on it). I don&#39;t really
have a good solution for people for whom you don&#39;t have good contact
information though.&lt;/p&gt;
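&lt;p&gt;Here&#39;s a rough sketch of why a four-of-five rule is so weak. The field names and the matching logic are my assumptions (the article doesn&#39;t spell out exactly what the five fields are or how the app compares them), so treat this as an illustration of the rule as described, not as the real lookup code.&lt;/p&gt;

```python
# Hypothetical field names; the article does not list the exact five fields.
FIELDS = ("first_name", "last_name", "date_of_birth", "zip_code", "phone")
REQUIRED_MATCHES = 4  # "only four of the five fields need to match"


def count_matches(record, guess):
    """Count how many fields in the guess equal the stored record."""
    return sum(1 for field in FIELDS if guess.get(field) == record[field])


def can_retrieve(record, guess):
    """Apply the four-of-five rule to decide whether the pass is released."""
    matches = count_matches(record, guess)
    return matches in range(REQUIRED_MATCHES, len(FIELDS) + 1)


record = {
    "first_name": "Alice",
    "last_name": "Example",
    "date_of_birth": "1980-01-01",
    "zip_code": "10001",
    "phone": "212-555-0100",
}

# An acquaintance who knows everything except the date of birth.
guess = dict(record, date_of_birth=None)
print(can_retrieve(record, guess))
```

&lt;p&gt;Under these assumptions, name, ZIP code, and phone number are already enough, with no date of birth needed.&lt;/p&gt;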
&lt;p&gt;The article goes on to say that people are treating the
QR code itself as if it were proof of vaccination:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;And each pass can be uploaded to a limitless number of devices, or
printed out and copied. The Excelsior Pass, which cost the state
$2.5 million to develop, contains no biometric data for privacy
reasons, so it needs to be compared against an ID, an extra step
that, in practice, sometimes isn’t taken.&lt;/p&gt;
&lt;p&gt;...&lt;/p&gt;
&lt;p&gt;At the City Winery on Wednesday, outdoor hosts sometimes asked for
ID when people flashed their Excelsior Pass or paper vaccination
cards to gain entry, but sometimes they didn’t. At the Armory, Covid
compliance officers in face shields carefully checked IDs, but they
just eyeballed the pass’s QR code, instead of scanning it to
double-check its veracity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To the extent to which this practice is common, it&#39;s actually
a fairly serious problem. Just checking to see if people have a QR code of some
kind on their phone doesn&#39;t do anything: one QR code looks
much like another and without scanning it, you can&#39;t tell if it
even has the right name on it, let alone if it actually
describes someone&#39;s vaccination status, has a digital
signature from someone you trust, etc. If verifiers just
glance at the code without scanning it,
it doesn&#39;t matter whether the system is properly designed, cryptographically
secure, etc., because anyone who wants to pretend
to be vaccinated can just download a random QR code
of the right size off the Internet and pretend it&#39;s their
vaccine credential.&lt;/p&gt;
&lt;p&gt;This isn&#39;t to say that there&#39;s no value in a system like this. First, many verifiers
will actually check the QR codes. Second,
just as with the paper cards, it&#39;s some effort
to forge even a bogus QR code
and a lot of people just won&#39;t be comfortable
with effectively lying about their vaccine status. But to
the extent to which that&#39;s true, you don&#39;t need any fancy
crypto, just have people show a photo of their vaccine card.
In any case, it seems clear that if we are going to have
this kind of system more education is needed in order to
prevent misunderstandings like these.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Recall that this is powered by IBM&#39;s Digital Health Pass. See
&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/&quot;&gt;here&lt;/a&gt; for more on that.
 &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
While we&#39;re on the topic, California only seems to
want to give you a new driver&#39;s license if yours
is lost or stolen, but why can&#39;t I just get two
copies of the same license so that I can have one
in my wallet and one in my car? It&#39;s got my
picture on it, so it&#39;s not usable by someone else. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-nyc/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>The tech behind EG</title>
		<link href="https://educatedguesswork.org/posts/blog-tech/"/>
		<updated>2021-06-01T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/blog-tech/</id>
		<content type="html">&lt;p&gt;At this point there are a fair number of options in how to set up a
blog.  You can do &lt;a href=&quot;https://blogger.com/&quot;&gt;Blogger&lt;/a&gt;,
&lt;a href=&quot;https://substack.com/&quot;&gt;Substack&lt;/a&gt;, &lt;a href=&quot;https://wordpress.com/&quot;&gt;Wordpress&lt;/a&gt;
etc. If you want to self-host there are a lot of options too. A lot of
tech people use what&#39;s called a &amp;quot;static site generator&amp;quot;, which means
that instead of having some piece of software like Wordpress that runs
on the site and you enter your posts into, you just write your posts
in a text editor, use the generator to build the HTML for the site,
and then upload it to the server.&lt;/p&gt;
&lt;p&gt;The specific SSG I use is &lt;a href=&quot;https://www.11ty.dev/&quot;&gt;Eleventy (11ty)&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blog-tech/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
which
I use with &lt;a href=&quot;https://docs.netlify.com/configure-builds/common-configurations/eleventy/&quot;&gt;Netlify&lt;/a&gt; and Github. The way this works is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;I use the &lt;a href=&quot;https://github.com/11ty/eleventy-base-blog&quot;&gt;11ty base blog template&lt;/a&gt;,
which has the basic setup for an 11ty-based blog. I&#39;ve customized it some,
mostly to have my preferred style, as well as to have the archives in the
sidebar.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I write&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/blog-tech/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
posts locally in &lt;a href=&quot;https://daringfireball.net/projects/markdown/&quot;&gt;Markdown&lt;/a&gt;.
Eleventy has a local server that automatically builds the site whenever the
source changes. I work in a Github branch so that I can have multiple
posts going at once and also so that I can ask people for review
via Github PRs on the repo.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When I&#39;m done, I merge the Github branch and push that to the main repo.
Netlify automatically detects this, builds the site, and publishes it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I use &lt;a href=&quot;https://www.cloudflare.com/web-analytics/&quot;&gt;Cloudflare Web Analytics&lt;/a&gt;
to monitor traffic.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One point about using Netlify for this application: They ask
for a more expansive set of Github permissions than I was
willing to give, so I created a new Github account just for the
blog and gave Netlify permissions on that. Then I made my regular
Github account a collaborator. This also let me fork the main
repo into my own account and let people review PRs there without
having to give them access to the main repo.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Thanks to &lt;a href=&quot;https://tantek.com/&quot;&gt;Tantek Çelik&lt;/a&gt; for the recommendation. &lt;a href=&quot;https://educatedguesswork.org/posts/blog-tech/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Using Emacs, naturally &lt;a href=&quot;https://educatedguesswork.org/posts/blog-tech/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Blockchains/Ledgers and Vaccine Passports</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-ledger/"/>
		<updated>2021-05-29T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-ledger/</id>
		<content type="html">&lt;p&gt;Via &lt;a href=&quot;https://twitter.com/gareth_t_davies&quot;&gt;Gareth T. Davies&lt;/a&gt; I see that
IBM has posted a &lt;a href=&quot;https://eprint.iacr.org/2021/704&quot;&gt;whitepaper&lt;/a&gt; on
their &amp;quot;IBM Digital Health Pass&amp;quot; system on &lt;a href=&quot;https://eprint.iacr.org/&quot;&gt;ePrint&lt;/a&gt;.
It&#39;s a white paper, not a complete specification, so some of the details
are kind of sketchy, but at a high level
it&#39;s similar to the kind of
design I &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/&quot;&gt;talked about&lt;/a&gt;
and that used by the &lt;a href=&quot;https://vci.org/&quot;&gt;Vaccine Credentials Initiative (VCI)&lt;/a&gt;:
a digitally signed credential with the user&#39;s health status (vaccination or testing status).
Unlike the VCI system -- at least as being deployed in their pilot -- the
signing keys are contained in &lt;a href=&quot;https://en.wikipedia.org/wiki/X.509#Certificates&quot;&gt;X.509 certificates&lt;/a&gt;
which chain up to a set of trust anchors stored in a &amp;quot;Trusted Registry&amp;quot;,
which is implemented via a distributed ledger, specifically
&lt;a href=&quot;https://www.hyperledger.org/&quot;&gt;Hyperledger&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;the-trusted-registry&quot;&gt;The Trusted Registry &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#the-trusted-registry&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here&#39;s the description of the Trusted Registry:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Trusted Registry is an entity controlled by (one or more)
Administration Authorities. Its objective is to maintain and provide
upon request public metadata of authorised Issuers, as well as Health
Certificate revocation information that are crucial for the secure
verification of Health Certificates. The Trusted Registry implements
the following functionalities upon properly authorised requests:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This seems like the wrong architecture for a system like this as
an online service of this type is neither necessary
nor desirable: Once the trust anchors have authorized the issuers
(i.e., given them a certificate) there&#39;s no need for the trust anchors to do much
of anything (yes, I know about revocation; see below).
In particular, they don&#39;t need to regularly be part of the
verification process, because the credentials they supply to the
issuers (i.e., certificates) are self-contained and can be verified
without contacting the trust anchors at all. The same thing is true
for the credentials issued by the issuers: they too are self-contained
(as long as they contain the issuer&#39;s certificate). A verifier can
just take the entire credential and verify it offline. This has
obvious advantages both for the verifier -- they don&#39;t need to
be online -- and the operator of the system -- they don&#39;t need
to offer high availability.&lt;/p&gt;
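&lt;p&gt;As a rough illustration of the offline check described above, here is a minimal Python sketch. It uses textbook RSA with small, publicly known primes as a stand-in for real signatures, so it is insecure and only shows the shape of the chain walk. The names (&lt;code&gt;make_key&lt;/code&gt;, &lt;code&gt;verify_credential&lt;/code&gt;, the credential fields) are invented for this example and are not taken from the IDHP white paper or from X.509.&lt;/p&gt;

```python
import hashlib

E = 65537  # public exponent shared by both toy keys


def make_key(p, q):
    """Toy RSA key pair from two known primes. Insecure; illustration only."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)
    return {"n": n, "d": d}


def digest(message, n):
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n


def sign(key, message):
    return pow(digest(message, key["n"]), key["d"], key["n"])


def verify(n, message, signature):
    return pow(signature, E, n) == digest(message, n)


# Hypothetical actors: one trust anchor and one issuer it has authorized.
anchor = make_key(2**61 - 1, 2**89 - 1)
issuer = make_key(2**107 - 1, 2**127 - 1)

# The "certificate": the issuer's name and public key, signed by the anchor.
cert_body = "issuer=Example Clinic;n=" + str(issuer["n"])
issuer_cert = {"body": cert_body, "sig": sign(anchor, cert_body)}

# The credential carries the issuer certificate, so it is self-contained.
cred_body = "name=Pat Example;status=vaccinated"
credential = {
    "body": cred_body,
    "sig": sign(issuer, cred_body),
    "issuer_cert": issuer_cert,
}


def verify_credential(anchor_n, cred):
    """Walk the chain using only the anchor's public modulus; no network."""
    cert = cred["issuer_cert"]
    if not verify(anchor_n, cert["body"], cert["sig"]):
        return False  # issuer was not authorized by this trust anchor
    issuer_n = int(cert["body"].split(";n=")[1])
    return verify(issuer_n, cred["body"], cred["sig"])


print(verify_credential(anchor["n"], credential))
```

&lt;p&gt;The verifier only needs the trust anchor&#39;s public key, which can ship with the app; nothing in the check contacts a registry.&lt;/p&gt;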
&lt;h2 id=&quot;revocation&quot;&gt;Revocation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#revocation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s true that in a system like this, revocation of issued
credentials may have to be done
online. On the other hand, it&#39;s not clear how much we need
revocation: the stakes of falsely accepting a single misissued vaccine
credential are quite low, comparable to letting someone with
a fake ID drink at a bar, and we just accept that risk all the time. It&#39;s worth
noting that physical credentials such as driver&#39;s licenses
are largely unrevocable as far as ordinary people are concerned,
and yet we happily accept them as identification in all sorts
of contexts.&lt;/p&gt;
&lt;p&gt;Even if we decide we do need vaccine credential revocation,
it&#39;s not clear why you need some trusted registry involved.
In a conventional PKI system, the entity which issued
the certificate (in this case the Issuer) is responsible
for publishing revocation information. This makes sense
because they are the ones who generated it. Of course,
it might be inconvenient for them to actually host the servers
which distribute that information; it&#39;s not uncommon
for WebPKI CAs to use content distribution networks to distribute
their certificate status (OCSP) information, but it&#39;s important to realize
that the OCSP responses are signed by the CA and so you
don&#39;t need to trust the CDN. Similarly, the system described
here could have some sort of revocation information distribution service,
but it&#39;s just a convenience and doesn&#39;t have to be trusted.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There is one case where we probably do need some sort of revocation:
if a &lt;em&gt;trust anchor&lt;/em&gt; or &lt;em&gt;issuer&lt;/em&gt; key is compromised it can be used to
issue a lot of fake credentials. However, this can be easily handled
by some sort of centralized update system like &lt;a href=&quot;https://blog.mozilla.org/security/2015/03/03/revoking-intermediate-certificates-introducing-onecrl/&quot;&gt;OneCRL&lt;/a&gt; that just updates the apps. This kind of large-scale
revocation happens quite infrequently, so again there&#39;s no need for a high
availability service; it&#39;s probably easier for each app just to
update itself, as browsers do now.&lt;/p&gt;
&lt;h2 id=&quot;what-are-we-trusting-the-registry-for%3F&quot;&gt;What are we trusting the registry for? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#what-are-we-trusting-the-registry-for%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The general argument offered by the white paper is that implementing
the Trusted Registry via a ledger reduces the risk of compromise/misbehavior:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;More specifically, IDHP security relies on the assumption that the
trusted registry will update issuer metadata and health
certificate revocation lists according to authorised issuer and
administration authorities requests. Moreover, IDHP security
provisions rely on the fact that the trusted registry will respond
to verifier queries correctly, i.e., in accordance to the content of
the registry.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;First, as I said above, you don&#39;t generally need a service in the
first place because the certificates are self-contained. Second,
because the objects in the system (certificates, CRLs, etc.) are
signed, the registry doesn&#39;t need to be trusted to respond to
verifier queries &amp;quot;correctly&amp;quot;. It can choose not to respond at all,
thus creating a denial of service condition, but if it responds with
false information, then that information will be rejected by
the verifier because the signature doesn&#39;t validate.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Clearly, one could implement the trusted registry as a robust, but
centrally controlled service. This would still position the trusted
registry as a single point of failure for IDHP: if the single
entity that controls the registry is compromised, then the answers
of the trusted registry to verifier queries can no longer be
trusted, and the system would no longer be able to accurately
assess health certificate validity.&lt;/p&gt;
&lt;p&gt;The IDHP Trusted Registry is implemented using permissioned
Distributed Ledger Technology(DLT), as we wanted our registry to
provide resistance to authority missbehavior/compromise by shifting
the functional responsibility to the system’s stakeholders, while
at the same time ensuring that these stakeholders would enjoy
full control over the system’s governance (adding new members,
upgrade functionality, etc.). Figure 4 demonstrates IDHP
interactions with a decentralised Trusted Registry.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To be honest, I&#39;m having a lot of trouble making sense of this argument.
Forgetting about the technology, let&#39;s just look at what the trust relationships
are here.&lt;/p&gt;
&lt;p&gt;At the end of the day, the verifier has to trust some entity or
set of trust anchors (what the IDHP calls &quot;Administration Authorities&quot;)
to authorize some
other entities (e.g., clinics) to issue vaccine credentials. The
white paper is vague on who the Administration Authorities are, but
it seems likely that they are governmental or quasi-governmental
because we want everyone in a given jurisdiction to have the
same set of trust anchors and thus trust the same set of issuers.
Within a given jurisdiction, the trust anchor(s) are essentially
authoritative for who is a valid issuer.
For example, suppose that California ran a vaccine passport system
with the &lt;a href=&quot;https://www.cdph.ca.gov/&quot;&gt;California Department of Public
Health (CDPH)&lt;/a&gt; actually operating the trust
anchor. They would be responsible for authorizing Walgreens, CVS,
etc. to actually issue people&#39;s credentials.&lt;/p&gt;
&lt;p&gt;So at one level, it&#39;s trivially true that authorities who misbehave
are a threat to the system, for instance if the CDPH decides to
authorize the Educated Guesswork COVID Clinic to issue vaccine
credentials even though we don&#39;t actually give out any
shots. But this is very difficult to stop technologically
because the actors in the system generally don&#39;t know
that I&#39;m not operating a real COVID clinic:
after all, the state &lt;em&gt;could&lt;/em&gt; have set one up at my house, they just
didn&#39;t and have decided to lie about it. It&#39;s true that there
are other stakeholders in the ecosystem (e.g., the State of Oregon)
but their opinions aren&#39;t relevant as to whether California
has authorized me to run a clinic.
The point here is that the trust relationships
here are inherently centralized and trying to map them onto
a decentralized technology doesn&#39;t change that.&lt;/p&gt;
&lt;h2 id=&quot;the-usefulness-of-a-ledger&quot;&gt;The usefulness of a ledger &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#the-usefulness-of-a-ledger&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even if we &lt;em&gt;are&lt;/em&gt; concerned about this kind of misbehavior and want
to stop it, it&#39;s not really clear that putting the data
in a ledger helps much: the primary service that a ledger provides
is &amp;quot;consensus&amp;quot;, ensuring that everyone agrees on a specific set of
data, but that&#39;s not really that important here because CDPH&#39;s
opinion is the only one that matters. All you need to know in order
to know whether to accept a credential from a given issuer is
that the state says that that issuer is valid.&lt;/p&gt;
&lt;p&gt;There is one semi-useful property of some kind of ledger-type
structure, but it&#39;s not really that applicable here: if every piece of
information published by the trust anchors is public, then it in
theory allows for third parties to detect cheating. For instance, one
could download the list of all of the valid issuers and look to see if
any of them looked fishy (e.g., &amp;quot;I live in this neighborhood and
there is no clinic here&amp;quot;, or &amp;quot;I went to the clinic and they&#39;re
injecting people with &lt;a href=&quot;https://www.tailwindnutrition.com/&quot;&gt;Tailwind&lt;/a&gt;
rather than COVID vaccine&quot;). But the risk of this seems comparatively
low and it&#39;s not clear how you&#39;d have a scalable way of detecting these
cases, as there&#39;s no canonical public list of real clinics.&lt;/p&gt;
&lt;p&gt;The WebPKI uses a similar system called &lt;a href=&quot;https://en.wikipedia.org/wiki/Certificate_Transparency&quot;&gt;Certificate
Transparency&lt;/a&gt;
to detect fraudulent issuance of WebPKI certificates, but it
seems like this is of fairly modest benefit
in this case, for several reasons: first, unlike the WebPKI,
there aren&#39;t going to be that many issuers and the procedure
for registering them with the health authorities is going
to involve a fair amount of official
paperwork even before we get to the vaccine passport piece of
the equation -- after all, it&#39;s not like anyone can order
up a couple boxes of vaccine and start giving shots -- so
this makes it easier to have a secure procedure for authorizing
them. In particular, you can have the actual trust anchor
signing key offline, making compromise much less likely.
Second, a transparency scheme isn&#39;t helpful for detecting
misissuance of the patient credentials themselves: for obvious reasons
you don&#39;t want to publish the names of people who have been certified
as vaccinated.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The bottom line here is that this seems like a far more complicated
design than is necessary. It&#39;s straightforward to build a PKI-based
system that doesn&#39;t require the verifiers to contact any online
service. Even if you did want to have an online service, as
in the VCI prototype, then it&#39;s not clear what a ledger adds in
terms of security or resistance to misbehavior. To repeat what
I said above: what ledgers primarily provide is consensus, but
as far as I can tell, this system doesn&#39;t need that and
so it&#39;s just a bunch of added complexity
(more generally, &quot;does this require consensus&quot; is one of the questions you should ask yourself
whenever someone proposes using a blockchain/ledger for something).&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Revocation is a complicated topic, but at a high level there
are really three major designs: (1) Have some service that lets verifiers ask about the revocation
status of a credential (in the WebPKI, this is &lt;a href=&quot;https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol&quot;&gt;OCSP&lt;/a&gt;).
(2) Periodically publish a list of all the revoked credentials
to verifiers (e.g., &lt;a href=&quot;https://en.wikipedia.org/wiki/Certificate_revocation_list&quot;&gt;CRLs&lt;/a&gt;
or &lt;a href=&quot;https://obj.umiacs.umd.edu/papers_for_stories/crlite_oakland17.pdf&quot;&gt;CRLite&lt;/a&gt;).
(3) Give the verifiers some sort of short-lived credential which
is periodically refreshed, as with &lt;a href=&quot;https://en.wikipedia.org/wiki/OCSP_stapling&quot;&gt;OCSP stapling&lt;/a&gt;.
This last option is impractical for vaccine passports, because you
want to issue them once and then let patients print them out,
so they cannot be updated. Either of the other two options is
potentially practical. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is one special case in which the service replies
with stale information, such as an OCSP response which
indicates that a certificate is valid when it has
since been revoked, but this is generally a very small
risk in a case like this. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that one purpose of CT is to detect that
someone else has had a certificate issued in your name, but
that doesn&#39;t really apply here, because (1) personal names
aren&#39;t unique, unlike domain names and (2) it&#39;s not clear
how it harms you if there is a vaccine credential in your name
that was issued to someone else. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-ledger/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Against streaming apps</title>
		<link href="https://educatedguesswork.org/posts/streaming-apps/"/>
		<updated>2021-05-18T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/streaming-apps/</id>
		<content type="html">&lt;p&gt;So, we wanted to subscribe to HBO Max to watch some stuff.
Simple enough, go to the HBO Max Web site, make an account,
give them your money, etc. Except that I have an LG TV
and it turns out that HBO Max doesn&#39;t have an app for
WebOS, apparently because they have some &lt;a href=&quot;https://www.reddit.com/r/HBOMAX/comments/iw6upf/any_official_reason_why_hbo_max_isnt_on_lg_smart/&quot;&gt;exclusive deal with Samsung&lt;/a&gt;.
No problem, then, you can watch HBO Max through Hulu, so
I signed up through Hulu,&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/streaming-apps/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and was happily watching stuff, until we discovered that
even though HBO Max has all the Studio Ghibli films,
they&#39;re not available through Hulu, but
&lt;a href=&quot;https://www.hulu.com/hbomax&quot;&gt;only through the HBO Max App&lt;/a&gt;.
At the end of the day I had to haul out my Fire TV and download
the HBO Max app onto that. Which would be fine, I guess, except
that the app just crashes randomly while I&#39;m watching stuff,
which is less than fine.&lt;/p&gt;
&lt;p&gt;It&#39;s easy to blame HBO or LG -- and it is kind of an annoying
state of affairs -- but the basic problem is deeper:
every streaming service has its own app, its own login, its own library of titles,
and its own subtly different UI. So, you&#39;re constantly having
to context switch and ask yourself &amp;quot;Was that video on Netflix or
Amazon Prime? Or maybe it&#39;s Hulu? Oh, it&#39;s on both? Then which one
was I watching it on?&amp;quot; And what is the gesture to bring up the
subtitles?&lt;/p&gt;
&lt;p&gt;This is a dumb state of affairs, and one that didn&#39;t have to happen.
As a counterexample, you can read your email with any email client;
you don&#39;t need to use Facebook Browser to read Facebook; and you
don&#39;t need a different app on your iPhone to call people on AT&amp;amp;T than
to call people on T-Mobile. It&#39;s just that we&#39;ve gotten
used to it because so much content is locked up in vertically
integrated silos where you have to have the app to view the content
(thanks mobile!). This isn&#39;t any kind of technical problem: you could obviously
have a generic client that was able to stream media from any provider:
Amazon Prime and Hulu already do this with add-on providers
like HBO, Starz, etc; it&#39;s just that neither app covers
every streaming provider and (as noted above) they don&#39;t always
have the whole catalog.&lt;/p&gt;
&lt;p&gt;I don&#39;t really have a conclusion here: there are powerful economic
incentives for content providers to want to lock stuff up in their
own silos, but let&#39;s not pretend it&#39;s the way it has to be.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
HBO graciously refunded my first month subscription because
I hadn&#39;t used it. &lt;a href=&quot;https://educatedguesswork.org/posts/streaming-apps/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Improving vaccine registration</title>
		<link href="https://educatedguesswork.org/posts/vaccine-registration/"/>
		<updated>2021-05-17T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-registration/</id>
		<content type="html">&lt;p&gt;Here in the United States we&#39;ve rapidly gone from a situation where
there was overwhelming demand for the COVID vaccine to one where
supply far outstrips demand and the major concern is how to get people
to take it. However, until late April and early May, there was a huge
amount of contention for vaccination appointments. I think it&#39;s clear
that this process did not go as smoothly as possible. In particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Vaccines were unevenly distributed, with areas where there was extra
supply not too far away from areas where there was high demand. For
instance, people here in the Bay Area were driving an hour or two to
Stockton or Tracy to get vaccinated before they could here.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Every time a new tranche of people became eligible (e.g., 50 or
above, 16 or above), there would be a mad rush for appointments,
with the Web sites taking time to be updated, people having trouble
registering, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Networking people will recognize this as basically a queueing
problem: we have a fixed amount of capacity and demand that exceeds
that capacity, so we have to find a way of scheduling the rate
at which things happen. It&#39;s important to recognize that in
a situation like this, some people are going to have to wait.
There&#39;s really nothing to be done about that other than make
more capacity or wait for demand to subside. Our scheduling objective is to ensure an efficient process.
Specifically, this means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make sure you&#39;re using all of your available capacity. In this
case that means that you&#39;re giving out doses as fast as possible
and that none go to waste.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Serve the highest priority customers first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Minimize the amount of overhead in the scheduling system itself.
For instance, it&#39;s inefficient for people to be constantly
trying to reload the Web site, waiting in giant lines,
or have to subscribe to some &lt;a href=&quot;https://www.vaccinespotter.org/CA/&quot;&gt;service&lt;/a&gt;
to find out where they can get doses.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Schedule people at convenient times.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One of the things that has complicated the vaccine rollout has been
trying to balance the first and second objectives. It&#39;s relatively
easy to use up all your capacity if you don&#39;t care who you serve first,
but that may mean that you&#39;re only serving rich people or -- especially
important in the case of COVID -- that you&#39;re serving young healthy
people rather than people who are at high risk.
Conversely, you can enforce strict prioritization but if you don&#39;t
have enough people in a given tranche at a given time -- or can&#39;t
reach them -- then you aren&#39;t
going to efficiently use all your capacity. In the worst case, you
won&#39;t give out all of your doses; there were certainly plenty
of reports of this happening in the early days of the US vaccination
effort. Which of these to prioritize is a policy judgement,
but there&#39;s inherently a tradeoff.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I&#39;m most familiar with the system in California, which looked something
like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Break people up into priority tranches based on -- or at least
intended to be based on -- risk level.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Open up registration for the highest priority closed tranche.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Continue at the current level until current demand subsides
and you start to have open appointments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Repeat steps 2-3.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This wasn&#39;t terrible but had fairly predictable results: because the
tranches were so large compared to the amount
of vaccination capacity, as soon as registration opened for a new
tranche you would have a period of chaos until demand died down to a
sustainable level. The primary cause of this problem is fragmentation,
both horizontal and vertical. If there was just one place for people
to get vaccinated and all times were equally good, then you could just
take people in the order they registered and have a strict queue and
life would be simple. However, in reality there are multiple
vaccination sites (in Santa Clara County, each with its own
registration process) and multiple time slots, and so instead of just
taking the first slot, people spend time looking through
different location/time possibilities to find one that
works.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; Unfortunately, everyone else is doing the same thing, which
means that by the time you have discovered that the 2 AM appointment
in Lodi is really the best you can get, someone else has snapped it
up. In the worst case, people may even find that appointments
have been taken between the time they are shown the list
and the time they pick one.&lt;/p&gt;
&lt;p&gt;Of course, people respond to this by holding (or even booking) the first
appointment they can get, figuring they will cancel if they find
something better, which of course just makes the system more unstable,
with slots opening and closing and everyone just getting
frustrated.
Because everyone is repeatedly trying to
book appointments, there is far more load on the system -- and churn
in what appointments are available -- than if
people just got on, registered, and got off. In some cases,
the site itself can be overloaded: I had the Santa Clara site just
stall on me when I was trying to book one appointment.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I want to emphasize here that this is just people responding
rationally to the situation they find themselves in; the problem here
is the design of the system itself.  What&#39;s going on here is that
every time you open up registration to a new tranche you create a
temporary state of very high demand, which overloads the registration
system. Because there is finite demand, this will eventually fix
itself, but the situation can be improved by avoiding the surge in the
first place. Consider what happens if instead of allowing everyone
over 50 to book at the same time we randomly selected one eligible
person, let them book (say, up to a week out) and then moved on to the
next person. In this case, each person would fairly quickly
pick the appointment that was best for them without worry that
they would lose a slot.&lt;/p&gt;
&lt;p&gt;While efficient in terms of time spent by customers, this system
is impractically slow: even if people choose relatively rapidly,
it will still take too long to fill all the slots. However,
we can approximate this by letting people register in small batches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Allow everyone to register with the system for a place in line.
This can be done well before you open up a new tranche, because
you&#39;re just adding them to a list. Some states already did this
via systems like &lt;a href=&quot;https://myturn.ca.gov/&quot;&gt;MyTurn&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Periodically, randomly select people out of the registered
group and offer them the right to actually book an appointment.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Monitor the load on the appointment system and once the
rate of appointment requests starts to decline, go back to
step 2.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
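&lt;p&gt;The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names, the fixed batch size, and the list of registrants are all hypothetical, and the monitoring in step 3 is left to the caller, who would pull the next batch only once the rate of booking requests has started to decline.&lt;/p&gt;

```python
import random


def admit_batch(waiting, batch_size, rng):
    """Randomly pick up to batch_size people and remove them from the list."""
    k = min(batch_size, len(waiting))
    chosen = rng.sample(waiting, k)
    chosen_set = set(chosen)
    waiting[:] = [p for p in waiting if p not in chosen_set]
    return chosen


def lottery_batches(registered, batch_size, seed):
    """Yield successive random batches until everyone has been admitted.

    Each yielded batch is the group offered the right to book. The caller
    decides when to ask for the next one (for example, once load drops).
    """
    rng = random.Random(seed)
    waiting = list(registered)  # step 1: everyone already has a place in line
    while waiting:
        yield admit_batch(waiting, batch_size, rng)  # step 2


registered = ["person-" + str(i) for i in range(25)]
batches = list(lottery_batches(registered, 10, seed=1))
print([len(b) for b in batches])  # batch sizes: [10, 10, 5]
```

&lt;p&gt;Because selection is random rather than first come, first served, there is no advantage to registering the instant the list opens.&lt;/p&gt;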
&lt;p&gt;This system has a number of nice properties. First, it lets you
continuously adjust the rate at which you admit people into the
booking system: if things get too full and there&#39;s too much churn, you
just wait a little while and then admit fewer people next time. If
things get too slow, you admit some more people. Second, it&#39;s a lot
less frustrating for users because the set of appointments is
reasonably stable; rather than forcing them to take the first
appointment they see whether it&#39;s one they like or not, they can look
at the appointments and take something like the best one for
them. Of course, this all works better if you have centralized scheduling
for a large region and works badly if you have very decentralized
scheduling because it&#39;s hard for the individual sites to know who is currently eligible to
book. In those cases, one can use a simpler algorithm where you
just do a lottery by birthdate. This doesn&#39;t give you as
fine-grained control of the input rate and still requires
some way to announce when each day&#39;s tranche becomes available,
but would work better than the current system.&lt;/p&gt;
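&lt;p&gt;A birthdate lottery of the kind mentioned above might look like the following Python sketch. The details are assumptions made for illustration: the function names are hypothetical, the order is drawn once from a published seed so that every site can compute the same list, and a fixed number of birthdates is announced each day.&lt;/p&gt;

```python
import random


def birthdate_lottery(seed):
    """Return all 366 (month, day) pairs in a random but reproducible order."""
    days_in_month = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    dates = [(m + 1, d + 1) for m, n in enumerate(days_in_month) for d in range(n)]
    random.Random(seed).shuffle(dates)
    return dates


def tranche_for_day(order, day_index, per_day):
    """Birthdates that become eligible on a given day (day_index is 0-based)."""
    start = day_index * per_day
    return order[start:start + per_day]


order = birthdate_lottery(seed=2021)
print(len(order))  # 366
print(tranche_for_day(order, 0, 7))  # the first seven birthdates admitted
```

&lt;p&gt;Changing &lt;code&gt;per_day&lt;/code&gt; is the only control over the input rate here, which is why this is coarser than admitting individuals in batches.&lt;/p&gt;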
&lt;p&gt;I want to note that this is fundamentally different from
having more fine-grained eligibility criteria, which seems
problematic. It&#39;s already the case that there was a lot of
debate about the specific prioritization and probably
some &lt;a href=&quot;https://www.nytimes.com/2021/03/27/style/covid-vaccine-comorbidities.html&quot;&gt;bad behavior around the edges&lt;/a&gt;, and making the criteria more fine-grained just makes
the situation worse. The idea here
isn&#39;t to try to prioritize people better, it&#39;s just to
meter the flow of people into the system so that we don&#39;t
overload scheduling capacity. With that said, it arguably
is fairer because you&#39;re allocating
appointments randomly rather than based on who presses reload on their browser the fastest.&lt;/p&gt;
&lt;p&gt;Of course, we&#39;re still left with the problem of people booking
appointments and then just not showing up. If you book precisely
as many appointments as you have doses available, then some
people won&#39;t show and you&#39;ll have some leftover. This isn&#39;t
that big an issue in a big vaccine site because even the
mRNA vaccines can be stored in the refrigerator for &lt;a href=&quot;https://www.cdc.gov/vaccines/covid-19/info-by-product/pfizer/index.html&quot;&gt;a few days&lt;/a&gt;
and so extra can mostly just be kept around (or alternately,
you can overbook a little bit the way that airlines do). For a small site,
it&#39;s important to have some way of offering leftovers to
people who don&#39;t have appointments -- or perhaps aren&#39;t even
eligible -- better to get shots in arms than to have them go to
waste.&lt;/p&gt;
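&lt;p&gt;To make the overbooking idea concrete, here is a small Python sketch. It assumes you have an estimate of the fraction of booked people who actually show up; that &lt;code&gt;show_rate&lt;/code&gt; parameter and the function name are hypothetical, and a real site would also have to handle the days when more people show up than expected.&lt;/p&gt;

```python
import math


def appointments_to_book(doses, show_rate):
    """Appointments to offer so that expected arrivals do not exceed doses."""
    if not (show_rate > 0 and 1 >= show_rate):
        raise ValueError("show_rate must be greater than 0 and at most 1")
    return math.floor(doses / show_rate)


# With 100 doses and 90% of people showing up, book 111 appointments.
print(appointments_to_book(100, 0.9))  # 111
```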
&lt;p&gt;I don&#39;t mean to sound ungrateful here: turning around a new
vaccine in less than a year is nothing short of miraculous
and after a bit of a slow start local officials have done
an amazing job of quickly getting people vaccinated under
very tough conditions. However, once the most serious phase
of the emergency is over it&#39;s always important to ask what
we could do better; this seems like one place for potential
improvement.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;To be clear, the people writing these bots
did us all a service, but it&#39;s unfortunate that they had to do
it. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This precise formulation
isn&#39;t something you face in networking, but a related one is.
Consider what happens if you are trying to use the same
network for real-time conferencing and bulk data transfer:
You need the channel empty at the right times so that you
can send the video and audio frames, otherwise they get
queued and you get jitter. You want to schedule
the data transfer for periods where the channel would be idle. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The cost/benefit calculation here is tricky: obviously it&#39;s
far better for any unvaccinated individual if they get a
given vaccine dose than if someone else does, but it&#39;s
even worse for the vaccine dose to go to waste, because
other people being vaccinated helps you to some extent.
This means that we need to optimize the overall system,
not just ask if the &amp;quot;wrong person&amp;quot; occasionally gets
vaccinated. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There&#39;s actually an &lt;a href=&quot;https://en.wikipedia.org/wiki/Optimal_stopping&quot;&gt;extensive literature&lt;/a&gt;
on how to choose in scenarios like this where you get to
look at each opportunity once and have to either accept
or reject it. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The analogous situation in networking is called &amp;quot;congestion collapse&amp;quot;.
If you have a network link which cannot handle all the
traffic people want to send through it, it responds by
dropping packets. If the endpoints respond by retransmitting,
you can get into a state where everyone is trying to send
aggressively and the link gets clogged with retransmissions,
with the result that the link is &lt;em&gt;full&lt;/em&gt; but not carrying
much useful data. See &lt;a href=&quot;https://ee.lbl.gov/papers/congavoid.pdf&quot;&gt;Van Jacobson and Karels&lt;/a&gt;
for a good description of how this happens and how to avoid it. The
situation is actually quite a bit more complicated in networking
because the endpoints don&#39;t know the total capacity of the network. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
You could of course do first come first served from the
line you established in step 1, but then you create a race
for initial registration. It&#39;s probably better to just
randomly pick or to pick out of all the people who registered
in a given week, etc. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-registration/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Lights for Running</title>
		<link href="https://educatedguesswork.org/posts/running-lights/"/>
		<updated>2021-05-11T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/running-lights/</id>
		<content type="html">&lt;p&gt;Expanded version of a &lt;a href=&quot;https://twitter.com/ekr____/status/1353195655712280577&quot;&gt;Twitter thread&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most serious runners find themselves running in the dark at one time
or another. The most common reason is because you need to squeeze in
a workout before or after work -- especially in the winter -- but
there are plenty of ultradistance events (100 miles, 24 hrs, etc.)
that are likely to have you out overnight. In either case, you&#39;re
going to need some lighting. There are three major options here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hand-held flashlights&lt;/li&gt;
&lt;li&gt;Headlamps&lt;/li&gt;
&lt;li&gt;Waist-mounted lights&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a practical matter, headlamps seem to be the dominant choice,
probably because hand-helds are a pain and waist-mounted lights are
kind of a boutique product. I&#39;ve had a number of people ask me what
kind of headlamp to buy, so here&#39;s a brief overview of the space.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#tradeoffs&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Headlamps have gotten a lot better over the past few years due to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Haitz%27s_law&quot;&gt;improvements in LEDs&lt;/a&gt;
and batteries but the fundamental fact is that making light consumes
power and the brighter the light the more power it consumes. More
power storage means more battery, which means more weight. The
result is that selecting the right headlamp is a compromise between
weight, brightness, and burn time. In general, you want to carry
the lightest headlamp which will burn at the brightness you need
for the time you need.&lt;/p&gt;
&lt;p&gt;There are two main variables you have to consider:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The terrain you are going to be running on.&lt;/li&gt;
&lt;li&gt;The amount of time you will be running in the dark.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Brightness is &lt;em&gt;mostly&lt;/em&gt; determined by terrain. If you&#39;re going to be
running on the road or really smooth trail, I generally find 100-200
lumens or so to be enough. On technical trail, you can get away with
200 lumens, but I prefer 400 or so. 1000 lumens is better, but you&#39;re
going to pay for it in weight and also will start blinding anyone
coming the opposite way.&lt;/p&gt;
&lt;p&gt;There are three main scenarios in terms of burn time:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Running in the morning before sunrise&lt;/li&gt;
&lt;li&gt;Running in the evening after sunset&lt;/li&gt;
&lt;li&gt;Overnight runs (this is most common in ultramarathons over 100K)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the morning, it&#39;s pretty easy to figure out how much burn time you
need: just subtract the time you start from the time when sunrise
happens, plus perhaps a little slack. The good news here is that if
you overestimate your burn time and run out of battery, it&#39;s not usually that big a
deal: it&#39;s probably going to happen towards dawn anyway, and in the
worst case you can just walk or stand around till it gets bright
enough to see. The evening is a little different because there&#39;s
uncertainty about how long you&#39;ll be out. If you underestimate how
long you&#39;ll be out -- for instance if you fall and have to walk it in
-- or overestimate your burn time then you can get stuck out in the
dark, which is not good. For this reason, you&#39;ll want a fair amount
more slack or to carry a backup light (see below).&lt;/p&gt;
&lt;p&gt;Overnight runs are conceptually like running in the morning: you
know when sunrise is and you need enough burn time to last you
the whole night. This likely means you&#39;ll need some
sort of spare batteries. For instance, the Lupine 3.5Ah/25Wh
&lt;a href=&quot;https://www.lupinenorthamerica.com/item.asp?cID=0&amp;amp;scID=56&amp;amp;PID=592&quot;&gt;battery&lt;/a&gt;
weighs 120g, which, combined with the lamp head, is about as much weight as you want to have
on your head at once, and gives you 650 lumens at 6 watts and
350 lumens at 3 watts. So you could just barely make it all
night on 350, which is a little dim, or you&#39;ll need two batteries.
To be honest, that&#39;s probably better anyway: changing batteries
is fast and you&#39;ll be happier with weight in your pack than with it
on your head.
One thing I want to note here is that for an overnight run it&#39;s not
just the ability to see the trail but also keeping yourself
awake: in the middle of the night your body is going to want
to sleep and having a really bright light seems to help me
fight that.&lt;/p&gt;
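&lt;p&gt;To make the battery arithmetic above concrete, here&#39;s a rough sketch in Python. It uses the figures quoted above (a 25 Wh battery drawing 6 watts at 650 lumens or 3 watts at 350 lumens); the 8-hour night and the function names are assumptions for illustration, and real burn times will likely be somewhat shorter than this idealized division suggests.&lt;/p&gt;

```python
import math


def burn_time_hours(capacity_wh, draw_watts):
    """Ideal runtime in hours: stored energy (Wh) divided by power draw (W)."""
    return capacity_wh / draw_watts


def batteries_needed(dark_hours, capacity_wh, draw_watts):
    """Whole batteries needed to cover dark_hours at a constant draw."""
    return math.ceil(dark_hours / burn_time_hours(capacity_wh, draw_watts))


# Figures quoted in the text: 25 Wh battery, 6 W for 650 lumens, 3 W for 350 lumens.
# NIGHT_HOURS is an assumed value, not something from the text.
NIGHT_HOURS = 8

for lumens, watts in [(650, 6), (350, 3)]:
    hours = burn_time_hours(25, watts)
    count = batteries_needed(NIGHT_HOURS, 25, watts)
    print(f"{lumens} lumens: {hours:.1f} h per battery, "
          f"{count} battery(ies) for {NIGHT_HOURS} h")
```

&lt;p&gt;With these numbers, 350 lumens comes out to roughly 8.3 hours on one battery, while 650 lumens needs a second battery to cover an 8-hour night.&lt;/p&gt;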
&lt;h2 id=&quot;battery-type&quot;&gt;Battery Type &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#battery-type&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are two main battery choices: disposable batteries (lithium are
the best here because they&#39;re the lightest) or rechargeables.  For
short duration runs, I definitely recommend rechargeable. That way you
can burn down part of the battery and then charge it back to 100%. If
you&#39;re using disposable, then there&#39;s a good chance you use 60% or so,
and then you might not have enough for your next run, plus you have a
lot of uncertainty about how much light you have left. For longer runs where you&#39;re going to run through
your entire battery pack anyway, it may be cheaper to just buy
disposables rather than having multiple rechargeable battery
packs.&lt;/p&gt;
&lt;p&gt;Historically, you mostly had to choose when you bought the light
because it either came with a battery pack or it took separate
batteries (typically AAAs). A number of headlamps, such as the Petzl
&lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/ACTIK-CORE&quot;&gt;Actik
Core&lt;/a&gt;
now come with hybrid systems where they have a rechargeable battery
pack but will also take regular batteries. This has the advantage that
you can use the rechargeable pack for daily usage but carry disposables
as backup or if you have a longer event. If you&#39;re going to be going
overnight, this seems like a good option. You might also be able
to use a headlamp that takes regular AAAs and put in rechargeable
AAAs, but the one time I tried it, they seemed to be slightly larger
and I had trouble getting them to fit, so you&#39;d want to test this
out for yourself.&lt;/p&gt;
&lt;p&gt;You&#39;ll notice I haven&#39;t talked about lights failing. This can happen,
but modern LED headlamps are very reliable, so this is less of a concern
than it might otherwise be.&lt;/p&gt;
&lt;h2 id=&quot;battery-placement&quot;&gt;Battery Placement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#battery-placement&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because a significant part of the weight of a headlamp is battery
-- especially if you want a lot of burn time -- the location of
the battery affects the balance of the headlamp. There are two
major options here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Directly in the light in a single unit on the front of your
head.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In a separate pod on the back of your head.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The front of your head is a good option for relatively dim lamps
or those with short burn time, but if you want a bright lamp
and long burn time, you&#39;re likely to end up with a pod on the back
of your head. Otherwise the lamp gets too heavy and tends to
press against your forehead uncomfortably -- at least it does
for me. You can also get battery packs with longer cords which you
can carry in your pack or maybe on your waist, but this seems
like just another thing to mess around with. For instance, it
means that if you want to take your pack off for some reason
now you have to take the batteries out of the pack, which I expect will
be a pain; I already have experience with this with wired
headphones connected to a phone in my pack, so I&#39;m not eager
to add another cable attaching my head to my pack.&lt;/p&gt;
&lt;h2 id=&quot;beam-shape%2C-etc.&quot;&gt;Beam Shape, etc. &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#beam-shape%2C-etc.&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The total light output in lumens is really important, but it&#39;s
not the only thing. Beam shape also matters: I
like a fairly bright center spot, but based on the lights
I&#39;ve seen, others seem to like a broader, more diffuse beam.
Another question is whether you want a constant beam or
one that reacts to environmental conditions, for instance
getting brighter when you are looking farther away. Petzl
offers a number of lights like this, but I personally
prefer a constant beam and find it annoying to keep having
the amount of light changing depending on where I look;
that happens enough just from your head moving around.&lt;/p&gt;
&lt;h2 id=&quot;remote-configuration&quot;&gt;Remote Configuration &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#remote-configuration&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A number of the higher-end lights can be remotely configured
with a mobile app. For instance, you can set the number of
brightness levels and how bright they are. This seems
somewhat useful in principle, though to be honest I&#39;ve
not been that impressed. When I initially got a headlamp
with this feature I fiddled with it a bit but at the
end of the day the factory settings are probably fine
unless you&#39;re the kind of person who really likes to tune
everything.&lt;/p&gt;
&lt;h2 id=&quot;recommendations&quot;&gt;Recommendations &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/running-lights/#recommendations&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Ultimately, this is all a matter of personal preference: If you&#39;re
going to be wearing a headlamp regularly, I would advise trying
several headlamps to see what you like in terms of comfort, beam
shape, etc. The headstraps in particular have a lot of impact
on comfort, but everyone&#39;s head is different.
With that said, I do have some recommendations/thoughts.&lt;/p&gt;
&lt;p&gt;My standard headlamp is a &lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/ACTIK-CORE&quot;&gt;Petzl Actik Core&lt;/a&gt;.
It&#39;s 450 lumens at its brightest and is rated for 2 hours at that level. As
noted before, it comes with a rechargeable battery but will
also take separate batteries. I&#39;ve used it regularly for
pre-dawn runs and would also feel comfortable with it for
the beginning and end of a 50 mile or 100K race where you started
before it was light and might finish in the dark. I used this light for
Rim-to-Rim-to-Rim and it
was plenty bright for the South Kaibab descent. I&#39;m
less certain I would want this light for an overnight
event: 450 lumens is on the bottom end of brightness for
that. One thing I don&#39;t love about this headlamp is that
it&#39;s hard to adjust the brightness: pressing the button
just cycles you through brightness levels, including off, so
that makes it a bit of a pain to dim the light: once it&#39;s on max,
the next click takes you to off and now you&#39;re in the dark.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/running-lights/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I also have a &lt;a href=&quot;https://www.lupinenorthamerica.com/Piko_X4_1900lm_LED_Headlamp.asp&quot;&gt;Lupine Piko&lt;/a&gt;
(the somewhat older 1500 lumen version). It&#39;s reasonably light
(the new version is 180g with the 3.5 Ah battery) and ridiculously
bright (the new version is 1900 lumens); basically you&#39;re
wearing a car headlamp on your face. As a practical matter
you almost never need a light this bright for running,
though it&#39;s nice to have the option of a really bright beam
for technical sections or when you&#39;re tired. It&#39;s fantastic for cycling,
though, and you can get a helmet mount for the Lupine head
unit. I&#39;ve worn this for a number of overnight events and
it&#39;s very comfortable and you don&#39;t really notice it on your
head. With that said, the Lupines are quite expensive and I
bought this a while ago, so while it&#39;s served me well I
don&#39;t know if it&#39;s the best choice today.&lt;/p&gt;
&lt;p&gt;If you are going to run on the road, you can probably
get by just fine with an ultralight headlamp around a few hundred
lumens. I&#39;ve heard good things about the &lt;a href=&quot;https://www.petzl.com/US/en/Sport/ACTIVE-headlamps/BINDI&quot;&gt;Petzl Bindi&lt;/a&gt;
which gives you 2 hours at 200 lumens in a ridiculous 35g package.
This kind of light is also useful as a backup light to carry
in your running pack. I strongly recommend doing this if
you&#39;re going to be doing any long distance stuff: your
regular light might fail in some way or you might run into
someone whose light has. In at least two events I&#39;ve run into someone
who needed a light and had to lend them one. I usually carry
a &lt;a href=&quot;https://www.petzl.com/US/en/Sport/CLASSIC-headlamps/ePLUSLITE&quot;&gt;Petzl e+Lite&lt;/a&gt;
for this purpose. It works OK but maxes out at 26g; today I&#39;d
probably carry a Bindi, which is only fractionally heavier.&lt;/p&gt;
&lt;p&gt;Like I say, this is all kind of personal, so what
works for me may not work for you. The good news is
that REI, Backcountry, etc. sell most of the major brands
like Petzl and Black Diamond so you can try out different
units for yourself and return/exchange if you don&#39;t like
them.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Incidentally,
it&#39;s easy to get fooled by your light about how much
ambient light there is. On more than one run I&#39;ve thought
&amp;quot;hey, I don&#39;t need this light&amp;quot; and then covered it and
realized, &amp;quot;nope, it&#39;s still dark&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/running-lights/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>The (depressing) future of stalking tech</title>
		<link href="https://educatedguesswork.org/posts/depressing-future-stalking/"/>
		<updated>2021-05-09T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/depressing-future-stalking/</id>
		<content type="html">&lt;p&gt;Earlier, I &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/&quot;&gt;wrote
about&lt;/a&gt; concerns
about the privacy properties of personal trackers like the Apple
AirTag. These are legitimate concerns, but it&#39;s important to recognize
that they appear against the background of the current technological
landscape, a landscape that is changing rapidly. Until relatively
recently, if you wanted to track someone&#39;s movements you pretty much
had to follow them around. This is practical in some circumstances but
doesn&#39;t really scale well, and it makes tracking kind of a full-time job.
But technology has changed that.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s important to recognize that we&#39;re already long past the point
where governments can easily track you, especially now that most
everyone is carrying a &lt;a href=&quot;https://www.apple.com/iphone/&quot;&gt;tracking&lt;/a&gt;
&lt;a href=&quot;https://store.google.com/us/product/pixel_5?hl=en-US&quot;&gt;device&lt;/a&gt; in their pocket.  Even without
that, governments have widely deployed &lt;a href=&quot;https://en.wikipedia.org/wiki/Traffic_and_Environmental_Zone&quot;&gt;surveillance
cameras&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Automatic_number-plate_recognition&quot;&gt;automatic number plate recognition
cameras&lt;/a&gt;,
etc. If the government wants to spy on you, they have a lot of options
that are mostly constrained by legal restrictions, not technical ones.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;For individuals who cannot subpoena cellphone records and the
like, the situation is different. Without access to the preinstalled
surveillance infrastructure of the state and the big telcos,
the attacker has two main options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Subvert the victim&#39;s existing tech (e.g., install spyware
on their devices)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Plant your own tracking tech on the victim&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The second of these is the concern with AirTags and other personal
trackers, namely that the attacker will plant an AirTag on the victim
and use it to follow them around. Much of the concern around the
design of these devices centers around whether they have been
built with strong enough countermeasures to prevent nonconsensual
tracking. There are real questions here, but I suspect that they
will be obsolete before long.&lt;/p&gt;
&lt;p&gt;We should start by asking why systems like Tile and AirTags are
implemented the way they are. In particular, why do they depend on
other people&#39;s devices to localize your tracker and relay its position
back to you? Why don&#39;t they just have a GPS and use the mobile phone
network to report your position? This would have a number of
advantages, including that the system would work from the very
beginning, rather than depending on a critical mass of installed
devices to help locate your device.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; One big reason is technical
limitations, namely price, size, and battery: an AirTag costs less
than $30, weighs &lt;a href=&quot;https://www.apple.com/airtag/&quot;&gt;11
grams&lt;/a&gt;, is powered by a CR2032 battery,
and has a battery lifetime of over a year. By contrast, my Garmin GPS
watch is expensive, weighs over 90g and needs to be charged every week or two,
and I can barely get through a day without recharging my iPhone. So,
Bluetooth-type trackers have real advantages.&lt;/p&gt;
&lt;p&gt;Here&#39;s the problem: the countermeasures that people are talking about
to prevent stalking via this kind of personal tracker mostly &lt;em&gt;depend&lt;/em&gt;
on them being built in this particular way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Detecting if a tracker has been separated from its paired device
and alerting assumes there is a paired device.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Detecting if a tracker has been moving with you depends on it
transmitting some readily detectable signal (e.g., Bluetooth).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
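&lt;p&gt;As a rough illustration of the second kind of check, here is a toy Python sketch that flags any identifier seen at several distinct places over an extended period. The sighting format, thresholds, and function name are all hypothetical; this is not Apple&#39;s actual algorithm, and it glosses over real-world details. The key point is that it only works if the tracker emits something a nearby device can observe.&lt;/p&gt;

```python
from collections import defaultdict


def find_followers(sightings, min_places=3, min_span_minutes=30):
    """Return identifiers seen at several distinct places over a long span.

    sightings: iterable of (identifier, minute, place) tuples.
    This tuple layout and both thresholds are hypothetical, chosen only
    to illustrate the idea of a "device moving with me" check.
    """
    places = defaultdict(set)
    times = defaultdict(list)
    for ident, minute, place in sightings:
        places[ident].add(place)
        times[ident].append(minute)

    flagged = []
    for ident in places:
        span = max(times[ident]) - min(times[ident])
        if len(places[ident]) >= min_places and span >= min_span_minutes:
            flagged.append(ident)
    return sorted(flagged)


# Made-up observations: one tag keeps showing up as the observer moves,
# while another device is only ever seen at a single place.
example = [
    ("tag-A", 0, "home"),
    ("tag-A", 20, "cafe"),
    ("tag-A", 45, "office"),
    ("phone-B", 20, "cafe"),
    ("phone-B", 22, "cafe"),
]
print(find_followers(example))
```

&lt;p&gt;A device that never transmits anything observable would simply never appear in the list of sightings, so a check like this has nothing to work with.&lt;/p&gt;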
&lt;p&gt;But there&#39;s no reason things have to be this way. Although it&#39;s
certainly convenient to build tracking tags as Bluetooth
transponders using phones as a relay network, there are already
two categories of somewhat widely-deployed tracking devices
that don&#39;t work this way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Satellite communicators like the &lt;a href=&quot;https://buy.garmin.com/en-US/US/p/592606#specs&quot;&gt;Garmin inReach Mini&lt;/a&gt;
or the &lt;a href=&quot;https://www.findmespot.com/en-us/products-services/spot-trace&quot;&gt;SPOT Trace&lt;/a&gt;,
which use GPS for location and transmit your location through a satellite network such as &lt;a href=&quot;https://en.wikipedia.org/wiki/Iridium_satellite_constellation&quot;&gt;Iridium&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pet trackers like the &lt;a href=&quot;https://tryfi.com/&quot;&gt;Fi dog collar&lt;/a&gt; which use
GPS (and I expect WiFi and cell tower location) and WiFi or
cellular for communication.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither of these solutions is that attractive for finding your keys
(Fi costs $199 each and weighs about &lt;a href=&quot;https://www.pcmag.com/reviews/fi-smart-dog-collar&quot;&gt;4x as much as an
AirTag&lt;/a&gt;), but it&#39;s
not out of the question that someone would use them for tracking
purposes: though nowhere near as small as an AirTag or Tile,
these devices are small enough to conceal in a bag&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
and have a battery lifetime of a few
weeks depending on exactly how they are used.
Note that some of these devices are managed via Bluetooth, so
should be detectable by the &amp;quot;devices following me&amp;quot; type mechanisms
that Apple uses for preventing tracking with AirTags&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;, but they don&#39;t
have to be and some
of them, like the SPOT Trace, do not appear to have any Bluetooth functionality.&lt;/p&gt;
&lt;p&gt;I&#39;m not saying that existing tracking devices are ideal
for stalking out of the box, although it&#39;s certainly possible
that some of them can be used that way or can be easily reprogrammed for it.
Rather, it&#39;s that it&#39;s already technically possible
to build a tracking device which is self-contained and doesn&#39;t
require Bluetooth or relaying through other people&#39;s phones.
Even if there is no such device presently on the market, it&#39;s
likely that someone will eventually build one, either for
some other purpose or intentionally for surveillance.&lt;/p&gt;
&lt;p&gt;Which brings us to the final point I want to make, which is
that technology in this space is advancing rapidly and there
is pressure to make things lighter (backpackers always want things
lighter, and don&#39;t you want a GPS collar for your cat?) as well
as to make battery lifetime better. This has several implications:
First, these devices are likely to become smaller and have
longer battery lifetime and thus be easier to conceal and less
of a pain to use. Second, as the state of the art advances
it will become more practical and cheaper for people to make dedicated
surveillance tech versions of this type of device; even if the
normal consumer devices are easy to detect, such as by Bluetooth ID,
those surveillance models won&#39;t be.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I know this is a bummer. Regrettably, technology does not always
improve things and functionality which can be extremely useful
in some contexts (finding your dog, rescuing you in the backcountry)
can be extremely undesirable in others.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There is a general observation here, which is that a lot of what
technology has done in this area is to take surveillance which used to be
in principle possible but in practice prohibitively expensive
and make it extremely practical. Justice Alito&#39;s concurrence
in &lt;a href=&quot;https://www.law.cornell.edu/supct/pdf/10-1259.pdf&quot;&gt;US v. Jones&lt;/a&gt;
does a good job of covering this. In US v. Jones, the
government attached a GPS tracker to the suspect&#39;s car.
Alito writes &amp;quot;In the pre-computer age, the greatest protections of
privacy were neither constitutional nor statutory, but
practical. Traditional surveillance for any extended period of time
was difficult and costly and therefore rarely undertaken. The
surveillance at issue in this case—constant monitoring of the
location of a vehicle for four weeks— would have required a large
team of agents, multiple vehicles, and perhaps aerial assistance.
Only an investigation of unusual importance could have justified
such an expenditure of law enforcement resources. Devices like the
one used in the present case, however, make long-term monitoring
relatively easy and cheap.&amp;quot; The concurrence also comes complete
with a hypothetical in which &amp;quot;a constable secreted himself somewhere
in a coach and remained there for a period of time in order to
monitor the movements of the coach’s owner&amp;quot;, about which
Alito says &amp;quot;The Court suggests that something like this might have
occurred in 1791, but this would have required either a gigantic
coach, a very tiny constable, or both—not to mention a constable with
incredible fortitude and patience.&amp;quot; &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
This isn&#39;t to say there is nothing you can do, but it&#39;s kind of a pain.
Check out this great &lt;a href=&quot;https://twitter.com/aclu/status/1383481509290467333?s=21&quot;&gt;video&lt;/a&gt;
by &lt;a href=&quot;https://www.aclu.org/news/by/daniel-kahn-gillmor/&quot;&gt;DKG&lt;/a&gt; on how
to protect yourself. &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As I noted previously, this
is a huge advantage for Apple, in that they don&#39;t need to persuade
users to install their app. &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
I&#39;ve lost devices this size in my bag plenty of times. &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Note that the Bluetooth features which enable this functionality seem
undesirable for other privacy reasons. &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For that matter, I wouldn&#39;t
be surprised to learn that they already exist. &lt;a href=&quot;https://educatedguesswork.org/posts/depressing-future-stalking/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Thoughts on personal tracker privacy</title>
		<link href="https://educatedguesswork.org/posts/airtag-privacy/"/>
		<updated>2021-05-08T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/airtag-privacy/</id>
		<content type="html">&lt;p&gt;The privacy implications of Apple&#39;s new AirTag tracking system are
getting some &lt;a href=&quot;https://www.washingtonpost.com/technology/2021/05/05/apple-airtags-stalking/&quot;&gt;negative
attention&lt;/a&gt;
right now. Briefly, AirTags are little battery-powered Bluetooth (among other
wireless protocols) transponders
which you attach to/put in items you own (e.g., your keys). You pair them
with your phone and can then use your phone to find the tags and
whatever you attached them to.
Obviously, these protocols are short range, and you might have lost your
item somewhere else, so AirTags include a feature where other iPhones
will report the location of your AirTag via Apple, allowing you to
locate it even when it&#39;s out of range.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There&#39;s nothing fundamentally new here: a number of companies such as
&lt;a href=&quot;https://www.thetrackr.com/&quot;&gt;TrackR&lt;/a&gt; and
&lt;a href=&quot;https://www.thetileapp.com/?ref=tiledotcom&amp;amp;utm_source=tiledotcom&quot;&gt;Tile&lt;/a&gt;
already make this kind of device. The primary difference here -- aside
from the usual slick Apple engineering -- is the large size of the
potential network of devices that can report an AirTag&#39;s position.
(It&#39;s a little unclear exactly which Apple devices are involved here,
but Apple &lt;a href=&quot;https://www.apple.com/airtag/&quot;&gt;says&lt;/a&gt; &amp;quot;the Find My network —
hundreds of millions of iPhone, iPad, and Mac devices around the
world&amp;quot;, which probably means it&#39;s every device with the Find My
feature turned on.) A bigger network is better at tracking, and there
are a lot of Apple devices, so it seems likely that AirTags will work
pretty well.&lt;/p&gt;
&lt;h2 id=&quot;track-more-than-your-own-stuff&quot;&gt;Track more than your own stuff &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#track-more-than-your-own-stuff&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So, what&#39;s the problem? Well, any system like this can be used not
only to track &lt;em&gt;your&lt;/em&gt; stuff but also to track &lt;em&gt;other people&#39;s&lt;/em&gt; stuff,
and transitively, other people. All I have to do is buy a tracker,
pair it to my phone and then stuff it in your bag and I can use it
to track you. This is obviously not ideal, and as WaPo observes,
can be used by stalkers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Clip a button-sized AirTag onto your keys, and it’ll help you find
where you accidentally dropped them in the park. But if someone else
slips an AirTag into your bag or car without your knowledge, it could
also be used to covertly track everywhere you go. Along with helping
you find lost items, AirTags are a new means of inexpensive, effective
stalking.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Apple has built in some &lt;a href=&quot;https://support.apple.com/en-us/HT212227&quot;&gt;countermeasures&lt;/a&gt;
for this form of attack. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If AirTags are away from their owners for &amp;quot;an extended period of time&amp;quot; they
make a sound when moved.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your iOS device detects that an AirTag that doesn&#39;t belong to you
is moving with you, it will notify you on the device and then you can
try to find it and figure out what&#39;s going on.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As WaPo points out, this is an imperfect defense: you might not notice
the AirTag playing a sound and it&#39;s possible for someone who controls
your phone temporarily to disable the feature which detects the AirTag
moving with you. And of course, that feature won&#39;t protect you at all
if you have an Android phone. &lt;strong&gt;Note&lt;/strong&gt;: WaPo also says: &amp;quot;Apple has done
more to combat stalking than small tracking-device competitors like
Tile, which so far has done nothing.&amp;quot;&lt;/p&gt;
&lt;p&gt;So the good news is that if you have an iOS device -- and a lot of people
do -- and nobody
has tampered with it, then you&#39;ll have some measure of protection
by default. On the other hand, if you don&#39;t have an iOS device -- and of course many people don&#39;t --
the situation is more complicated. You won&#39;t have any protection by
default and you may not be able to do much of anything
to protect yourself. Presumably Apple
could build an Android app that would do whatever it is that iOS devices
do now, but they don&#39;t seem to have done so. It might or might not
be possible for someone else to do so, depending on exactly how this
function works.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
There are a number of Android apps that appear to let you look at Bluetooth
or NFC devices in your local area, but it&#39;s not clear how easy they
are for ordinary people to use for this purpose; the same identifier-changing
techniques which make it hard to track tags trivially
may also make it hard to use this kind of program to detect tracking.&lt;/p&gt;
&lt;h2 id=&quot;is-it-possible-to-do-better%3F&quot;&gt;Is it possible to do better? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#is-it-possible-to-do-better%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Clearly, the privacy properties of this kind of tracker aren&#39;t ideal.
This raises the question of whether it&#39;s possible to do better.
Specifically: can we significantly improve the privacy properties of
this kind of system without also significantly reducing its usefulness
for legitimate applications? If we can do so, then that&#39;s good. If
not, then there are some hard tradeoffs. In addition to a few
ergonomic-type tweaks suggested in the WaPo article (scan your local
network for trackers, tune the &amp;quot;moves with you&amp;quot; algorithm to work if
there is a tracker in your car, etc.), it seems like there are a few
small things that one could do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make the notification when the device is away from the owner
more apparent (louder, etc.). In general, it&#39;s just not obvious
how useful this whole feature is, though. In a domestic abuse situation,
the tracker is likely to be in the presence of the abuser
pretty regularly, so it&#39;s not clear whether this would really
work (this is a point WaPo makes).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have a standardized mechanism for detecting that a device is
&amp;quot;moving with you&amp;quot; that is implemented by every major
tracker type and every device manufacturer. Ideally, devices
would just do this by default, so that users didn&#39;t have
to take any positive action.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note: These aren&#39;t original to me; they&#39;re implied or
outright suggested in the WaPo article.&lt;/p&gt;
&lt;p&gt;These would improve the situation somewhat,
though I could imagine the &amp;quot;I&#39;ve been separated
from my owner&amp;quot; feature getting pretty annoying: consider what
happens if your spouse goes out of town for a few days and then
suddenly you have to listen to all their tagged devices angrily
beeping like a smoke alarm that has run out of battery until
you can find them and shut them off.&lt;/p&gt;
&lt;p&gt;Another potential improvement would be to separate the functionality
of being in the Apple &amp;quot;Find My&amp;quot; network for the purposes of having your
devices found by you from the functionality of reporting back about trackers
it sees. This would prevent &lt;em&gt;your&lt;/em&gt; device from reporting the
position of a tracker that is tracking you, but it would not stop devices
belonging to other people from reporting it. However, given the large number of
devices that are going to be doing that reporting, it seems likely
that that tracker will still be trackable; after all, that is the whole
premise of the ordinary use of the system. For this reason, I don&#39;t
think that this change would significantly improve things.&lt;/p&gt;
&lt;p&gt;We could substantially improve the privacy of these systems by removing
the ability to track trackers in real-time. Right now, you can usually
just show the current position of a tracker on a map, which obviously
makes tracking someone easier. If instead you could only interrogate
the status of a particular tracker at a given time and then that tracker
somehow indicated it was being tracked (e.g., it made a loud sound
or alerted every device in the area) that would make surreptitious
tracking much more difficult, but would obviously make the system
rather less useful.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#summary&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the end of the day, this kind of tracker is a dual-use technology:
It can be used both for legitimate ends (finding your stuff) and
for illegitimate ends (tracking other people). While there are some things
one could do to deter illegitimate use -- and Apple has done some of
these -- it&#39;s not clear how much one can really do technically to make
it hard to use for illegitimate purposes without also making it less
useful for legitimate users.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I&#39;m actually really curious
how this works. Apple says &amp;quot;AirTag was designed with privacy at its
core. AirTag has unique Bluetooth identifiers that change
frequently. This helps prevent you from being tracked from place to
place. When the Find My network is used to locate an offline device
or AirTag, everyone’s information is protected with end-to-end
encryption. No one, including Apple, knows the location or identity
of any of the participating users or devices who help locate a
missing AirTag.&amp;quot; Matt Green has some &lt;a href=&quot;https://blog.cryptographyengineering.com/2019/06/05/how-does-apple-privately-find-your-offline-devices/&quot;&gt;ideas&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As noted above, Apple says that the Bluetooth identifiers
change frequently, so that makes one approach difficult. Perhaps they
use NFC UIDs which &lt;a href=&quot;https://help.gototags.com/article/nfc-uid/&quot;&gt;apparently cannot be changed&lt;/a&gt;? &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Standardization
might also come with some drawbacks. Consider the case
where a countermeasure involves the tracker doing something,
like alerting the user or sending out some other signal;
with a standardized protocol, you could make and sell
trackers which followed the standard enough to be tracked
but didn&#39;t do the alerting piece. &lt;a href=&quot;https://educatedguesswork.org/posts/airtag-privacy/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Authentication for Vaccine Passports</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport-pki/"/>
		<updated>2021-05-02T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport-pki/</id>
		<content type="html">&lt;p&gt;Via &lt;a href=&quot;https://ben.adida.net/&quot;&gt;Ben Adida&lt;/a&gt; I learned about the
&lt;a href=&quot;https://vci.org/&quot;&gt;Vaccine Credentials Initiative (VCI)&lt;/a&gt;.
I&#39;m pleased to see that they provide a fairly complete set of
&lt;a href=&quot;https://smarthealth.cards/&quot;&gt;specifications&lt;/a&gt; for their credential.
I wrote about vaccine passports &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/&quot;&gt;last week&lt;/a&gt;.
At a high level, their credential is digitally signed using conventional
cryptography (&lt;a href=&quot;https://tools.ietf.org/html/rfc7515&quot;&gt;JSON Web Signatures&lt;/a&gt;,
signed with ECDSA and P-256) and encoded into a QR code.
This allows it to be printed on paper and just carried around without
some kind of smart phone app.
This seems like a pretty sensible design
and roughly matches the kind of system I argued was reasonable.&lt;/p&gt;
&lt;p&gt;However, as I was reading the specification, I started to think about
the key management problem, which seems kind of unsolved: we&#39;re going
to have a lot of different groups/organizations giving shots to
different people and we need somehow to turn that information into
credentials that people can use. Moreover, we need those credentials
to be interoperable: any valid credential should be accepted by any
verifier. It would be very undesirable if you got vaccinated at CVS and
I got vaccinated at Walgreens and then when we went to get on
a plane, only my credential was accepted and you were stuck in the
airport.&lt;/p&gt;
&lt;h2 id=&quot;a-vaccine-credential-pki&quot;&gt;A Vaccine Credential PKI &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#a-vaccine-credential-pki&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There&#39;s a rough parallel here to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Public_key_infrastructure&quot;&gt;public key infrastructure&lt;/a&gt;
which powers secure transactions on the Web, and we can take
a generally similar approach: The verifier&#39;s software trusts some set of
entities (e.g., the CDC, state governments) which either directly
issue credentials or authorize other entities (e.g., Walgreens)
to issue credentials. The top-level entities are conventionally
called &lt;em&gt;trust anchors&lt;/em&gt; (TAs). As an operational matter, of course,
the TAs will not be giving every shot, so what happens if I instead
get my shot at Walgreens? There are two basic ways to handle
this situation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The TA issues the credential but operates some service which
allows the clinic to request a credential for an individual
patient (in WebPKI terms, this would be called a &lt;em&gt;registration
authority&lt;/em&gt;). This has the advantage of simplicity but the
disadvantage that the TA needs to be involved in every transaction.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Rather than issuing the credential directly, the TA delegates the
right to issue credentials to the clinic and then the clinic
just issues the credentials directly. In the WebPKI world, this
is done by giving the clinic an &amp;quot;intermediate certificate&amp;quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these models has advantages and it&#39;s likely we&#39;ll see a mix.
For instance, the state might directly issue credentials for shots given at its own
clinics but delegate signing to Walgreens. And Walgreens might
issue all its credentials centrally but have each store act as
a registration authority, contacting the Walgreens central server
to issue. What we want is a flexible model that has the following
properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A common set of trust anchors that everyone agrees on.&lt;/li&gt;
&lt;li&gt;A method to delegate the right to issue credentials to other entities.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With these basic pieces, we don&#39;t need to be too prescriptive about
the overall structure because different organizations can figure out
the right structure for themselves.&lt;/p&gt;
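&lt;p&gt;As a rough illustration of those two pieces, here is a minimal Python sketch of a verifier deciding whether an issuer traces back to a trust anchor. The names (&lt;code&gt;TRUST_ANCHORS&lt;/code&gt;, &lt;code&gt;DELEGATIONS&lt;/code&gt;, &lt;code&gt;chain_to_anchor&lt;/code&gt;, &quot;State Health Dept&quot;) are made up for this example, and the delegations are a plain dictionary; in a real PKI each link would be a signed certificate that the verifier checks cryptographically.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical trust anchors and delegations; a real system would use signed
# certificates rather than a plain dictionary.
TRUST_ANCHORS = {"CDC", "State Health Dept"}
DELEGATIONS = {"Walgreens": "State Health Dept"}  # delegate -> who authorized it


def chain_to_anchor(issuer, delegations, anchors):
    """Return the delegation chain from issuer up to a trust anchor, or None."""
    chain = [issuer]
    while chain[-1] not in anchors:
        parent = delegations.get(chain[-1])
        if parent is None or parent in chain:  # dead end or delegation loop
            return None
        chain.append(parent)
    return chain


print(chain_to_anchor("Walgreens", DELEGATIONS, TRUST_ANCHORS))
print(chain_to_anchor("Unknown Clinic", DELEGATIONS, TRUST_ANCHORS))
```
&lt;/pre&gt;
&lt;p&gt;The point of the sketch is just that the verifier only needs the anchor list up front; how many layers sit between an anchor and the entity that actually signs is up to each organization.&lt;/p&gt;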
&lt;h2 id=&quot;aside%3A-manufacturer-trust-anchors&quot;&gt;Aside: Manufacturer Trust Anchors &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#aside%3A-manufacturer-trust-anchors&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One sort of interesting variation I thought of while writing this was
to have the vaccine &lt;em&gt;manufacturers&lt;/em&gt; serve as the trust anchors. For
instance, each box of vaccine could have a code printed on it and then
the clinic could scan the code when they injected someone and then
phone home to the manufacturer to get an issued credential.  [Yes,
this has bad privacy properties but you can fix those with
&lt;a href=&quot;https://en.wikipedia.org/wiki/Blind_signature&quot;&gt;blind signatures&lt;/a&gt;.]  This
won&#39;t work offline, which isn&#39;t ideal, though probably not that big
a deal in a lot of places. Alternately, you could have
each box of vaccine have a code with a private key + certificate pair
and then you could scan the code and use that to sign, which would
work offline.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
The advantage of a system like this is that it wouldn&#39;t
require the issuers to have any real contact with the PKI; they
could just have simple stateless software which scanned the box.
Obviously it&#39;s too late for this given that we&#39;ve vaccinated a lot
of people and it doesn&#39;t work well retroactively, but an interesting
thought experiment nevertheless.&lt;/p&gt;
&lt;h2 id=&quot;the-vci-approach&quot;&gt;The VCI Approach &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#the-vci-approach&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;VCI seems to have settled on some of the pieces of key management.
The way that their key management works is that
each issuer publishes the public keys that they use for signing at a
well-known URL (&lt;code&gt;/.well-known/jwks.json&lt;/code&gt;) and each token contains
the issuer&#39;s URL in the &lt;code&gt;iss&lt;/code&gt; field of the encoded JWS. When
the verifier processes a credential it retrieves the
keys from the issuer&#39;s site and uses them to verify the signature
on the JWS. I.e.:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/img/3cedb6c358a308fd4c354f7488ecb614.png&quot; alt=&quot;VCI flow&quot; /&gt;&lt;/p&gt;
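&lt;p&gt;The verifier&#39;s side of this flow might look roughly like the following Python sketch. It only covers issuer and key discovery: it assumes the payload is plain JSON, that keys are matched by a &lt;code&gt;kid&lt;/code&gt; value in the JWS header, and that the caller supplies a &lt;code&gt;fetch_jwks&lt;/code&gt; function to do the HTTPS request. All of the function names and the example issuer are mine, not VCI&#39;s, and the actual ECDSA signature check (which needs a crypto library) is left out.&lt;/p&gt;
&lt;pre&gt;
```python
import base64
import json


def b64url_decode(segment):
    """Decode the unpadded base64url used in JWS compact serialization."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def jwks_url_for(issuer):
    """Build the well-known key URL from the issuer URL in the iss field."""
    return issuer.rstrip("/") + "/.well-known/jwks.json"


def find_signing_key(token, trusted_issuers, fetch_jwks):
    """Return the issuer's published key for this token (signature NOT checked here)."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))  # assumption: uncompressed JSON

    issuer = payload["iss"]
    if issuer not in trusted_issuers:  # the list the verifier learned out of band
        raise ValueError("unknown issuer: " + issuer)

    for key in fetch_jwks(jwks_url_for(issuer))["keys"]:
        if key.get("kid") == header.get("kid"):
            return key  # hand this, plus the token, to an ES256 verifier
    raise KeyError("issuer has not published the key named in the token")


# Hypothetical issuer and key set, standing in for a real HTTPS fetch.
ISSUER = "https://clinic.example"
PUBLISHED = {
    ISSUER + "/.well-known/jwks.json": {
        "keys": [{"kty": "EC", "crv": "P-256", "kid": "key-1"}]
    }
}


def make_unsigned_token(header, payload):
    """Build a token-shaped string with a placeholder where the signature goes."""
    parts = [json.dumps(header).encode(), json.dumps(payload).encode(), b"placeholder"]
    return ".".join(b64url_encode(part) for part in parts)


token = make_unsigned_token({"alg": "ES256", "kid": "key-1"}, {"iss": ISSUER})
key = find_signing_key(token, {ISSUER}, lambda url: PUBLISHED[url])
print(key["kid"])
```
&lt;/pre&gt;
&lt;p&gt;Note that the issuer check happens before any network request, so a token naming an issuer the verifier has never heard of is rejected without contacting anyone.&lt;/p&gt;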
&lt;p&gt;This takes care of the delegation piece (sort of) but not of the trust
anchor piece. Here&#39;s what they say about that:&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;We&#39;ll work with a willing set of issuers and define expectations/requirements&lt;/li&gt;
&lt;li&gt;Verifiers will learn the list of participating issuers out of
band; each issuer will be associated with a public URL&lt;/li&gt;
&lt;li&gt;Verifiers will discover public keys associated with an issuer via &lt;code&gt;/.well-known/jwks.json&lt;/code&gt; URLs&lt;/li&gt;
&lt;li&gt;For transparency, we&#39;ll publish a list of participating organizations in a public directory&lt;/li&gt;
&lt;li&gt;In a post-pilot deployment, a network of participants would define and agree to a formal Trust Framework&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;IOW, in the initial design the trust anchors will be a set of domain names which is then
mapped to the keying material via HTTPS (and thus anchored in the WebPKI).
In the post-pilot period, it seems like they are contemplating a separate
PKI (it&#39;s a little unclear if the keys will still be on the Web in this
case). This seems generally sensible, although there needs to be some way
of ensuring that everyone has the same list of issuers. In the WebPKI
this is accomplished&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; by having software vendors remotely update the list of trust
anchors, so you&#39;d need something like that here.&lt;/p&gt;
&lt;p&gt;One difficulty with the &amp;quot;retrieve the keys from a well-known URL&amp;quot;
approach is that in order to have reliable results it requires that the verifier be online
in order to get the issuer&#39;s public key. Even if the verifier preloads the
public key list by contacting every issuer, an issuer is free to add
a new key, which will cause failures until the verifier re-contacts that issuer.
By contrast, in a system like the WebPKI, the delegations are all self-contained
(in certificates) so you can verify the issuer&#39;s key without contacting
them.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; This means that if the
verifier is offline it may not be able to verify some credentials, which is
suboptimal. Depending exactly on how this is all implemented, this check
might also be some kind of tracking vector, though the design in
which the list of issuers is preconfigured seems to mostly mitigate that.
Anyway, this is a generally reasonable kind of design though of course
the details need to be worked out.&lt;/p&gt;
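&lt;p&gt;To make the key rotation failure concrete, here is a small Python sketch of a verifier that preloads keys and then goes offline. &lt;code&gt;KeyCache&lt;/code&gt; and &lt;code&gt;FakeNetwork&lt;/code&gt; are hypothetical names invented for this example, and the &quot;network&quot; is just an in-memory dictionary with an on/off switch.&lt;/p&gt;
&lt;pre&gt;
```python
class FakeNetwork:
    """Stand-in for fetching each issuer's published key list over HTTPS."""

    def __init__(self, published):
        self.online = True
        self.published = published  # issuer URL -> {"keys": [...]}

    def fetch(self, issuer):
        if not self.online:
            raise OSError("verifier is offline")
        return self.published[issuer]


class KeyCache:
    """Verifier-side cache of issuer keys, indexed by (issuer, kid)."""

    def __init__(self, fetch_jwks):
        self._fetch_jwks = fetch_jwks
        self._keys = {}

    def preload(self, issuer):
        for key in self._fetch_jwks(issuer)["keys"]:
            self._keys[(issuer, key["kid"])] = key

    def lookup(self, issuer, kid):
        if (issuer, kid) not in self._keys:
            try:
                self.preload(issuer)  # unknown key: need to re-contact the issuer
            except OSError:
                return None  # offline, so this credential cannot be verified
        return self._keys.get((issuer, kid))


issuer = "https://clinic.example"  # hypothetical issuer
net = FakeNetwork({issuer: {"keys": [{"kid": "key-1"}]}})
cache = KeyCache(net.fetch)
cache.preload(issuer)

net.published[issuer]["keys"].append({"kid": "key-2"})  # issuer adds a new key
net.online = False
print(cache.lookup(issuer, "key-1"))  # still cached
print(cache.lookup(issuer, "key-2"))  # unknown until we can go online again
```
&lt;/pre&gt;
&lt;p&gt;With certificate-style delegation the second lookup would not need the network at all, because the new key would arrive with its own proof of authorization.&lt;/p&gt;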
&lt;h2 id=&quot;interoperability-again&quot;&gt;Interoperability Again &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#interoperability-again&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said at the beginning,
it&#39;s very important to have a system which is interoperable. This means
we need a single solid specification that is flexible enough to work
for most if not all applications. Obviously, details do matter,
but there are a lot of ways of doing this, and it&#39;s more important
to pick one. To that end, I&#39;m glad to see organizations like VCI
publishing open specifications which can serve as input to
future standards.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note
that in principle the TA could duplicate their signing key and
give the clinic a copy so they could issue credentials directly,
but this is brittle for a variety of reasons. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Of course, now you have to worry about those
keys leaking, which you don&#39;t with a more centralized system
as you can just count the number of doses assigned to a given box. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Well, &lt;a href=&quot;https://letsencrypt.org/2020/11/06/own-two-feet.html#if-you-use-an-older-version-of-android&quot;&gt;mostly&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;WebPKI relying parties may still contact the issuer to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol&quot;&gt;check whether the credential is revoked&lt;/a&gt;.
This doesn&#39;t have awesome reliability, security, or privacy properties
and browsers are &lt;a href=&quot;https://dev.chromium.org/Home/chromium-security/crlsets&quot;&gt;gradually&lt;/a&gt;
&lt;a href=&quot;https://developer.apple.com/videos/play/wwdc2017/701/&quot;&gt;deprecating&lt;/a&gt;
&lt;a href=&quot;https://blog.mozilla.org/security/2020/01/21/crlite-part-3-speeding-up-secure-browsing/&quot;&gt;it&lt;/a&gt;.
It seems unlikely there will be much revocation in this system and
so a central revocation list is probably enough. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport-pki/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Notes on Implementing Vaccine Passports</title>
		<link href="https://educatedguesswork.org/posts/vaccine-passport/"/>
		<updated>2021-04-22T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/vaccine-passport/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://blog.mozilla.org/blog/2021/04/22/notes-on-implementing-vaccine-passports/&quot;&gt;Cross-posted&lt;/a&gt; to the Mozilla blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now that we&#39;re starting to get widespread COVID vaccination,
&amp;quot;vaccine passports&amp;quot; have started to become more relevant.
The idea behind a vaccine passport is that you would have
some kind of credential that you could use to prove that
you had been vaccinated against COVID; various entities
(airlines, clubs, employers, etc.) might require such a
passport as proof of vaccination. Right now deployment
of this kind of mechanism is fairly limited: Israel has
one called the &lt;a href=&quot;https://www.gov.il/en/Departments/General/corona-certificates&quot;&gt;green pass&lt;/a&gt;
and the State of New York is using something called the
&lt;a href=&quot;https://epass.ny.gov/home&quot;&gt;Excelsior Pass&lt;/a&gt; based
on some &lt;a href=&quot;https://www.ibm.com/products/digital-health-pass&quot;&gt;IBM tech&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Like just about everything
surrounding COVID, there has been a huge amount of controversy
around vaccine passports (see, for instance, this
&lt;a href=&quot;https://www.eff.org/deeplinks/2020/12/vaccine-passports-stamp-inequity&quot;&gt;EFF post&lt;/a&gt;,
&lt;a href=&quot;https://www.aclu.org/news/privacy-technology/theres-a-lot-that-can-go-wrong-with-vaccine-passports/&quot;&gt;ACLU post&lt;/a&gt;,
or this &lt;a href=&quot;https://www.nytimes.com/2021/02/04/travel/coronavirus-vaccine-passports.html&quot;&gt;NYT article&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;There seem to be four major sets of complaints:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Requiring vaccination is inherently a &lt;a href=&quot;https://www.msn.com/en-us/news/world/florida-prohibits-vaccine-passports-citing-freedom/ar-BB1fftmF&quot;&gt;threat to people&#39;s freedom&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because vaccine distribution has been unfair, with
a number of communities having trouble getting vaccines,
a requirement to get vaccinated &lt;a href=&quot;https://naturemicrobiologycommunity.nature.com/posts/how-vaccine-passports-will-worsen-inequities-in-global-health&quot;&gt;increases inequity&lt;/a&gt;
and vaccine passports enable that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Vaccine passports might be implemented in a way that
is inaccessible for people without access to technology
(especially to smartphones).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Vaccine passports might be implemented in a way that
is a threat to user privacy and security.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I don&#39;t have anything particularly new to say about the first
two questions, which aren&#39;t really about technology but rather
about ethics and political science, so I don&#39;t think it&#39;s that
helpful to weigh in on them, except to observe that
vaccination requirements are nothing new: it&#39;s routine to
require children to be vaccinated to go to school, people to
be vaccinated to enter certain countries, etc. That isn&#39;t to
say that this practice is without problems but merely that it&#39;s
already quite widespread, so we have a bunch of prior art here.
On the other hand, the questions of how to design a vaccine
passport system are squarely technical; the rest of this post
will be about that.&lt;/p&gt;
&lt;h2 id=&quot;what-are-we-trying-to-accomplish%3F&quot;&gt;What are we trying to accomplish? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#what-are-we-trying-to-accomplish%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As usual, we want to start by asking what we&#39;re trying to accomplish.
At a high level, we have a system in which a &lt;em&gt;vaccinated person&lt;/em&gt; (VP) needs to
demonstrate to some entity (the &lt;em&gt;Relying Party (RP)&lt;/em&gt;) that they have
been vaccinated within some relevant time period. This brings with it
some security requirements:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Unforgeability&lt;/em&gt;: It should not be possible
for an unvaccinated person to persuade the RP that they
have been vaccinated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Information minimization&lt;/em&gt;: The RP should learn as little
as possible about the VP, consistent with unforgeability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Untraceability&lt;/em&gt;: Nobody but the VP and RP should know which
RPs the VP has proven their status to.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I want to note at this point that there has been a huge amount
of emphasis on the unforgeability property, but it&#39;s fairly
unclear -- at least to me -- how important it really is. We&#39;ve
had trivially forgeable paper-based vaccination records for years
and I&#39;m not aware of any evidence of widespread fraud. However,
this seems to be something people are really concerned about -- perhaps due to how polarized
the questions of vaccination and masks have become -- and we
have already heard some reports of sales of fake vaccine
cards, so perhaps
we really do need to worry about cheating. It&#39;s certainly true
that people are talking about requiring proof of COVID vaccination in
many more settings than, for instance, proof of measles vaccination,
so there is somewhat more incentive to cheat. In any case, the
privacy requirements are a real concern.&lt;/p&gt;
&lt;p&gt;In addition, we have some functional requirements/desiderata:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The system should be cheap to bring up and operate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It should be easy for VPs to get whatever credential they need
and to replace it if it is lost or destroyed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;VPs should not be required to have some sort of device (e.g., a smartphone).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;the-current-state&quot;&gt;The Current State &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#the-current-state&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the US, most people who are getting vaccinated are getting paper
vaccination cards that look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://media.cntraveler.com/photos/603805ceb643816fc1c010bb/16:9/w_2560%2Cc_limit/Covid-2021-GettyImages-1230165606-2.jpg&quot; alt=&quot;COVID Vaccination Card&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This card is a useful record that you&#39;ve been vaccinated, with which
vaccine, and when you have to come back, but it&#39;s also trivially
forgeable.  Given that they&#39;re made of paper with effectively no
anti-counterfeiting measures (not even the ones that are in currency),
it would be easy to make one yourself, and there are already people
&lt;a href=&quot;https://www.washingtonpost.com/health/2021/04/18/scams-coronavirus-vaccination-cards/&quot;&gt;selling them&lt;/a&gt; &lt;a href=&quot;https://www.cbsnews.com/news/covid-vaccination-cards-fake-scammers-fraud/&quot;&gt;online&lt;/a&gt;. As
I said above, it&#39;s not entirely clear how much we ought to worry about
fraud, but if we do, these cards aren&#39;t up to the task. In any case,
they also have suboptimal information minimization properties: it&#39;s
not necessary to know how old you are or which vaccine you got in
order to know whether you were vaccinated.&lt;/p&gt;
&lt;p&gt;The cards are pretty good on the traceability front: nobody but
you and the RP learns anything, and they&#39;re cheap to make and use,
without requiring any kind of device on the user&#39;s side. They&#39;re
not that convenient if you lose them, but given how cheap
they are to make, it&#39;s not the worst thing in the world if the
place you got vaccinated has to mail you a new one.&lt;/p&gt;
&lt;h2 id=&quot;improving-the-situation&quot;&gt;Improving The Situation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#improving-the-situation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A good place to start is to ask how to improve the paper design
to address the concerns above.&lt;/p&gt;
&lt;p&gt;The data minimization issue is actually fairly easy to address:
just don&#39;t put unnecessary information on the card: as I said,
there&#39;s no reason to have your DOB or the vaccine type on
the piece of paper you use for proof.&lt;/p&gt;
&lt;p&gt;However, it&#39;s actually not straightforward to remove your &lt;em&gt;name&lt;/em&gt;.  The
reason for this is that the RP needs to be able to determine that the
credential actually applies to you rather than to someone else. Even
if we assume that the credential is tamper-resistant (see below),
that doesn&#39;t mean it belongs to you. There are really two main
ways to address this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Have the VP&#39;s name (or some ID number) on the credential and require them to
provide a biometric credential (i.e., a photo ID) that proves
they are the right person.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Embed a biometric directly into the credential.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This should all be fairly familiar because it&#39;s exactly the same
as other situations where you prove your identity. For instance,
when you get on a plane, TSA or the airline reads your boarding
pass, which has your name, and then uses your photo ID to compare
that to your face and decide if it&#39;s really you (this is option 1).
By contrast, when you want to prove you are licensed to drive, you
present a credential that has your biometrics directly embedded
(i.e., a driver&#39;s license).&lt;/p&gt;
&lt;p&gt;This leaves us with the question of how to make the credential
tamper-resistant. There are two major approaches here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make the credential physically tamper-resistant&lt;/li&gt;
&lt;li&gt;Make the credential digitally tamper-resistant&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;physically-tamper-resistant-credentials&quot;&gt;Physically Tamper-Resistant Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#physically-tamper-resistant-credentials&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A physically tamper-resistant credential is just one which is hard to
change or for unauthorized people to manufacture.  This usually
includes features like holograms, tamper-evident sealing (so that
you can&#39;t disassemble it without leaving traces),
etc. Most of us have a lot
of experience with physically tamper-resistant credentials such as
passports, driver&#39;s licenses, etc. These generally aren&#39;t completely
impossible to forge, but they&#39;re designed to be somewhat difficult.
From a threat model perspective, this is probably fine; after all
we&#39;re not trying to make it impossible to pretend to be vaccinated,
just difficult enough that most people won&#39;t try.&lt;/p&gt;
&lt;p&gt;In principle, this kind of credential has excellent privacy because
it&#39;s read by a human RP rather than some machine. Of course, one
could take a photo of it, but there&#39;s no need to. As an analogy,
if you go to a bar and show your driver&#39;s license to prove you are
over 21, that doesn&#39;t necessarily create a digital record. Unfortunately
for privacy, increasingly those kinds of previously analog admissions processes are
actually done by scanning the credential (which usually has some
machine readable data), thus significantly reducing the privacy benefit.&lt;/p&gt;
&lt;p&gt;The main problem with a physically tamper-resistant credential
is that it&#39;s expensive to make and that by necessity you need to
limit the number of people who can make it: if it&#39;s cheap to buy
the equipment to make the credential then it will also be cheap
to forge. This is inconsistent with rapidly issuing credentials
concurrently with vaccinating people: when I got vaccinated there
were probably 25 staff checking people in and each one had
a stack of cards. It&#39;s hard to see how you would scale the production
of tamper-resistant plastic cards to an operation like this, let
alone to one that happens at doctors&#39; offices and pharmacies
all over the country. It&#39;s possible that they
could report people&#39;s names to some central authority which then
makes the cards, but even then we have scaling issues, especially
if you want the cards to be available 2 weeks after vaccination.
A related problem is that if you lose the card, it&#39;s hard to replace
because you have the same issuing problem.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;digitally-tamper-resistant-credentials&quot;&gt;Digitally Tamper-Resistant Credentials &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#digitally-tamper-resistant-credentials&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The major alternative here is to design a digitally tamper-resistant
system. Effectively what this means is that the issuing authority
&lt;a href=&quot;https://en.wikipedia.org/wiki/Digital_signature&quot;&gt;digitally signs&lt;/a&gt;
a credential. This provides cryptographically strong authentication
of the data in the credential in such a way that anyone can
verify it as long as they have the right software. The credential
just needs to contain the same information as would be on the paper
credential: the fact that you were vaccinated (and potentially a
validity date) plus either your name (so you can show your
photo ID) or a biometric such as your picture (so the RP can directly match it against
you).&lt;/p&gt;
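&lt;p&gt;To make the signing idea concrete, here is a minimal sketch of issuing and checking a signed credential. It is an illustration under loud assumptions, not a description of any deployed system: the field names are hypothetical, and to stay within the Python standard library it uses toy textbook RSA with a key built from two publicly known Mersenne primes, which is completely insecure. A real issuer would use a vetted library and a standard scheme such as Ed25519 or ECDSA.&lt;/p&gt;

```python
import hashlib
import json

# ASSUMPTION: toy textbook-RSA key built from two publicly known Mersenne
# primes. This is for illustration only and is NOT secure.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (issuer only)


def digest(credential):
    """Hash a canonical JSON encoding of the credential to an integer."""
    encoded = json.dumps(credential, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big")


def sign(credential):
    """Issuer side: sign the credential hash with the private exponent."""
    return pow(digest(credential), D, N)


def verify(credential, signature):
    """RP side: check the signature using only the public values N and E."""
    return pow(signature, E, N) == digest(credential)


# Hypothetical field names, carrying only the minimal data discussed above.
credential = {
    "name": "Alice Example",
    "vaccinated": True,
    "valid_until": "2022-04-01",
}
signature = sign(credential)

# Changing any field invalidates the signature.
tampered = dict(credential, name="Mallory Example")

print(verify(credential, signature))   # True
print(verify(tampered, signature))     # False
```

&lt;p&gt;The point is just that the RP needs only the public values (&lt;code&gt;N&lt;/code&gt; and &lt;code&gt;E&lt;/code&gt;) to check the credential, and that changing any field makes verification fail.&lt;/p&gt;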
&lt;p&gt;This design has a number of nice properties. First, it&#39;s cheap to
manufacture: you can do the signing on a smartphone app.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; It doesn&#39;t need
any special machinery from the RP: you can encode the credential
as a 2-D bar code which the VP can show on their phone or
print out. And they can make as many copies as they want, just like
your airline boarding pass.&lt;/p&gt;
&lt;p&gt;The major drawback of this design is that it requires special software
on the RP side to read the 2D bar code, verify the digital signature,
and display the result. However, this software is relatively straightforward
to write and can run on any smartphone, using the camera to read the
bar code.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; So, while this is somewhat of a pain, it&#39;s not
that big a deal.&lt;/p&gt;
&lt;p&gt;This design also has generally good privacy properties: the
information encoded in the credential is (or at least can be)
the minimal set needed to
validate that you are you and that you are vaccinated, and because
the credential can be locally verified, there&#39;s no central authority
which learns where you go. Or, at least, it&#39;s not &lt;em&gt;necessary&lt;/em&gt; for there
to be a central authority: nothing stops the RP from reporting
that you were present back to some central location, but that&#39;s
just inherent in them getting your name and picture. As far as I
know, there&#39;s no way to prevent that, though if the credential
just contains your picture rather than an identifier, it&#39;s somewhat better
(though the code itself is still unique, so you can be tracked)
especially because the RP can always capture your picture anyway.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;By this point you should be getting the impression that signed
credentials are a pretty good design, and it&#39;s no surprise that this
seems to be the design that WHO has in mind for their &lt;a href=&quot;https://cdn.who.int/media/docs/default-source/documents/interim-guidance-svc_20210319_final.pdf?sfvrsn=b95db77d_11&amp;amp;download=true&quot;&gt;smart
vaccination
certificate&lt;/a&gt;.
They seem to envision encoding quite a bit more information than is
strictly required for a &amp;quot;yes/no&amp;quot; decision and then having a &amp;quot;selective
disclosure&amp;quot; feature that would just have that information and can be
encoded in a bar code.&lt;/p&gt;
&lt;h2 id=&quot;what-about-green-pass%2C-excelsior-pass%2C-etc%3F&quot;&gt;What about Green Pass, Excelsior Pass, etc? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#what-about-green-pass%2C-excelsior-pass%2C-etc%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So what are people actually rolling out in the field? The Israeli Green Pass
seems to be basically this: a &lt;a href=&quot;https://corona.health.gov.il/en/directives/biz-ramzor-app/&quot;&gt;signed
credential&lt;/a&gt;.
It&#39;s got a QR code which you read with an app and the app then
displays the &lt;a href=&quot;https://en.wikipedia.org/wiki/National_identification_number#Israel&quot;&gt;ID
number&lt;/a&gt;
and an expiration date. You then compare the ID number to the user&#39;s
ID to verify that they are the right person.&lt;/p&gt;
&lt;p&gt;I&#39;ve had a lot of trouble figuring out what the Excelsior Pass does.
Based on the NY Excelsior Pass &lt;a href=&quot;https://covid19vaccine.health.ny.gov/excelsior-pass-frequently-asked-questions/&quot;&gt;FAQ&lt;/a&gt;, which says that &amp;quot;you can print a paper Pass, take a screen shot of your Pass, or save it to the Excelsior Pass Wallet mobile app&amp;quot;, it sounds like it&#39;s the same kind of thing as
Green Pass, but that&#39;s hardly definitive.
I&#39;ve been trying to get a copy of the specification
for this technology and will report back if I manage to learn more.&lt;/p&gt;
&lt;h2 id=&quot;what-about-the-blockchain%3F&quot;&gt;What About the Blockchain? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#what-about-the-blockchain%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Something that keeps coming up here is the use of &lt;a href=&quot;https://en.wikipedia.org/wiki/Blockchain&quot;&gt;blockchain&lt;/a&gt;
for vaccine passports. You&#39;ll notice that my description above
doesn&#39;t have anything about the blockchain but,
for instance, the Excelsior Pass says it is built on IBM&#39;s
&lt;a href=&quot;https://www.ibm.com/products/digital-health-pass&quot;&gt;digital health pass&lt;/a&gt; which
is apparently &lt;a href=&quot;https://www.ibm.com/blockchain/resources/healthcare/#section-3&quot;&gt;&amp;quot;built on IBM blockchain technology&amp;quot;&lt;/a&gt; and says &amp;quot;Protects user data so that it remains private when generating credentials. Blockchain and cryptography provide credentials that are tamper-proof and trusted.&amp;quot;
As another example, in this &lt;a href=&quot;https://www.lfph.io/cci/&quot;&gt;webinar&lt;/a&gt;
on the Linux Foundation&#39;s COVID-19 Credentials Initiative,
Kaliya Young &lt;a href=&quot;https://youtu.be/KZvbx5cRs9E?t=3153&quot;&gt;answers a question on blockchain&lt;/a&gt;
by saying that the root keys for the signers would be stored in the blockchain.&lt;/p&gt;
&lt;p&gt;To be honest, I find this all kind of puzzling; as far as I can tell
there&#39;s no useful role for the blockchain here. To oversimplify,
the major purpose of a blockchain is to arrange for global consensus about
some set of facts (for instance, the set of financial transactions that
has happened) but that&#39;s not necessary in this case: the structure
of a vaccine credential is that some health authority &lt;em&gt;asserts&lt;/em&gt; that
a given person has been vaccinated. We do need relying parties to
know the set of health authorities, but we have existing solutions for that
(at a high level, you just build the root keys into the verifying
apps).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
If anyone has more details on why a blockchain&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
is useful
for this application I&#39;d be interested in hearing them.&lt;/p&gt;
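&lt;p&gt;For comparison, here is a rough sketch of the non-blockchain approach: the verifying app ships with a list of trusted root keys, a root signs an issuer key, and the issuer key signs the user credential. As a loud caveat, the authority name and field names are hypothetical, and the signatures are toy textbook RSA over publicly known Mersenne primes so that the example runs with only the Python standard library. It is not secure and not how any real system is specified.&lt;/p&gt;

```python
import hashlib
import json

E = 65537  # public exponent shared by both toy keys


def make_key(p, q):
    """Return (public modulus, private exponent) for toy textbook RSA."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def digest(obj):
    """Hash a canonical JSON encoding of obj to an integer."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big")


def sign(obj, n, d):
    return pow(digest(obj), d, n)


def verify(obj, signature, n):
    return pow(signature, E, n) == digest(obj)


# ASSUMPTION: insecure demo keys built from known Mersenne primes.
ROOT_N, ROOT_D = make_key(2**1279 - 1, 2**2203 - 1)
ISSUER_N, ISSUER_D = make_key(2**521 - 1, 2**607 - 1)

# Built into the verifying app: the public keys of the health authorities.
TRUSTED_ROOTS = {"example-health-authority": ROOT_N}

# The root key signs a statement binding the issuer key to the authority.
issuer_cert = {"authority": "example-health-authority", "issuer_key": ISSUER_N}
issuer_cert_sig = sign(issuer_cert, ROOT_N, ROOT_D)

# The issuer key signs the actual user credential.
credential = {"name": "Alice Example", "vaccinated": True}
credential_sig = sign(credential, ISSUER_N, ISSUER_D)


def verify_chain(credential, credential_sig, issuer_cert, issuer_cert_sig):
    """Walk the chain: trusted root, then issuer key, then credential."""
    root_n = TRUSTED_ROOTS.get(issuer_cert["authority"])
    if root_n is None:
        return False
    if not verify(issuer_cert, issuer_cert_sig, root_n):
        return False
    return verify(credential, credential_sig, issuer_cert["issuer_key"])


print(verify_chain(credential, credential_sig, issuer_cert, issuer_cert_sig))  # True
```

&lt;p&gt;Everything the RP needs is either in the app (the root keys) or presented along with the credential (the issuer key and its signature), so no global consensus is required.&lt;/p&gt;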
&lt;h2 id=&quot;is-this-stuff-any-good%3F&quot;&gt;Is this stuff any good? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#is-this-stuff-any-good%3F&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s hard to tell. As discussed above, some of these designs seem to
be superficially sensible, but even if the overall design is sensible,
there are lots of ways to implement it incorrectly. It&#39;s quite concerning
not to have published specifications for the exact structure
of the credentials. Without having a
detailed specification, it&#39;s not possible to determine that it has the
claimed security and privacy properties. The protocols that run the
Web and the Internet are open, which allows anyone not only to implement
them but also to verify their security and privacy properties.
If we&#39;re going to have vaccine passports, they should be open as well.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Updated: 2021-04-02 10:10 AM to point to Mozilla&#39;s previous work on blockchain and identity.&lt;/em&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Of course, you
could be issued multiple cards, as they&#39;re not transferable. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
There are some logistical issues around exactly who can sign:
you probably don&#39;t want everyone at the clinic to have a signing
key, but you can have some central signer. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Indeed, in Santa Clara County, where I got vaccinated, your
appointment confirmation is a 2D bar code which you print out and
they scan onsite. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If you&#39;re familiar
with TLS, this is going to sound a lot like a digital certificate,
and you might wonder whether revocation is a privacy issue the way
that it is with WebPKI and OCSP. The answer is more or less &amp;quot;no&amp;quot;.
There&#39;s no real reason to revoke individual credentials and so
the only real problem is revoking signing certificates. That&#39;s
likely to happen quite infrequently, so we can either ignore it,
disseminate a certificate revocation list, or have central
status checking just for them. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Obviously, you won&#39;t be signing every credential with the
root keys, but you use those to sign some other keys, building
a chain of trust down to keys which you can use to sign the
user credentials. &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Because of the large amount of interest in blockchain
technologies, there&#39;s a tendency to try to sprinkle it
in places it doesn&#39;t help, especially in the &lt;a href=&quot;https://blog.mozilla.org/netpolicy/2020/08/06/by-embracing-blockchain-a-california-bill-takes-the-wrong-step-forward/&quot;&gt;identity&lt;/a&gt;
&lt;a href=&quot;https://twitter.com/snowjake/status/1309658817743867904&quot;&gt;space&lt;/a&gt;.
For that reason, it&#39;s really important to ask what benefits it&#39;s
bringing.
 &lt;a href=&quot;https://educatedguesswork.org/posts/vaccine-passport/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Some stuff about running pacers</title>
		<link href="https://educatedguesswork.org/posts/pacers/"/>
		<updated>2021-04-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/pacers/</id>
		<content type="html">&lt;p&gt;Anyone who has cycled in a group or spent a few minutes watching
the Tour de France knows that drafting behind another rider
dramatically decreases the amount of effort you need to exert
in order to maintain a given speed, with the effect increasing
the faster you go. This is true to some extent
with running, though because running pace is significantly slower
even for elite runners&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;, the effect is smaller.
Nevertheless, it&#39;s not nothing and it&#39;s quite common to see
elite runners use pacemakers for big races or record attempts.
Perhaps the most famous example is Eliud Kipchoge&#39;s remarkable
&lt;a href=&quot;https://www.ineos159challenge.com/&quot;&gt;Ineos 1:59 Challenge&lt;/a&gt;
run, in which he became the first person to go under two hours
for the marathon distance (note the careful wording here;
more on this later). In that run, he used a carefully designed
pacing structure consisting of five runners ahead of him
in a reverse V and two runners side by side behind him (that&#39;s
Kipchoge in the white shirt):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.ineos159challenge.com/static/media/intro.7855bfe9.png&quot; alt=&quot;Ineos 1:59 Pacing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The precise estimates vary, but it seems likely that this
pacing technique, along with Nike&#39;s &amp;quot;super-shoe&amp;quot; carbon-plated
shoe technology, was a not-insignificant contributor to this result.&lt;/p&gt;
&lt;h2 id=&quot;pacers-for-road-racing&quot;&gt;Pacers for Road Racing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacers/#pacers-for-road-racing&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As I said, it&#39;s common for races and record attempts to use
pacemakers (often called &amp;quot;rabbits&amp;quot;). Pacemakers serve two
primary purposes. First, they break the wind, as mentioned before.
Second, they relieve the racers of the psychological burden of
maintaining the target pace. Even at the early stages of a marathon,
running at race pace is a significant amount of work and it&#39;s
easier if you can just sit on the pacer and trust them to
run at the right speed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Historically, pacers have been a bit controversial, on the theory
that they&#39;re not really part of the race. However, they&#39;re common
practice, and Roger Bannister famously used two to break the four
&lt;a href=&quot;https://en.wikipedia.org/wiki/Pacemaker_(running)&quot;&gt;minute mile&lt;/a&gt;.
At this point, people just remember that Bannister was the first
person to break 4:00, not that he used pacemakers.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The World Athletics &lt;a href=&quot;https://www.worldathletics.org/download/download?filename=febae412-b673-4523-8321-e1ed092421dc.pdf&amp;amp;urlslug=C2.1%20-%20Technical%20Rules&quot;&gt;rules&lt;/a&gt; (Section 6.3) prohibit pacing by
anyone &amp;quot;not participating in the same race, by athletes lapped or about to be
lapped&amp;quot;. In other words, pacers must start the race at the beginning and
can&#39;t just slow down and wait for the runner to lap them. However, because pacers generally
aren&#39;t as fast as the truly elite runners they are pacing -- else they would
be the ones going for the record with someone else pacing them --
what this mostly means in practice is that the pacer runs some of the race
at the target pace and then drops out, with the runner going for
the win or the record attempt having to finish the race on their own,
often running a very substantial part of the race alone.
&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
In the Ineos 1:59 challenge, by contrast, Kipchoge had pacers rotating
in and out of the event so that he could be paced almost all the way
to the finish. This is an obvious advantage, especially with the
special pacing formation, and is one of the reasons why this doesn&#39;t
count as an official world record (the official record, also held
by Kipchoge, is an amazing 2:01:39).&lt;/p&gt;
&lt;p&gt;Another, potentially less obvious, result of this rule is that things
are different for women. In track events, women and men compete separately,
but in road races such as marathons it&#39;s reasonably common for men and
women to compete together. However, because men&#39;s distance records
are approximately 10% faster than women&#39;s across the board
(for example, the women&#39;s marathon world record is 2:14:04,
over 12 minutes slower than the men&#39;s record), it&#39;s quite possible
to find male pacemakers who can run the whole race at the target
women&#39;s pace.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
For this reason, there are separate world records for female-only
and mixed-gender marathons with the women-only record of 2:17:01 being almost
3 minutes slower than the mixed-gender record of 2:14:04 (see the Wikipedia article &lt;a href=&quot;https://en.wikipedia.org/wiki/Marathon_world_record_progression&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;
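&lt;p&gt;As a quick sanity check on these gaps, here is a small sketch that does the arithmetic on the record times quoted in this post (the helper names are just for illustration):&lt;/p&gt;

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fmt(seconds):
    """Format a number of seconds as 'M:SS'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


men = to_seconds("2:01:39")          # men's record
mixed = to_seconds("2:14:04")        # women's record, mixed-gender race
women_only = to_seconds("2:17:01")   # women's record, women-only race

print(fmt(mixed - men))                       # 12:25
print(round(100 * (mixed - men) / men, 1))    # 10.2
print(fmt(women_only - mixed))                # 2:57
```

&lt;p&gt;So the women&#39;s mixed-gender record takes about 10% longer than the men&#39;s record, and the women-only record is just under three minutes slower again.&lt;/p&gt;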
&lt;p&gt;Of course, outside of elite competition, it&#39;s easy to find people who can
run the whole race at a given pace, and often marathons will
have pacemaking volunteers tasked to run at specific finish
times to make it easier for runners to hit those paces without
having to pace the race themselves; they just have to follow
the pace group.&lt;/p&gt;
&lt;h2 id=&quot;pacers-for-ultramarathons&quot;&gt;Pacers for Ultramarathons &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/pacers/#pacers-for-ultramarathons&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Trail ultramarathons (say 50 miles and up) will often allow
what they call &amp;quot;pacers&amp;quot; but the situation is rather different.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;
Instead of being required to start the race with runners,
pacers are usually only allowed to pick up their runners
well into the race. For instance, at Western States
pacers are &lt;a href=&quot;https://www.wser.org/pacer-rules/&quot;&gt;allowed after mile 62&lt;/a&gt;.
Moreover, you&#39;re allowed to have rotating pacers. It&#39;s pretty
common to have friends pace you, and because at this point
the runner is fairly tired, it&#39;s usually pretty easy to find someone
who can keep the pace.&lt;/p&gt;
&lt;p&gt;This makes a certain amount of sense.  Paces in these races are
relatively slow. For instance, Jim Walmsley&#39;s JFK 50 record of 5:21 is
6:25/mile (3:59/km) pace and his Western States 100 record of 14:09 is
about 8:29/mile (5:17/km), and so the effect of drafting is
correspondingly less. Instead, the primary purpose of pacers is moral
support: because the race is so long and you&#39;re so tired towards the
end, it&#39;s extremely helpful -- or so I hear -- to have someone
to keep you company and help you stay on track. In addition, US
ultras typically start in the morning, which means that your
pacer will often be accompanying you for the section of the
race you run in the dark&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;, so it&#39;s arguably also an aid
to safety to have someone with you through the somewhat
more dangerous dark portions in case you fall, get attacked by
a mountain lion, or whatever.&lt;/p&gt;
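&lt;p&gt;The pace figures quoted above are easy to reproduce. This is a minimal sketch, assuming the nominal race distances (exactly 50 and 100 miles, which is only approximately true of the actual courses) and rounding to the nearest second:&lt;/p&gt;

```python
MILES_TO_KM = 1.609344


def pace(total_minutes, distance_miles):
    """Return (per-mile, per-km) pace as 'M:SS' strings."""
    def fmt(minutes):
        seconds = round(minutes * 60)
        return f"{seconds // 60}:{seconds % 60:02d}"

    per_mile = total_minutes / distance_miles
    per_km = per_mile / MILES_TO_KM
    return fmt(per_mile), fmt(per_km)


# ASSUMPTION: nominal distances of exactly 50 and 100 miles.
print(pace(5 * 60 + 21, 50))     # JFK 50 in 5:21
print(pace(14 * 60 + 9, 100))    # Western States 100 in 14:09
```

&lt;p&gt;This gives 6:25/mile (3:59/km) and 8:29/mile (5:17/km), matching the numbers above.&lt;/p&gt;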
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
World record marathon pace is approximately 21 km/h, but
most recreational runners cannot maintain that pace for
even a mile. By contrast, most recreational cyclists
can easily maintain 21 km/h. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
For track events, there is now a tool called
&lt;a href=&quot;https://en.wikipedia.org/wiki/Wavelight&quot;&gt;Wavelight&lt;/a&gt;
which shows lights on the track to indicate the
right pace. You can see it in use
when Joshua Cheptegei broke the great Kenenisa Bekele&#39;s
5K world record &lt;a href=&quot;https://www.youtube.com/watch?v=b01dG9v9LCY&quot;&gt;here&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Thanks to Lisa Donchak for this observation. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Though there are some famous cases of pacers sticking it out and
winning the race. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At the 2020 US Olympic Trials in Atlanta, 16
men went under 2:14 on a relatively difficult course. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
At least as far as world records go, it seems that road
ultramarathons have the same rules as road racing. For
instance in Jim Walmsley&#39;s &lt;a href=&quot;https://www.youtube.com/watch?v=-8Tzynp-cqs&quot;&gt;attempt&lt;/a&gt;
on the 100K world record, he had pacers start with him but
they gradually dropped off and he had to run much of the
race alone. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Practically nobody but the elites finishes
a 100 mile race before dark. &lt;a href=&quot;https://educatedguesswork.org/posts/pacers/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Addressing Supply Chain Vulnerabilities</title>
		<link href="https://educatedguesswork.org/posts/supply-chain/"/>
		<updated>2021-02-27T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/supply-chain/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One of the unsung achievements of modern software development is the
degree to which it has become componentized: not that long ago, when
you wanted to write a piece of software you had to write pretty much
the whole thing using whatever tools were provided by the language you
were writing in, maybe with a few specialized libraries like
&lt;a href=&quot;https://www.openssl.org/&quot;&gt;OpenSSL&lt;/a&gt;. No longer. The combination
of newer languages, Open Source development and easy-to-use package management
systems like JavaScript&#39;s &lt;a href=&quot;https://www.npmjs.com/&quot;&gt;npm&lt;/a&gt; or Rust&#39;s
&lt;a href=&quot;https://crates.io/&quot;&gt;Cargo/crates.io&lt;/a&gt; has revolutionized how people
write software, making it standard practice to pull in third party
libraries even for the &lt;a href=&quot;https://www.npmjs.com/package/left-pad&quot;&gt;simplest tasks&lt;/a&gt;;
it&#39;s not at all uncommon for programs to depend on hundreds or thousands
of third party packages.&lt;/p&gt;
&lt;h1 id=&quot;supply-chain-attacks&quot;&gt;Supply Chain Attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#supply-chain-attacks&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;While this new paradigm has revolutionized software development, it
has also greatly increased the risk of supply chain attacks, in which
an attacker compromises one of your dependencies and through that your
software.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; A famous example of this is provided by
the 2018 &lt;a href=&quot;https://www.theregister.com/2018/11/26/npm_repo_bitcoin_stealer/&quot;&gt;compromise&lt;/a&gt;
of the &lt;code&gt;event-stream&lt;/code&gt; package to steal Bitcoin from people&#39;s
computers. The Register&#39;s brief history provides a sense of the
scale of the problem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ayrton Sparling, a computer science student at California State
University, Fullerton (FallingSnow on GitHub), flagged the problem
last week in a GitHub issues post. According to Sparling, a commit to
the event-stream module added flatmap-stream as a dependency, which
then included injection code targeting another package, ps-tree.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are a number of ways in which an attacker might manage to
inject malware into a package. In this case, what seems to have
happened is that the original maintainer of event-stream was
no longer working on it and someone else volunteered to take
it over. Normally, that would be great, but here it seems that
volunteer was malicious, so it&#39;s not great.&lt;/p&gt;
&lt;h1 id=&quot;standards-for-critical-packages&quot;&gt;Standards for Critical Packages &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#standards-for-critical-packages&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Recently, Eric Brewer, Rob Pike, Abhishek Arya, Anne Bertucio and Kim Lewandowski
posted a &lt;a href=&quot;https://security.googleblog.com/2021/02/know-prevent-fix-framework-for-shifting.html&quot;&gt;proposal&lt;/a&gt;
on the Google security blog
for addressing vulnerabilities in Open Source software.
They
cover a number of issues including vulnerability management
and security of compilation, and there&#39;s a lot of good stuff
here, but the part that has received
the most attention is the suggestion that certain packages
should be designated &amp;quot;critical&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For software that is critical to security, we need to agree on
development processes that ensure sufficient review, avoid
unilateral changes, and transparently lead to well-defined,
verifiable official versions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These are good development practices, and ones we follow here at Mozilla,
so I certainly encourage people to adopt them. However, trying to
require them for critical software seems like it will have some
problems.&lt;/p&gt;
&lt;h2 id=&quot;it-creates-friction-for-the-package-developer&quot;&gt;It creates friction for the package developer &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#it-creates-friction-for-the-package-developer&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the real benefits of this new model of software development is
that it&#39;s low friction: it&#39;s easy to develop a library and make it
available -- you just write it and put it up on a package repository like
&lt;a href=&quot;http://crates.io/&quot;&gt;crates.io&lt;/a&gt; -- and it&#39;s easy to use those packages -- you just add them
to your build configuration. But then you&#39;re successful and suddenly
your package &lt;em&gt;is&lt;/em&gt; widely used and gets deemed &amp;quot;critical&amp;quot; and now you
have to put in place all kinds of new practices. It probably would be
better if you did this, but what if you don&#39;t? At this point your
package is widely used -- or it wouldn&#39;t be critical -- so what now?&lt;/p&gt;
&lt;h2 id=&quot;it&#39;s-not-enough&quot;&gt;It&#39;s not enough &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#it&#39;s-not-enough&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Even packages which are well maintained and have good development
practices routinely have vulnerabilities. For example, Firefox
recently released a new &lt;a href=&quot;https://www.mozilla.org/en-US/firefox/85.0.1/releasenotes/&quot;&gt;version&lt;/a&gt;
that fixed a &lt;a href=&quot;https://www.mozilla.org/en-US/security/advisories/mfsa2021-06/&quot;&gt;vulnerability&lt;/a&gt;
in the popular &lt;a href=&quot;https://chromium.googlesource.com/angle/angle&quot;&gt;ANGLE&lt;/a&gt;
graphics engine, which is maintained by Google. Both Mozilla
and Google follow the practices that this blog post recommends, but
it&#39;s just the case that people make mistakes. To (possibly mis)quote
&lt;a href=&quot;https://www.cs.columbia.edu/~smb/&quot;&gt;Steve Bellovin&lt;/a&gt;,
&amp;quot;Software has bugs. Security-relevant software has security-relevant bugs&amp;quot;.
So, while these practices are important to reduce the risk of
vulnerabilities, we know they can&#39;t eliminate them.&lt;/p&gt;
&lt;p&gt;Of course this applies to inadvertent vulnerabilities, but what about
malicious actors (though note that Brewer et al. observe that
&amp;quot;Taking a step back, although supply-chain attacks are a risk, the
vast majority of vulnerabilities are mundane and unintentional—honest
errors made by well-intentioned developers.&amp;quot;)? It&#39;s possible that
some of their proposed changes (in particular forbidding anonymous
authors) might have an impact here, but it&#39;s really hard to see how
this is actionable. What&#39;s the standard for not being anonymous?
That you have an e-mail address? A Web page? A &lt;a href=&quot;https://www.dnb.com/duns-number.html&quot;&gt;DUNS number&lt;/a&gt;?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
None of these seem particularly difficult for a dedicated attacker
to fake, and of course the stricter you make the requirements,
the more of a burden they are for the (vast majority of) legitimate
developers.&lt;/p&gt;
&lt;p&gt;I do want to acknowledge at this point that Brewer et al. clearly state
that multiple layers of protection are needed and that it&#39;s necessary
to have robust mechanisms for handling vulnerability defenses. I agree
with all that; I&#39;m just less certain about this particular piece.&lt;/p&gt;
&lt;h1 id=&quot;redefining-critical&quot;&gt;Redefining Critical &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#redefining-critical&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Part of the difficulty here is that there are two ways in which
a piece of software can be &amp;quot;critical&amp;quot;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It can do something which is inherently security sensitive
(e.g., the OpenSSL SSL/TLS stack which is responsible for
securing a huge fraction of Internet traffic).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can be widely used (e.g., the Rust &lt;a href=&quot;https://crates.io/crates/log&quot;&gt;log&lt;/a&gt;
crate), but not inherently that sensitive.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The vast majority of packages -- widely used or not -- fall
into the second category: they do something important but
that isn&#39;t security critical. Unfortunately, because of
the way that software is generally built, this doesn&#39;t matter:
even when software is built out of a pile of small components,
when they&#39;re packaged up into a single program, each component
has all the privileges that that program has. So, for instance,
suppose you include a component for doing statistical
calculations: if that component is compromised nothing stops it
from opening up files on your disk and stealing your
passwords or Bitcoins or whatever. This is true whether the
compromise is due to an inadvertent vulnerability or malware injected
into the package: a problem in any component compromises the
whole system.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
Indeed, minor non-security components
make attractive targets because they may not have had as
much scrutiny as high profile security components.&lt;/p&gt;
&lt;h1 id=&quot;least-privilege-in-practice%3A-better-sandboxing&quot;&gt;Least Privilege in Practice: Better Sandboxing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#least-privilege-in-practice%3A-better-sandboxing&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;When looked at from this perspective, it&#39;s clear that we
have a technology problem: There&#39;s no good reason for individual
components to have this much power. Rather, they should
only have the capabilities they need to do the job they
are intended to do (the technical term is &lt;a href=&quot;https://en.wikipedia.org/wiki/Principle_of_least_privilege&quot;&gt;least privilege&lt;/a&gt;);
it&#39;s just that the software tools we have don&#39;t do a good
job of providing this property. This is a situation
which has long been recognized in complicated pieces of
software like Web browsers, which employ a technique called
&amp;quot;process sandboxing&amp;quot; (pioneered by Chrome) in which the
code that interacts with the Web site is run in its own
&amp;quot;sandbox&amp;quot; and has limited abilities to interact with your
computer. When it wants to do something that it&#39;s not allowed
to do, it talks to the main Web browser code and asks it to
do it for it, thus allowing that code to enforce the rules
without being exposed to vulnerabilities in the rest of
the browser.&lt;/p&gt;
&lt;p&gt;Process sandboxing is an important and powerful tool, but
it&#39;s a heavyweight one; it&#39;s not practical to separate out
every subcomponent of a large program into its own process.
The good news is that there are several recent technologies
which do allow this kind of fine-grained sandboxing, both
based on &lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly&lt;/a&gt;. For WebAssembly
programs, &lt;a href=&quot;https://hacks.mozilla.org/2019/11/announcing-the-bytecode-alliance/&quot;&gt;nanoprocesses&lt;/a&gt;
allow individual components to run in their own sandbox
with component-specific access control lists. More recently,
we have been &lt;a href=&quot;https://hacks.mozilla.org/2020/02/securing-firefox-with-webassembly/&quot;&gt;experimenting&lt;/a&gt;
with a technology called &lt;a href=&quot;https://rlbox.dev/&quot;&gt;RLBox&lt;/a&gt; developed
by researchers at UCSD, UT Austin, and Stanford which allows
regular programs such as Firefox to run sandboxed components.
The basic idea behind both of these is the same: use
static compilation techniques to ensure that the component
is memory-safe (i.e., cannot reach outside of itself
to touch other parts of the program) and then give it only
the capabilities it needs to do its job.&lt;/p&gt;
&lt;p&gt;Techniques like this point the way to a scalable technical
approach for protecting yourself from third party components: each component is
isolated in its own sandbox and comes with a list of the
capabilities that it needs (often called a &lt;em&gt;manifest&lt;/em&gt;)
with the compiler enforcing that it has no other capabilities
(this is not too dissimilar from -- but much more granular than -- the permissions that mobile
applications request). This makes the problem of including a new
component much simpler because you can just look at the capabilities
it requests, without needing to verify that the code itself is behaving correctly.&lt;/p&gt;
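&lt;p&gt;To make the manifest idea a bit more concrete, here is a minimal Python sketch.
Everything in it is an assumption made for illustration: the &lt;code&gt;Manifest&lt;/code&gt;
and &lt;code&gt;Sandbox&lt;/code&gt; classes, the capability names, and the
&lt;code&gt;stats-lib&lt;/code&gt; component are all hypothetical, and a real system would
enforce the rules in the compiler or runtime rather than in ordinary library code.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Capabilities a component says it needs (hypothetical format)."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)


def review_manifest(manifest, allowed):
    """Return the requested capabilities that the embedder has not allowed."""
    return set(manifest.capabilities).difference(allowed)


class Sandbox:
    """Toy stand-in for a sandbox: it only hands out declared capabilities."""

    def __init__(self, manifest):
        self._granted = set(manifest.capabilities)

    def use(self, capability):
        # Anything not listed in the manifest is refused outright.
        if capability not in self._granted:
            raise PermissionError(f"{capability!r} is not in the manifest")
        return f"{capability} granted"


# A statistics component that only asks for CPU time (made-up example).
stats = Manifest("stats-lib", frozenset({"cpu"}))
sandbox = Sandbox(stats)
```

&lt;p&gt;The point of the sketch is that the person including &lt;code&gt;stats-lib&lt;/code&gt;
only has to look at the one-line manifest: if it asked for file access,
&lt;code&gt;review_manifest&lt;/code&gt; would flag that before any code was read.&lt;/p&gt;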
&lt;h1 id=&quot;making-auditing-easier&quot;&gt;Making Auditing Easier &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#making-auditing-easier&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;While powerful, sandboxing itself -- whether
of the traditional process or WebAssembly variety --
isn&#39;t enough, for two reasons.
First, the APIs that we have to work with aren&#39;t sufficiently
fine-grained.  Consider the case of a component which is
designed to let you open and process files on the disk;
this necessarily needs to be able to open files, but what
stops it from reading your Bitcoins instead of the files
that the programmer wanted it to read? It might be possible
to create a capability list that includes just reading certain
files, but that&#39;s not the API the operating system gives
you, so now we need to invent something. There are a lot of
cases like this, so things get complicated.&lt;/p&gt;
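&lt;p&gt;As a rough illustration of what &amp;quot;inventing something&amp;quot; might look like,
here is a Python sketch of a capability that can only read files under one
directory. The &lt;code&gt;ReadCapability&lt;/code&gt; class and the file names are
hypothetical, and this is a library-level check rather than real enforcement:
a compromised component running in the same process could simply bypass it,
which is why the check ultimately has to live in the sandbox.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


class ReadCapability:
    """Hypothetical capability that permits reading only under one directory."""

    def __init__(self, root):
        self._root = Path(root).resolve()

    def read_text(self, relative_path):
        # Resolve symlinks and ".." before checking where the path really points.
        target = (self._root / relative_path).resolve()
        if self._root != target and self._root not in target.parents:
            raise PermissionError(f"{relative_path!r} is outside the granted directory")
        return target.read_text()


def demo():
    """Grant access to data/ only, then try to read a file next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        granted = Path(tmp) / "data"
        granted.mkdir()
        (granted / "input.txt").write_text("hello")
        (Path(tmp) / "wallet.dat").write_text("secret")
        cap = ReadCapability(granted)
        content = cap.read_text("input.txt")
        try:
            cap.read_text("../wallet.dat")
            blocked = False
        except PermissionError:
            blocked = True
        return content, blocked
```

&lt;p&gt;Even in this toy form you can see the complication: the component has to be
written against &lt;code&gt;ReadCapability&lt;/code&gt; instead of the ordinary file API,
and every other kind of resource would need its own equivalent.&lt;/p&gt;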
&lt;p&gt;The second reason is that some components are critical
because they perform critical functions. For instance, no matter
how much you sandbox OpenSSL, you still have to worry about
the fact that it&#39;s handling your sensitive data, and so if
compromised it might leak that. Fortunately, this class
of critical components is smaller, but it&#39;s non-zero.&lt;/p&gt;
&lt;p&gt;This isn&#39;t to say that sandboxing isn&#39;t useful, merely that it&#39;s
insufficient. What we need is multiple layers of protection&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;, with the first
layer being procedural mechanisms to defend against code
being compromised and the second layer being fine-grained sandboxing to
contain the impact of compromise. As noted earlier, it seems problematic to put the burden of
better processes on the developer of the component, especially
when there are a large number of dependent projects, many of
them very well funded.&lt;/p&gt;
&lt;p&gt;Something we have been looking at internally at Mozilla is a way for
those projects to tag the dependencies they use and depend on. The
way that this would work is that each project would then be
tagged with a set of other projects which used it (e.g., &amp;quot;Firefox
uses this crate&amp;quot;). Then when you are considering using a component
you could look to see who else uses it, which gives you some
measure of confidence. Of course, you don&#39;t know what sort of
auditing those organizations do, but if you know that Project X
is very security conscious and they use component Y, that should
give you some level of confidence. This is really just automating
something that already happens informally: people judge components by
who else uses them. There are some obvious extensions here, for
instance labelling specific versions, having indications of
what kind of auditing the depending project did, allowing
people to configure their build systems to automatically
trust projects vouched for by some set of other projects and
refuse to include unvouched projects, or maintaining a database
of insecure versions (this is something the Brewer et al. proposal
suggests too).
The advantage of this kind of
approach is that it puts the burden on the people benefitting
from a project, rather than having some widely used project
suddenly subject to a whole pile of new requirements which
they may not be interested in meeting. This work is still
in the exploratory stages, so &lt;a href=&quot;mailto:ekr-blog@mozilla.com&quot;&gt;reach out to me&lt;/a&gt;
if you&#39;re interested.&lt;/p&gt;
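&lt;p&gt;Here is one way the build-system extension could look, as a Python sketch.
This is not the design we are exploring, just an illustration under stated
assumptions: the &lt;code&gt;check_dependencies&lt;/code&gt; function, the shape of the
vouching data, and all of the package names, versions, and vouches below are
invented for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    version: str


def check_dependencies(deps, vouches, trusted, insecure):
    """Split dependencies into accepted names and rejected names with a reason.

    vouches:  dict mapping package name to the set of projects that use it
    trusted:  set of projects whose usage we treat as a vouch
    insecure: set of (name, version) pairs known to be bad
    """
    accepted, rejected = [], {}
    for dep in deps:
        if (dep.name, dep.version) in insecure:
            rejected[dep.name] = "known-insecure version"
        elif not trusted.intersection(vouches.get(dep.name, set())):
            rejected[dep.name] = "no trusted project vouches for it"
        else:
            accepted.append(dep.name)
    return accepted, rejected


# Made-up data: which projects have tagged which packages.
vouches = {
    "log": {"Firefox", "ProjectX"},
    "tinystats": {"SomeHobbyApp"},
    "oldtls": {"Firefox"},
}
trusted = {"Firefox"}
insecure = {("oldtls", "0.1.0")}
deps = [
    Dependency("log", "0.4.14"),
    Dependency("tinystats", "1.0.0"),
    Dependency("oldtls", "0.1.0"),
]
accepted, rejected = check_dependencies(deps, vouches, trusted, insecure)
```

&lt;p&gt;With this data only &lt;code&gt;log&lt;/code&gt; is accepted: &lt;code&gt;tinystats&lt;/code&gt; has
no vouch from a trusted project, and &lt;code&gt;oldtls&lt;/code&gt; is vouched for but
pinned to a version in the insecure list.&lt;/p&gt;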
&lt;p&gt;Obviously, this only works if people actually do &lt;em&gt;some&lt;/em&gt; kind
of due diligence prior to depending on a component. Here at
Mozilla, we do that to some extent, though it&#39;s not really
practical to review every line of code in a giant package
like &lt;a href=&quot;https://webrtc.googlesource.com/src/&quot;&gt;WebRTC&lt;/a&gt;. There is some hope here as well: because
modern languages such as Rust or Go are memory safe, it&#39;s
much easier to convince yourself that certain behaviors
are impossible -- even if the program has a defect -- which
makes it easier to audit.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; Here too it&#39;s possible
to have clear manifests that describe what capabilities the
program needs and verify (after some work) that those are accurate.&lt;/p&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/supply-chain/#summary&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;I&#39;ve focused here on the differences with what Google is
proposing, but in general, I think they&#39;re right to be worried
about this kind of attack. It&#39;s very convenient to be able to
build on other people&#39;s work, but the difficulty of ascertaining
the quality of that work is an enormous problem&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.
Fortunately, we&#39;re seeing a whole series of technological
advancements that point the way to a solution without
having to go back to the bad old days of writing everything yourself.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Supply chain attacks can be mounted via a number
of other mechanisms, but in this post, we are going to focus
on this threat vector. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Where &amp;quot;critical&amp;quot; is defined by a somewhat complicated
&lt;a href=&quot;https://github.com/ossf/criticality_score&quot;&gt;formula&lt;/a&gt;
based roughly on the age of the project, how actively
maintained it seems to be, how many other projects
seem to use it, etc. It&#39;s actually not clear to me
that this metric is that good a predictor of criticality;
it seems mostly to have the advantage that it&#39;s possible to
evaluate purely by looking at the code repository,
but presumably one could develop a metric that would
be good. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Experience with TLS Extended Validation certificates, which
attempt to verify company identity, &lt;a href=&quot;https://www.cyberscoop.com/easy-fake-extended-validation-certificates-research-shows/&quot;&gt;suggests&lt;/a&gt;
that this level of identity is straightforward to fake. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://commerce.net/people/allan-m-schiffman/&quot;&gt;Allan Schiffman&lt;/a&gt; used to call this
phenomenon a &amp;quot;distributed single point of failure&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The technical term here is
&lt;a href=&quot;https://en.wikipedia.org/wiki/Defence_in_depth&quot;&gt;defense in
depth&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Even better are verifiable systems
such as the &lt;a href=&quot;https://hacl-star.github.io/&quot;&gt;HaCl*&lt;/a&gt; cryptographic
library that Firefox depends on. HaCl* comes with a machine-checkable
proof of correctness, which significantly reduces the need
to audit all the code. Right now it&#39;s only practical to do this
kind of verification for relatively small programs, in large
part because describing the specification that you are
proving the program conforms to is hard, but the
technology is rapidly getting better. &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is
true even for basic quality reasons. Which of the
&lt;a href=&quot;https://www.npmjs.com/search?q=keywords:orm&quot;&gt;two thousand ORMs&lt;/a&gt;
for node is the best one to use? &lt;a href=&quot;https://educatedguesswork.org/posts/supply-chain/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>What WebRTC means for you</title>
		<link href="https://educatedguesswork.org/posts/webrtc/"/>
		<updated>2021-01-31T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/webrtc/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If I told you that two weeks ago IETF and W3C finally published the
standards for WebRTC, your response would probably be to ask what all
those acronyms were. Read on to find out!&lt;/p&gt;
&lt;p&gt;Widely available high quality videoconferencing is one of the real
successes of the Internet. The idea of videoconferencing is of course old
(go watch that &lt;a href=&quot;https://www.youtube.com/watch?v=ZXokqxBQsFM&quot;&gt;scene&lt;/a&gt; in
2001 where Heywood Floyd makes a video call to his family on a
Bell videophone), but until fairly recently it required specialized
equipment or at least downloading specialized software. Simply put,
WebRTC is videoconferencing (VC) in a Web browser, with no download: you
just go to a Web site and make a call. Most of the major
VC services have a WebRTC version: this includes Google
Meet, Cisco WebEx, and Microsoft Teams, plus a whole bunch
of smaller players.&lt;/p&gt;
&lt;h1 id=&quot;a-toolkit%2C-not-a-phone&quot;&gt;A toolkit, not a phone &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#a-toolkit%2C-not-a-phone&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;WebRTC isn&#39;t a complete videoconferencing system; it&#39;s a set
of tools built in to the browser that take care of many of
the hard pieces of building a VC system so that you don&#39;t
have to. This includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Capturing the audio and video from the computer&#39;s
microphone and camera. This also includes
what&#39;s called &lt;em&gt;Acoustic Echo Cancellation&lt;/em&gt;: removing
echoes (hopefully) even when people don&#39;t wear
headphones.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Allowing the two endpoints to negotiate their capabilities
(e.g., &amp;quot;I want to send and receive video at 1080p using
the AV1 codec&amp;quot;) and
arrive at a common set of parameters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Establishing a secure connection between you and other
people on the call. This includes getting your data
through any NATs or firewalls that may be on your
network.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Compressing the audio and video for transmission to
the other side and then reassembling it on receipt.
It&#39;s also necessary to deal with situations where
some of the data is lost, in which case you want
to avoid having the picture freeze or hearing
audio glitches.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This functionality is embedded in what&#39;s called an &lt;em&gt;application
programming interface&lt;/em&gt; (API): a set of commands that the programmer
can give the browser to get it to set up a video call. The upshot of
this is that it&#39;s possible to write a very basic
VC system in a &lt;a href=&quot;https://webrtc.github.io/samples/src/content/peerconnection/pc1/&quot;&gt;very small number of lines of code&lt;/a&gt;.
Building a production system is
more work, but with WebRTC, the browser does much of the
work of building the client side for you.&lt;/p&gt;
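&lt;p&gt;To give a feel for the negotiation step in the list above, here is a toy
Python sketch of two endpoints arriving at a common set of parameters. It is
not the browser API: real WebRTC negotiates with SDP offers and answers, which
carry far more detail. The &lt;code&gt;negotiate&lt;/code&gt; function and the dictionary
layout are assumptions made for this example.&lt;/p&gt;

```python
def negotiate(offer, answerer_supported):
    """Pick the first codec in the offerer's preference order that the
    answerer also supports, and the lower of the two maximum resolutions.

    Returns None when the two sides have no codec in common.
    """
    codec = next(
        (c for c in offer["codecs"] if c in answerer_supported["codecs"]),
        None,
    )
    if codec is None:
        return None
    return {
        "codec": codec,
        "max_height": min(offer["max_height"], answerer_supported["max_height"]),
    }


# Alice would like AV1 at 1080p; Bob's device only handles H264 or VP9 at 720p.
alice = {"codecs": ["AV1", "VP9", "H264"], "max_height": 1080}
bob = {"codecs": ["H264", "VP9"], "max_height": 720}
agreed = negotiate(alice, bob)
```

&lt;p&gt;Here the two sides settle on VP9 at 720p: AV1 is Alice&#39;s first choice but
Bob cannot decode it, so the next codec on her list that he supports wins.&lt;/p&gt;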
&lt;h1 id=&quot;standardization&quot;&gt;Standardization &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#standardization&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Importantly, this functionality is all standardized: the API itself
was published by the &lt;em&gt;World Wide Web Consortium&lt;/em&gt; (W3C) and the
network protocols (encryption, compression, NAT traversal, etc.) were
standardized by the &lt;em&gt;Internet Engineering Task Force&lt;/em&gt; (IETF). The
result is a giant pile of specifications, including the &lt;a href=&quot;https://www.w3.org/TR/webrtc/&quot;&gt;API
specification&lt;/a&gt;, the
&lt;a href=&quot;https://tools.ietf.org/html/rfc8829&quot;&gt;protocol&lt;/a&gt; for negotiating what
media will be sent or received, and a &lt;a href=&quot;https://tools.ietf.org/html/rfc8831&quot;&gt;mechanism&lt;/a&gt;
for sending peer-to-peer data. All in all, this represents
a huge amount of work by too many people to count
spanning a decade and resulting in hundreds of pages
of specifications.&lt;/p&gt;
&lt;p&gt;The result is that it&#39;s
possible to build a VC system that will work for everyone right
in their browser and without them having to install
any software.&lt;/p&gt;
&lt;p&gt;Ironically, the actual publication of the standards is kind of
anticlimactic: every major browser has been shipping WebRTC for
years and as I mentioned above, there are a large number
of WebRTC VC systems. This is a good thing: widespread deployment
is the only way to get confidence that technologies really work
as expected and that the documents are clear enough to implement
from. What the standards reflect is the collective
judgement of the technical community that we have a system
which generally works and that we&#39;re not going to change the
basic pieces. It also means that it&#39;s time for VC providers
who implemented non-standard mechanisms to update to
what the standards say&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h1 id=&quot;why-do-you-care-about-any-of-this%3F&quot;&gt;Why do you care about any of this? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#why-do-you-care-about-any-of-this%3F&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;At this point you might be thinking &amp;quot;OK, you all did a lot of work, but
why does it matter? Can&#39;t I just download Zoom?&amp;quot;
There are a number of important reasons why WebRTC is a big deal,
as described below.&lt;/p&gt;
&lt;h3 id=&quot;security&quot;&gt;Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#security&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Probably the most important reason is &lt;em&gt;security&lt;/em&gt;. Because WebRTC
runs entirely in the browser, it means that you don&#39;t need to
worry about security issues in the software that the VC provider
wants you to download. As an example, last year Zoom had a number
of high profile security flaws that would, for instance, have
allowed web sites to &lt;a href=&quot;https://medium.com/bugbountywriteup/zoom-zero-day-4-million-webcams-maybe-an-rce-just-get-them-to-visit-your-website-ac75c83f4ef5&quot;&gt;add you to calls without your permission&lt;/a&gt;,
or mount what&#39;s called a &lt;em&gt;Remote Code Execution&lt;/em&gt; attack
that would allow attackers to
&lt;a href=&quot;https://blog.assetnote.io/bug-bounty/2019/07/17/rce-on-zoom/&quot;&gt;run their code on your computer&lt;/a&gt;.
By contrast, because WebRTC doesn&#39;t require a download, you&#39;re
not exposed to whatever vulnerabilities the vendor may have in
their client. Of course browsers don&#39;t have a perfect
security record, but every major browser invests a huge amount
in security technologies like &lt;a href=&quot;https://wiki.mozilla.org/Security/Sandbox&quot;&gt;sandboxing&lt;/a&gt;.
Moreover, you&#39;re
already running a browser, so every additional application you
run increases your security risk. For this reason, Kaspersky
&lt;a href=&quot;https://www.kaspersky.com/blog/zoom-security-ten-tips/34729/&quot;&gt;recommends&lt;/a&gt;
running the Zoom Web client, even though the experience is a lot worse than the
app.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The second security advantage of WebRTC-based conferencing is that the
browser controls access to the camera and microphone. This means that
you can easily prevent sites from using them, as well as be sure when
they are in use. For instance, Firefox prompts you before letting a site
use the camera and microphone and then shows something in the URL
bar whenever they are live.&lt;/p&gt;
&lt;p&gt;WebRTC is always encrypted in transit without the VC system
having to do anything else, so you mostly don&#39;t have to ask whether the
vendor has done a good job with their encryption. This is one of the
pieces of WebRTC that Mozilla was most involved in putting into place,
in line with &lt;a href=&quot;https://www.mozilla.org/en-US/about/manifesto/&quot;&gt;Mozilla
Manifesto&lt;/a&gt; principle
number 4 (Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.).
Even more exciting, we&#39;re starting to see work on built-in
end-to-end encrypted conferencing for WebRTC built on
&lt;a href=&quot;https://datatracker.ietf.org/wg/mls/about/&quot;&gt;MLS&lt;/a&gt; and
&lt;a href=&quot;https://datatracker.ietf.org/wg/sframe/about/&quot;&gt;SFrame&lt;/a&gt;. This will
help address the one major security feature that some native clients
have that WebRTC does not provide: preventing
the service from listening in on your calls. It&#39;s good to see
progress on that front.&lt;/p&gt;
&lt;h3 id=&quot;low-friction&quot;&gt;Low Friction &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#low-friction&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because WebRTC-based video calling apps work out of the box with
a standard Web browser, they dramatically reduce friction.
For users, this means they can just join a call without having
to install anything, which makes life a lot easier. I&#39;ve been
on plenty of calls where someone couldn&#39;t join -- often because
their company used a different VC system -- because they
hadn&#39;t downloaded the right software, and this happens a lot
less now that it just works with your browser. This can be an
even bigger issue in enterprises that have restrictions on what
software can be installed.&lt;/p&gt;
&lt;p&gt;For people who want to stand up a new VC service, WebRTC
means that they don&#39;t need to write a new piece of client
software and get people to download it. This makes it much
easier to enter the market without having to worry about
users being locked into one VC system and unable to use
yours.&lt;/p&gt;
&lt;p&gt;None of this means that you can&#39;t build your own client
and a number of popular systems such as WebEx and Meet
have downloadable endpoints (or, in the case of WebEx,
hardware devices you can buy). But it means you don&#39;t
have to, and if you do things right, browser users will
be able to talk to your custom endpoints, thus giving
casual users an easy way to try out your service without being
too committed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3 id=&quot;enhancing-the-web&quot;&gt;Enhancing The Web &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#enhancing-the-web&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Because WebRTC is part of the Web, not isolated into a separate
app, it can be used not just for conferencing
applications but to enhance the Web itself. You want to add
an audio stream to your game? Share your screen in a webinar?
Upload video from your camera? No problem, just use WebRTC.&lt;/p&gt;
&lt;p&gt;One exciting thing about WebRTC is that there turn out to be
a lot of Web applications that can use WebRTC besides
just video calling. Probably the most
interesting is the use of WebRTC &amp;quot;Data Channels&amp;quot;, which allow
a pair of clients to set up a connection between them which
they can use to directly exchange data. This has a number
of interesting applications, including &lt;a href=&quot;https://www.realtimecommunicationsworld.com/webrtc-and-gaming/&quot;&gt;gaming&lt;/a&gt;,
&lt;a href=&quot;https://www.sharedrop.io/&quot;&gt;file transfer&lt;/a&gt;,
and even &lt;a href=&quot;https://webtorrent.io/desktop/&quot;&gt;BitTorrent in the browser&lt;/a&gt;.
It&#39;s still early days, but I think we&#39;re going to be seeing
a lot of DataChannels in the future.&lt;/p&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The bigger picture &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webrtc/#the-bigger-picture&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;By itself, WebRTC is a big step forward for the Web. If
you&#39;d told people 20 years ago that they would be doing
video calling from their browser, they would have laughed
at you -- and I have to admit, I was initially skeptical --
and yet I do that almost every day at work. But more importantly,
it&#39;s a great example of the power the Web has
to make people&#39;s lives better and of what we can do when
we work together to do that.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Technical note: probably the biggest source of
problems for Firefox users is people who implemented a Chrome-specific mechanism
for handling multiple media streams called &amp;quot;Plan B&amp;quot;.
The IETF eventually went with something called
&amp;quot;Unified Plan&amp;quot; and Chrome supports it (as does
Google Meet) but there are still a number of services,
such as Slack and Facebook Video Calling, which
do Plan B only which means they don&#39;t work properly with Firefox, which
implemented Unified Plan. &lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
The Zoom Web client is an interesting case in that it&#39;s
only partly WebRTC. Unlike (say) Google Meet, Zoom Web
uses WebRTC to capture audio and video and to transmit
media over the network, but does all the audio and video
processing locally using &lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly&lt;/a&gt;.
It&#39;s a testament to the power of WebAssembly that
this works at all, but a head-to-head comparison
of Zoom Web to other clients such as Meet or Jitsi
reveals the advantages of using the WebRTC APIs
built into the browser. &lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Google has
open sourced their &lt;a href=&quot;https://webrtc.googlesource.com/src/&quot;&gt;WebRTC stack&lt;/a&gt;,
which makes it easier to write your own downloadable
client, including one which will interoperate with
browsers. &lt;a href=&quot;https://educatedguesswork.org/posts/webrtc/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why getting voting right is hard, Part V: DREs (spoiler: they&#39;re bad)</title>
		<link href="https://educatedguesswork.org/posts/voting-dre/"/>
		<updated>2021-01-26T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-dre/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is the fifth post in my series on voting systems (catch up on
parts
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/08/why-getting-voting-right-is-hard-part-i-introduction-and-requirements/&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/14/why-getting-voting-right-is-hard-part-ii-hand-counted-paper-ballots/&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/05/why-getting-voting-right-is-hard-part-iii-optical-scan/&quot;&gt;III&lt;/a&gt;
and
&lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/13/why-getting-voting-right-is-hard-part-iv-absentee-voting-and-vote-by-mail/&quot;&gt;IV&lt;/a&gt;),
focusing on computerized voting machines. The technical term
for these is &lt;em&gt;Direct Recording Electronic&lt;/em&gt; (DRE) voting systems, but in practice
what this means is that you vote on some kind of computer, typically
using a touch screen interface. As with precinct-count
optical scan, the machine produces a total count,
typically recorded on a memory card, printed out on a paper receipt-like tape, or
both. These can be sent back to election headquarters, together with the
ballots, where they are aggregated.&lt;/p&gt;
&lt;h1 id=&quot;accessibility&quot;&gt;Accessibility &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#accessibility&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;One of the major selling points of DREs is accessibility: paper ballots
are difficult for people with a number of disabilities to access without
assistance. At least in principle DREs can be made more accessible, for instance
fitted with audio interfaces, sip-puff devices, etc. Another advantage
of DREs is that they scale better to multiple languages: you of
course still have to encode ballot definitions in each new language,
but you don&#39;t need to worry about whether you&#39;ve printed enough ballots
in any given language.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;In practice, the accessibility of DREs is &lt;a href=&quot;https://www.theguardian.com/us-news/2019/jul/12/2020-election-voting-security-disabled-access-ballots-machines&quot;&gt;not that great&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Noel Runyan is one of the few people who sits at the crossroads of
this debate. He has 50 years of experience designing accessible
systems and is both a computer scientist and disabled. He was dragged
into this debate, he said, because there were so few other people who
had a stake in both fields.

Voting machines for all is clearly not the right position, Runyan
said. But neither is the universal requirement for hand-marked paper
ballots.

“The [Americans with Disabilities Act], Hava and decency require that
we allow disabled people to vote and have accessible voting systems,”
Runyan said.

Yet Runyan also believes the voting machines on the market today are
“garbage”. They neither provide any real sense of security against
physical or cyber-attacks that could alter an election, nor do they
have good user interfaces for voters regardless of disability status.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See also the 2007 California Top-to-Bottom-Review &lt;a href=&quot;https://votingsystems.cdn.sos.ca.gov/oversight/ttbr/accessibility-review-report-california-ttb-absolute-final-version16.pdf&quot;&gt;accessibility report&lt;/a&gt; for a long catalog of the failings of accessible voting
systems at the time, which don&#39;t seem to have improved much. With all
that said, having &lt;em&gt;any&lt;/em&gt; kind of accessibility is a pretty big improvement.
In particular, this was the first time that many sight impaired voters
were able to vote without assistance.&lt;/p&gt;
&lt;h1 id=&quot;destroyingclarifying-voter-intent&quot;&gt;&lt;s&gt;Destroying&lt;/s&gt;Clarifying Voter Intent &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#destroyingclarifying-voter-intent&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;As discussed in previous posts, one of the challenges with any
kind of hand-marked ballot is dealing with edge cases where
the markings are not clear and you have to discern voter
intent. Arguments about how to interpret (or discard) these
ambiguous ballots have been important in at least two very high
stakes US elections, the 2000 Bush/Gore Florida Presidential contest (conducted
on punch card machines) and the 2008 Coleman/Franken Minnesota Senate
contest (conducted on optical scan machines). It&#39;s traditional
at this point to show the following picture of one of the &amp;quot;scrutineers&amp;quot;
from the Florida recount trying to interpret a punch card ballot&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.guim.co.uk/img/media/87ecf8c99247901a67f7b8c690603ec0dbd05575/0_60_1298_1612/master/1298.jpg?width=700&amp;amp;quality=85&amp;amp;auto=format&amp;amp;fit=max&amp;amp;s=98b8af605d4b60e4410f3e1459af16f1&quot; alt=&quot;scrutineer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In a DRE system, by contrast, all of the interpretation of
voter intent is done by the computer, with the expectation that
any misinterpretation will be caught by the voter checking
the DRE&#39;s work (typically at some summary screen before casting).
In addition, the DRE can warn users about potential errors
on their part (or just make them impossible by forbidding
voters from voting for &amp;gt;1 candidate, etc.).
To the extent to which voters actually check that the DRE is behaving
correctly, this seems like an advantage, but if they do
not (see below) then it&#39;s just destroying information
which might be used to conduct a more accurate election.
For obvious reasons, we have trouble measuring the
error rate of DREs in the field -- again, because the
errors are erased and because observing actual voters while casting ballots is a violation of ballot privacy and secrecy -- but Michael Byrne
&lt;a href=&quot;https://behavioralpolicy.org/wp-content/uploads/2017/08/v3i1-web-Byrne.pdf&quot;&gt;reports&lt;/a&gt;
that under laboratory conditions, DREs have comparable error
rates  (~1-2%) to hand-marked optical scan ballots, so this
suggests that the outcome is about neutral.&lt;/p&gt;
&lt;h1 id=&quot;scalability&quot;&gt;Scalability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#scalability&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;DREs have far worse scaling properties than optical scan systems.  The
number of voters that can vote at once is one of the main limits on
how fast people can get through a polling place.  Thus, you&#39;d like to
have as many voting stations as possible.  However, DREs are expensive
to buy (as well as to set up), so there&#39;s pressure on the maximum
number of machines. To make matters worse, you need more machines than
you would expect by just calculating the total amount of time people
need to vote.&lt;/p&gt;
&lt;p&gt;The intuition here is that people don&#39;t vote evenly throughout the
day, so you need many more machines than you would need to handle the
average arrival rate.  For instance, if you expect to see 1200 voters
over a 12 hour period and each voter takes 6 minutes to vote, you
might think you could get by with 10 machines. However, what actually
happens is that a lot of people vote before work, at lunch, and after
work and so you get a line that builds up early, gradually dissipates
throughout the morning, with a lot of machines standing idle, builds
up again around lunch, then dissipates, and then another long line
that starts to build up around 5 PM. The math here is complicated,
but roughly speaking you need about &lt;a href=&quot;https://static.usenix.org/events/evt/tech/full_papers/Edelstein.pdf&quot;&gt;twice&lt;/a&gt;
as many machines as you would expect to ensure that lines stay short.
In addition, the problem gets worse when there is high turnout.&lt;/p&gt;
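&lt;p&gt;To make the intuition concrete, here is a very rough sketch in Python.
It is not the model from the linked paper: it just tracks the backlog hour by hour,
and the peaked arrival pattern is invented for illustration. The only numbers taken
from the example above are the 1200 voters, the 12 hours, and the 6 minutes per voter.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
def peak_queue(arrivals_per_hour, machines, minutes_per_voter=6):
    """Return the largest end-of-hour backlog of waiting voters."""
    # Each machine serves 60 // minutes_per_voter voters in an hour.
    capacity_per_hour = machines * (60 // minutes_per_voter)
    waiting = 0
    longest = 0
    for arriving in arrivals_per_hour:
        # Voters who cannot be served this hour carry over to the next.
        waiting = max(0, waiting + arriving - capacity_per_hour)
        longest = max(longest, waiting)
    return longest

# Two hypothetical days, each with 1200 voters over 12 hours.
uniform = [100] * 12
peaked = [200, 150, 50, 50, 50, 150, 100, 50, 50, 50, 150, 150]

for machines in (10, 15, 20):
    print(machines, peak_queue(uniform, machines), peak_queue(peaked, machines))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With evenly spread arrivals, 10 machines never fall behind. With this particular
peaked pattern, 10 machines leave a backlog that reaches 150 voters, and it takes
20 machines to keep the end-of-hour backlog at zero. An hourly model like this
understates the lines that form within an hour, so treat it as a lower bound on the problem.&lt;/p&gt;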
&lt;p&gt;These problems exist to some extent with optical scan, but the
main difference is that the voting stations -- typically a table
and a privacy shield -- are cheap, so you can afford to have
overcapacity. Moreover, if you really start getting backed up
you can let voters fill out ballots on clipboards or whatever.
This isn&#39;t to say that there&#39;s no way to get long lines with
paper ballots; for instance, you could have problems at checkin
or a backup at the precinct count scanner, but in general
paper should be more resilient to high turnout than DREs.
It&#39;s also more resilient to failure: if the scanners fail, you
can just have people cast ballots in a ballot box for later
scanning. If the DREs fail, people can&#39;t vote unless you have
backup paper ballots.&lt;/p&gt;
&lt;h1 id=&quot;security&quot;&gt;Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#security&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;DREs are computers and, as discussed in &lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/05/why-getting-voting-right-is-hard-part-iii-optical-scan/&quot;&gt;Part
III&lt;/a&gt;,
any kind of computerized voting is dangerous because computers can be
compromised. This is especially dangerous in a DRE system because the
computer completely controls the user&#39;s experience: it can let the
voter vote for Smith -- and even show the voter that they voted for
Smith -- and then record a vote for Jones. In the most basic DRE
system, this kind of fraud is essentially undetectable: you simply
have to &lt;em&gt;trust&lt;/em&gt; the computer. For obvious reasons, this is not
good. To quote &lt;a href=&quot;https://twitter.com/rlbarnes/&quot;&gt;Richard Barnes&lt;/a&gt;, &#39;for
security people &amp;quot;trust&amp;quot; is a bad word.&#39;&lt;/p&gt;
&lt;h2 id=&quot;how-to-compromise-a-voting-machine&quot;&gt;How to compromise a voting machine &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#how-to-compromise-a-voting-machine&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are a number of ways in which a voting machine might get
compromised. The simplest is that someone with physical access
might subvert it (for obvious&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
reasons, you don&#39;t want voting machines
to be networked, let alone connected to the Internet). The bad news is
that -- at least in the past -- a number of studies of DREs have found
it fairly easy to compromise DREs even with momentary access. For
instance, in 2007, Feldman, Halderman, and Felten &lt;a href=&quot;https://jhalderm.com/pub/papers/ts-evt07-init.pdf&quot;&gt;studied&lt;/a&gt;
the Diebold AccuVote-TS and found that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1. Malicious software running on a single voting machine can steal votes 
with little if any risk of detection. The malicious software can modify
all of the records, audit logs, and counters kept by the voting machine,
so that even careful forensic examination of these records will find
nothing amiss. We have constructed demonstration software that carries
out this vote-stealing attack.

2. Anyone who has physical access to a voting machine, or to a memory
card that will later be inserted into a machine, can install said
malicious software using a simple method that takes as little as
one minute. In practice, poll workers and others often have
unsupervised access to the machines.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To quote myself from Part III: Most of the work here was done in the early 2000s, so
it&#39;s possible that things have improved, but the &lt;a href=&quot;https://www.courthousenews.com/wp-content/uploads/2020/10/ga-voting.pdf&quot;&gt;available
evidence&lt;/a&gt;
suggests otherwise. Moreover, there are limits to how good a job it
seems possible to do here.&lt;/p&gt;
&lt;p&gt;As with precinct-count machines, there are a number of ways in which
an attacker might get enough physical access to the machines in order to attack them.
Anyone who has access to the warehouse where the machines are stored could potentially
tamper with them. In addition it&#39;s not uncommon for voting machines to
be stored overnight at polling places before the election, where
you&#39;re mostly relying on whatever lock the church or school or
whatever has on its doors. It&#39;s also not impossible that a voter
could exploit temporary physical access to a machine in order
to compromise it -- remember that there usually will be a lot of machines
in a given location -- but that is a somewhat harder attack to mount.&lt;/p&gt;
&lt;h2 id=&quot;viral-attacks&quot;&gt;Viral attacks &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#viral-attacks&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;However, there is another more serious attack modality: device
administration. Prior to each election, DREs need to be initialized
with the ballot contents for each contest. The details of how
this is done vary: for instance, one might connect them via a cable
to the &lt;em&gt;Election Management Server&lt;/em&gt; (EMS), insert a memory
stick programmed by the EMS, or sometimes load the data over a local
network. In any case, this electronic connection
is a potential avenue for attack by an attacker who controls
the EMS. This connection can also be an opportunity for a compromised
voting machine to attack the EMS. Together, these provide the
potential conditions for a virus: an attacker compromises a single
DRE and then uses that to attack the EMS, and then uses the EMS
to attack every DRE in the jurisdiction. This has been demonstrated
on real systems. Here&#39;s Feldman et al. again:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;3. AccuVote-TS machines are susceptible to voting-machine viruses—computer 
viruses that can spread malicious software automatically and invisibly from
machine to machine during normal pre- and post-election activity. We have
constructed a demonstration virus that spreads in this way, installing our
demonstration vote-stealing program on every machine it infects.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#39;s important to remember that this kind of attack is also potentially
possible with precinct-count opscan machines: any time you have computers
in the polling place you run this risk. The major difference is that
with precinct-count opscan machines, you have the paper ballots available
so you can recount them without trusting the computer.&lt;/p&gt;
&lt;h2 id=&quot;voter-verifiable-paper-audit-trails-(vvpat)&quot;&gt;Voter Verifiable Paper Audit Trails (VVPAT) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#voter-verifiable-paper-audit-trails-(vvpat)&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Because of this kind of concern, some DREs are fitted with what&#39;s
called a &lt;em&gt;Voter Verifiable Paper Audit Trail&lt;/em&gt; (VVPAT). A typical
VVPAT is a reel-to-reel thermal printer (think credit card receipts) behind a clear cover that is
attached to the voting machine, as in the picture of a
Hart voting machine below (the VVPAT is the grey box on the left).
[Picture by Joseph Lorenzo Hall].&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.flickr.com/photos/joebeone/50853138912/in/dateposted-public/&quot; alt=&quot;Hart eSlate with VVPAT&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The typical way this works is that after the voter has made
their selections they will be presented with a final confirmation
screen. At the same time, the VVPAT will print out a summary
of their choices which the voter can check. If they are correct,
the voter accepts them. If not, they can go back and correct
their choices, and then go back to the confirmation screen.
The idea is that the VVPAT then becomes an untamperable -- at
least electronically -- record of the voter&#39;s choices and can be
counted separately if there is some concern about the correctness
of the machine tally. If everyone did this, then DREs with VVPAT would
be software independent (recall our discussion of SI in &lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/05/why-getting-voting-right-is-hard-part-iii-optical-scan/&quot;&gt;Part III&lt;/a&gt; of this series).&lt;/p&gt;
&lt;p&gt;The major problem with VVPATs is that voters make mistakes and
they aren&#39;t very good about checking the results.
This means that a compromised machine can change the voter&#39;s vote
(as if the voter had made a mistake). If the voter doesn&#39;t
catch the mistake, then the attacker wins, and if they do, they&#39;re
allowed to correct the mistake.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; The most recent &lt;a href=&quot;https://jhalderm.com/pub/papers/bmd-verifiability-sp20.pdf&quot;&gt;work&lt;/a&gt;
on this comes from Bernhard et al., who studied &lt;em&gt;Ballot Marking Devices&lt;/em&gt; (BMDs), which
are like DREs except that they print out optical scan ballots (see below).
They found that if left to themselves around 6.5% of voters
(in a simulated but realistic setting) will
detect ballots being changed. There is some good news here, which
is that with appropriate warnings by the &amp;quot;poll workers&amp;quot; the
researchers were able to raise the detection rate to 85.7%, though
it&#39;s not clear how feasible it is to get poll workers to give those
warnings.&lt;/p&gt;
&lt;h1 id=&quot;privacy%2Fsecrecy-of-the-ballot&quot;&gt;Privacy/Secrecy of the Ballot &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#privacy%2Fsecrecy-of-the-ballot&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The DRE privacy/secrecy story is also somewhat disappointing. There are
two main ways that the system can leak how a voter voted: via
&lt;em&gt;Cast Vote Records&lt;/em&gt; (CVRs) and via the VVPAT paper record. A CVR is just an
electronic representation of a given voter&#39;s ballot stored on the DRE&#39;s &amp;quot;disk&amp;quot;. In principle,
you might think that you could just store the totals for each
contest, but it&#39;s convenient to have CVRs around for a variety
of reasons, including post-election analysis (looking for undervotes,
possible tabulation errors, etc.). In any case, it&#39;s common practice
to record them and the &lt;a href=&quot;https://www.eac.gov/voting-equipment/voluntary-voting-system-guidelines&quot;&gt;Voluntary Voting Systems Guidelines (VVSG)&lt;/a&gt;
promulgated by the US Election Assistance Commission encourage vendors to
do so. This isn&#39;t necessarily a problem if CVRs are handled correctly, but
it must be impossible to link a CVR back to a voter. This means
they have to be stored in a random order with no identifying marks
that lead back to voter sequence. Historically, manufacturers have
not always gotten this right, as the
California TTBR found, for instance,
with the &lt;a href=&quot;https://votingsystems.cdn.sos.ca.gov/oversight/ttbr/sequoia-source-public-jul26.pdf&quot;&gt;Sequoia AVC Edge&lt;/a&gt;
and &lt;a href=&quot;https://votingsystems.cdn.sos.ca.gov/oversight/ttbr/sequoia-source-public-jul26.pdf&quot;&gt;Hart eSlate&lt;/a&gt;.
These problems can also exist with precinct count optical scan systems, but
I forgot to mention it in my post on them. Sorry about that.
Even if this part is done correctly, there are risks of pattern
voting attacks in which the voter casts their ballot in a specific
unique way, though again this can happen with optical scan.&lt;/p&gt;
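&lt;p&gt;As a minimal sketch of what &quot;random order with no identifying marks&quot; might
look like, here is some illustrative Python. The &lt;code&gt;store_cvr&lt;/code&gt; function and the
record layout are hypothetical, not taken from any real voting system, and a real machine
would also have to make sure that nothing else on the device records the order of voting.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
import secrets

def store_cvr(cvr_store, selections):
    """Insert one ballot's selections at a random position in the store."""
    # Keep only the selections: no timestamp, sequence number, or voter ID.
    record = dict(selections)
    # SystemRandom draws from the operating system's randomness source.
    position = secrets.SystemRandom().randrange(len(cvr_store) + 1)
    cvr_store.insert(position, record)

# Hypothetical sequence of five voters in one contest.
store = []
for choice in ["Smith", "Jones", "Smith", "Jones", "Smith"]:
    store_cvr(store, {"Senate": choice})
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The totals are unaffected by the shuffling, but the position of a record in the
store no longer says anything about when it was cast.&lt;/p&gt;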
&lt;p&gt;The VVPAT also presents a problem. As described above, VVPATs
are typically one long strip of paper, with the result that
the VVPAT reflects the order in which votes were cast. An
attacker who can observe the order in which voters voted
and who also has access to the VVPAT can easily determine
how each voter voted. This issue can be mostly mitigated
with election procedures which cut the VVPAT roll apart prior to
usage, but absent those procedures it represents a risk.&lt;/p&gt;
&lt;h1 id=&quot;ballot-marking-devices&quot;&gt;Ballot Marking Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#ballot-marking-devices&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The final thing I want to cover in this post is what&#39;s called a
&lt;em&gt;Ballot Marking Device&lt;/em&gt; (BMD) [also known as an &lt;em&gt;Electronic Ballot Marker&lt;/em&gt; (EBM)].
BMDs have gained popularity in recent years -- especially with
people from the computer science voting security community -- as
a design that tries to blend some of the good parts of DREs with some
of the good parts of paper ballots. For example, the &lt;a href=&quot;https://voting.works/&quot;&gt;Voting Works&lt;/a&gt;
open source machine design is a BMD, as is Los Angeles&#39;s new &lt;a href=&quot;https://vsap.lavote.net/&quot;&gt;VSAP&lt;/a&gt; machine.&lt;/p&gt;
&lt;p&gt;A BMD is conceptually similar to a DRE but with two important differences:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It doesn&#39;t have a VVPAT but instead prints out a ballot which can
be fed into an optical scanner.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because the actual ballot counting is done by the scanner, you
don&#39;t need the machine to count votes, so it doesn&#39;t need
to store CVRs or maintain vote totals.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;BMDs address the privacy issues with DREs fairly effectively:
you don&#39;t need to worry about the CVRs in the machine and the
ballots are already randomized. They also partly address the
scaling issues: while BMDs aren&#39;t any cheaper, if a long line
develops you can fall back to hand-marked optical scan ballots
without disrupting any of your back-end processes.&lt;/p&gt;
&lt;p&gt;It&#39;s less clear that they address the security issues: a compromised
BMD can cheat just as much as a compromised DRE and so they still
rely on the voter checking their ballot. There have been some
somewhat tricky attacks proposed on DREs where the attacker controls
the printer in a way that fools the user about the VVPAT record
and these can&#39;t be mounted with a BMD, but it&#39;s not clear how practical
those attacks are in any case. Probably the biggest security advantage of
a BMD is that you don&#39;t need to worry about trusting the machine
count or the communications channel back from the machine: you just
count the opscan ballots without having to mess around with the
VVPAT.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-post-election-audits&quot;&gt;Up Next: Post-Election Audits &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-dre/#up-next%3A-post-election-audits&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We&#39;ve now covered all the major methods used for casting and counting
votes. That&#39;s just the beginning, though: if you want to have confidence
in an election you need to be able to audit the results. That&#39;s a topic
that deserves its own post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance, Santa Clara county produces
ballots in English, Chinese, Spanish, Tagalog, Vietnamese,
Hindi, Japanese, Khmer, and Korean. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
Punch cards are an &lt;a href=&quot;https://verifiedvoting.org/election-system/ess-votomatic/&quot;&gt;old system&lt;/a&gt; with some interesting properties.
The voter marks their ballot by punching holes in a punch card.
The card itself has no candidates written on it but is instead
inserted into a holder that lists the contests and choices.
The card itself is then read by a standard &lt;a href=&quot;https://en.wikipedia.org/wiki/Punched_card_input/output&quot;&gt;punch card reader&lt;/a&gt;.
This seems like it ought to be fairly straightforward but went
wrong in a number of ways in Florida due to a combination
of poor ballot design and an unfortunate technical failure
mode: it was possible to punch the cards incompletely
and as the voting machine filled up with &lt;em&gt;chads&lt;/em&gt; (the little
pieces of paper that you punched out), it would sometimes
become harder to punch the ballot completely. This resulted in
a number of ballots which had partially detached (&amp;quot;hanging&amp;quot;) chads
or just dimpled chads, leading to debates about how to interpret them.
Wikipedia has a pretty good &lt;a href=&quot;https://en.wikipedia.org/wiki/Florida_election_recount&quot;&gt;description&lt;/a&gt;
of what happened here. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At least they should be obvious:
It&#39;s incredibly hard to write software that can resist
compromise by a dedicated attacker who has direct access (this
is why you have to keep upgrading your browser and operating
system to fix security issues). Given the critical nature
of voting machines, you really don&#39;t want them attached to the
Internet. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
In principle, this might leave statistical artifacts, such as
a higher rate of correcting from Smith -&amp;gt; Jones than Jones -&amp;gt; Smith,
but it would take a fair amount of work to be sure that this wasn&#39;t
just random error. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;
We&#39;ve touched on this a few times, but one of the real advantages
of paper ballots is that they serve as a single common format
for votes. Once you have that format, it&#39;s possible to have
multiple methods for writing (by hand, BMD) and reading
(by hand, central count opscan, precinct count opscan) the
ballots. That gives you increased flexibility because it means
that you can innovate in one area without affecting others,
as well as allowing either the writing side (voters)
or reading side (election officials) to change its processes
without affecting the other. This is a principle with applicability
far beyond voting. Interoperable standardized
data formats and protocols are a basic foundation of the Internet
and the Web and much of what has made the rapid advancement of
the Internet possible. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-dre/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why getting voting right is hard, Part IV: Absentee Voting and Vote By Mail</title>
		<link href="https://educatedguesswork.org/posts/voting-vbm/"/>
		<updated>2021-01-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-vbm/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is the fourth post in my series on voting systems.
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/08/why-getting-voting-right-is-hard-part-i-introduction-and-requirements/&quot;&gt;Part I&lt;/a&gt; covered requirements and then
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/14/why-getting-voting-right-is-hard-part-ii-hand-counted-paper-ballots/&quot;&gt;Part II&lt;/a&gt; and
&lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/05/why-getting-voting-right-is-hard-part-iii-optical-scan/&quot;&gt;Part III&lt;/a&gt; covered in-person voting using paper ballots.
However, paper ballots don&#39;t need to be voted in person; it&#39;s also
possible to have people mail in their ballots, in which case they can
be counted the same way as if they had been voted in person.&lt;/p&gt;
&lt;p&gt;Mail-in ballots get used in two main ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Absentee Ballots&lt;/em&gt;: inevitably, some voters will be unavailable
on election day. Even with early voting, some voters
(e.g., students, people living overseas, members of the military,
people on travel, etc.) might be out of town for weeks or months. In many
cases, some or all of these voters are still eligible to vote in
the jurisdiction in which they are nominally residents even if they aren&#39;t
physically present. The usual procedure is to mail them a ballot
and let them mail it back in.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Vote By Mail&lt;/em&gt; (VBM): some jurisdictions (e.g., Oregon) have
abandoned in-person voting entirely and mail every registered voter
a ballot and have them mail it back.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a technical perspective, absentee ballots and vote-by-mail work
the same way; it&#39;s just a matter of which sets of voters vote in
person and which don&#39;t. These lines also blur some in that some
jurisdictions require a reason to vote absentee whereas some just
allow anyone to request an absentee ballot (&amp;quot;no-excuse absentee&amp;quot;).  Of
course, in a vote-by-mail only jurisdiction voters don&#39;t need to
take any action to get mailed a ballot. For convenience, I&#39;ll mostly
be referring to all of these procedures as mail-in ballots.&lt;/p&gt;
&lt;p&gt;As mentioned above, counting mail-in ballots is the same as counting
in-person ballots. In fact, in many cases jurisdictions will use the
same ballots in each case, so they can just hand count them or run
them through the same optical scanner as they would with in-person
voted ballots, which simplifies logistics considerably. The major
difference between in-person and mail-in voting is the need for
different mechanisms to ensure that only authorized voters vote (and
that they only vote once). In an in-person system, this is ensured by
determining eligibility when voters enter the polling place and then
giving each voter a single ballot, but this obviously doesn&#39;t work in
the case of mailed-in ballots -- it&#39;s way too easy for an attacker
to make a pile of fake ballots and just mail them in -- so something else is needed.&lt;/p&gt;
&lt;h1 id=&quot;authenticating-ballots&quot;&gt;Authenticating Ballots &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#authenticating-ballots&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;As with in-person voting, the basic idea behind securing mail-in
ballots is to tie each ballot to a specific registered voter
and ensure that each voter votes at most once.&lt;/p&gt;
&lt;p&gt;If we didn&#39;t care about the secrecy of the ballot, the easy solution
would be to give every voter a unique identifier to put on their
ballot. (Operationally, it&#39;s somewhat easier to instead
give each ballot a unique serial number and then keep a record of
which serial numbers correspond to each voter, but these are
largely equivalent.) Then when the
ballots come in, we check that (1) the voter exists and (2) the voter
hasn&#39;t voted already.  When put together,
these checks make it very difficult for an attacker to make their own
ballots: if they use non-existent serial numbers, then the ballots
will be rejected, and if they use serial numbers that correspond to
some other voter&#39;s ballot then they risk being caught if that voter
voted.  So, from a security perspective, this works reasonably well,
but it&#39;s a privacy disaster because it permanently associates a
voter&#39;s identity with the contents of their ballots: anyone who has
access to the serial number database and the ballots can determine how
individual voters voted.&lt;/p&gt;
&lt;p&gt;The solution turns out to be to authenticate the &lt;em&gt;envelopes&lt;/em&gt;, not the
ballots. The way that this works is that each voter is sent a
non-unique ballot (i.e., one without a serial number) and then an
envelope with a unique serial number. The voter marks their ballot,
puts it in the envelope and mails it back. Back at election headquarters,
election officials perform the two checks described above. If they
fail, then the envelope is set aside for further processing. If they
succeed, then the envelope is opened -- checking that it contains only
one ballot -- and the ballot is put into the pile for
counting.&lt;/p&gt;
&lt;p&gt;This procedure provides some level of privacy protection: there&#39;s
no single piece of paper that has both the voter&#39;s identity and
their vote, which is good, but at the time when election officials
open the envelope they can see both the voter&#39;s identity and the
ballot, which is bad. With some procedural safeguards it&#39;s hard to
mount a large-scale privacy violation: you&#39;re going to be opening
a lot of ballots very quickly and so keeping track of a lot of
people is impractical, but an official could, for instance,
notice a particular person&#39;s name and see how they voted.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; Some
jurisdictions address this with a two-envelope system: the voter
marks their ballot and puts it in an unmarked &amp;quot;secrecy envelope&amp;quot;
which then goes into the marked envelope that has their identity
on it. At election headquarters officials check the outer envelope,
then open it and put the sealed secrecy envelope in the pile for
counting. Later, all of the secrecy envelopes are opened and counted;
this procedure breaks the connection between the voter&#39;s identity
and their ballot.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1 id=&quot;signature-matching&quot;&gt;Signature Matching &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#signature-matching&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The basic idea behind the system described above is to match
ballots mailed out (which are tied to voter registration) to
ballots mailed in. This works as long as there&#39;s no opportunity
for attackers to substitute their own ballots for those of a
legitimate voter. There are a number of ways that might happen,
including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Stealing the ballot in the mail, either on the way out to the voter
or when it is sent back to election headquarters. Stealing the ballot
on the way back works a lot better because if voters don&#39;t
receive their ballots they might ask for another one, in
which case you have duplicates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Inserting fake ballots for people who you don&#39;t expect to
vote. This is obviously somewhat risky, as they might decide
to vote and then you would have a duplicate, but many people
vote infrequently, so impersonating them carries a reduced risk of
creating a duplicate ballot.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Again, I&#39;m assuming that the attacker can make their own
ballots and envelopes. This isn&#39;t trivial, but neither is it
impossible, especially for a state-level actor.&lt;/p&gt;
&lt;p&gt;Some jurisdictions attempt to address this form of attack by requiring
voters to sign their ballot envelopes. Those signatures can then be
compared to the voter&#39;s known signature (for instance on their voter
registration card).  Some jurisdictions even require a witness to sign as well -- affirming the
identity of the person signing the envelope -- or require the voter to include a copy of their
ID, or even to have the ballot envelope notarized.
The requirements vary radically between jurisdictions (see
&lt;a href=&quot;https://www.ncsl.org/research/elections-and-campaigns/vopp-table-14-how-states-verify-voted-absentee.aspx&quot;&gt;here&lt;/a&gt;
for a table of how this works in each state). To the best of my
knowledge, there&#39;s no real evidence that this kind of signature
validation provides significantly more defense against fraud.
From an analytic perspective, the level of protection depends on the
capabilities of an attacker and the detection methods used by
election officials. For instance, an attacker who steals your
ballot on the way back could potentially try to duplicate your
signature (after all, it&#39;s on the envelope!), which seems reasonably
likely to work, but an attacker who is just trying to impersonate
people who didn&#39;t vote might have some trouble because they wouldn&#39;t
know what your signature looked like.&lt;/p&gt;
&lt;h1 id=&quot;ballots-with-errors&quot;&gt;Ballots with Errors &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#ballots-with-errors&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;It&#39;s not uncommon for the returned ballots to have some kind
of error, for instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Voter used their own envelope instead of the official envelope&lt;/li&gt;
&lt;li&gt;Voter didn&#39;t use the secrecy envelope&lt;/li&gt;
&lt;li&gt;Voter didn&#39;t sign the envelope&lt;/li&gt;
&lt;li&gt;Voter signature doesn&#39;t match&lt;/li&gt;
&lt;li&gt;Envelope not notarized&lt;/li&gt;
&lt;li&gt;Overvotes&lt;/li&gt;
&lt;li&gt;Damaged ballots (torn ballots, ballots with stains, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these can potentially lead to a voter&#39;s ballot being
rejected. Moreover, the more requirements a voter&#39;s ballot
has to meet, the greater chance that it will be rejected, so
there is a need to balance the additional security and
privacy provided by extra requirements against the additional
risk of rejecting ballots which are actually legitimate, but just
nonconformant. Different jurisdictions have made different
tradeoffs here.&lt;/p&gt;
&lt;p&gt;Just because a ballot has a problem doesn&#39;t mean that
the voter is necessarily out of luck: some jurisdictions have
what&#39;s called a &lt;a href=&quot;https://www.lwv.org/blog/when-it-comes-absentee-and-mail-voting-what-notice-cure-process&quot;&gt;cure&lt;/a&gt; process in which the election
officials reach out to the voter whose name is on the
ballot and offer them an
opportunity to fix their ballot, with the fix depending on
the jurisdiction and the precise problem. Other jurisdictions
just discard the ballot, for example in the case of &lt;a href=&quot;https://www.lawfareblog.com/secrecy-sleeves-and-naked-ballot&quot;&gt;&amp;quot;naked ballots&amp;quot;&lt;/a&gt;
-- ballots where voters did not use the inner secrecy envelope.&lt;/p&gt;
&lt;p&gt;Of course, not all problems can be cured. In particular, once
the ballot has been disassociated from the envelope, then
there&#39;s no way to go back to the voter and get them to fix
an error such as an overvote. This issue isn&#39;t unique to
vote-by-mail, however: it also occurs with voting systems using central-count
optical scanners (&lt;a href=&quot;https://blog.mozilla.org/blog/2021/01/05/why-getting-voting-right-is-hard-part-iii-optical-scan/&quot;&gt;see Part III&lt;/a&gt;). In general, if the ballots are anonymized before
processing, then it&#39;s not really possible to fix any
errors in them; you just need to process them the best you
can.&lt;/p&gt;
&lt;p&gt;Ballot rejection is an opportunity for some
level of insider attack: although voting officials do not know
how individuals voted, they might be able to know which voters
are likely to vote a certain way, perhaps by looking at their
address or party affiliation (this is easier if the voter&#39;s
name is on the envelope, not just a serial number) and more strictly
enforce whatever security checks are required for ballots they
think will go the wrong way. Having external observers who
are able to ensure uniform standards can significantly reduce the
risk here.&lt;/p&gt;
&lt;h1 id=&quot;voting-twice&quot;&gt;Voting Twice &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#voting-twice&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;There are a number of situations in which multiple ballots might have
been or will be cast for the same voter. A number of these are
legitimate, such as a voter who has already voted by
mail deciding to vote in person -- perhaps because they changed
their mind about candidates or because they are worried their absentee
ballot will not be processed in time -- but of course they could also
be the result of error or fraud. There are two basic ways in which double
voting shows up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Two mail-in ballots&lt;/li&gt;
&lt;li&gt;One mail-in ballot and one in-person ballot&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the case of two mail-in ballots, it&#39;s most likely that the first
ballot has already been taken out of the envelope, so there&#39;s
no real way not to count it. All you can do is not count the
second ballot. Note that this means that if an attacker
manages to successfully submit a ballot for you &lt;em&gt;and&lt;/em&gt; gets it in
before you, then their vote will count and yours will not.
Fortunately, this kind of fraud is rare and detectable and once
detected can be investigated. I&#39;m not aware of any election where fake mail-in ballots
have materially impacted the results.&lt;/p&gt;
&lt;p&gt;The more complicated case is when a voter has had a mail-in ballot
sent to them but then decides to vote in person, which can happen for
a number of reasons. For instance, the ballot might have been lost in
the mail (in either direction). This situation is different because
we need to prevent double voting but poll workers don&#39;t know whether the
voter &lt;em&gt;also&lt;/em&gt; submitted their ballot by mail. If the voter is allowed
to vote as usual, you might have a situation in which the
mail-in ballot has already been processed (at least as far as removing
it from the envelope) and there is no way to remove either ballot,
because they&#39;re both unidentified ballots mixed with other
ballots. Instead, the standard process is to require the voter to
fill in what&#39;s called a &lt;em&gt;provisional&lt;/em&gt; ballot, which is physically
like a mail-in ballot except that it has a statement about what
happened. Provisional ballots are segregated from regular ballots,
so once the rest of the ballots have been processed you can go through
the provisionals and process those for voters whose ordinary mail-in
ballots have not been received/counted.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1 id=&quot;returned-ballot-theft&quot;&gt;Returned Ballot Theft &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#returned-ballot-theft&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Another new source of attack on mail-in ballots -- as well as ballot
drop-boxes -- is theft of the ballots en route to election headquarters.
In-person voting has a number of accounting mechanisms designed to ensure
that the number of voters matches the number of
cast ballots which then matches the number of recorded votes, but
these don&#39;t work for mail-in ballots because many people who
are sent ballots will fail to return them.
In many jurisdictions, voters are able to track their ballots
and see if they have been processed, and could vote
in person if their ballot is lost. However, as a practical matter,
many voters will not do this. The major defense against this
kind of attack is good processes around mail delivery and
drop-box security as well as post-hoc investigation of
reports of missing ballots.&lt;/p&gt;
&lt;h1 id=&quot;secrecy-of-the-ballot&quot;&gt;Secrecy of the Ballot &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#secrecy-of-the-ballot&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;With proper processes at election headquarters, the ballot secrecy
properties of mail-in ballots are comparable to in-person voting,
with one major exception: with mail-in ballots it is much easier
for a voter to demonstrate to a third party how they voted. All they
have to do is give the ballot to that third party and let them
fill it out and mail it (perhaps signing the envelope first).
This allows for vote buying/coercion type attacks. This isn&#39;t
ideal, but it&#39;s a difficult attack to mount at a large scale
because the attacker needs to physically engage with each voter.&lt;/p&gt;
&lt;h1 id=&quot;the-cost-of-security&quot;&gt;The Cost of Security &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#the-cost-of-security&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;As noted above, many states have fairly extensive verification
mechanisms for mail-in ballots. These mechanisms are not free, either
to voters or election officials. In particular, requirements such as
notarization increase the cost of voting and thus may deter some
voters from voting. Even apparently lightweight requirements such as
signature matching have the potential to cause valid ballots to be
rejected: some people will forget to sign their name, people do not
sign their name the same way every time, and election officials are not
experts on handwriting, so we should expect that they will reject some
number of valid ballots. Cottrell, Herron and Smith report about
&lt;a href=&quot;http://www.dartmouth.edu/~herron/VBM_experience.pdf&quot;&gt;1%&lt;/a&gt;
of ballots being rejected for some kind of signature issue,
with Black and Hispanic voters seemingly having higher rates
of rejection than White voters.
Because real fraud is rare and errors are common, the vast majority of
rejected ballots will actually be legitimate.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;There is a more general point here:
although mail-in ballots &lt;em&gt;seem&lt;/em&gt;
insecure (and this has been a point of concern in the voting security
community), real studies of mail-in ballots show that they have
&lt;a href=&quot;https://www.oregonlegislature.gov/lfo/Documents/2020%20Issue%20Review%20-%20Oregon%20Vote%20by%20Mail.pdf&quot;&gt;extremely low fraud
rates&lt;/a&gt;.
This means that policy makers have to weigh potential security issues with
mail-in voting against their impact on legitimate voters. The
current evidence suggests that mail-in voting modestly increases
voting rates (experience from Oregon suggests by about 2-5
percentage points).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; The implication is that making mail-in voting
more difficult -- whether by restricting it or by adding
hard-to-follow security requirements -- is likely to decrease
the number of accepted ballots while only having a small impact
on voting fraud.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-direct-recording-electronic-systems-and-ballot-marking-devices&quot;&gt;Up Next: Direct Recording Electronic systems and Ballot Marking Devices &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-vbm/#up-next%3A-direct-recording-electronic-systems-and-ballot-marking-devices&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;OK. Three posts on paper ballots seems like enough for now, so it&#39;s time to turn to
more computerized voting methods. The other major form of voting in the United States uses
what&#39;s called the &amp;quot;Direct Recording Electronic&amp;quot; (DRE) voting system which just means
that you vote directly on a computer which internally keeps track
of the votes. DRE machines are very popular but have been the
focus of a lot of concern from a security perspective. We&#39;ll be
covering them next, along with a similar-seeming but much better
system called a &amp;quot;Ballot Marking Device&amp;quot; (BMD). BMDs are like DREs
but they print out paper ballots that can then be counted either by
hand or with optical scanners.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In this version, the envelopes can just have numbers and not
names, but as we&#39;ll see below, many jurisdictions require names. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;People familiar with computer privacy will recognize
this technique from technologies such as proxies, VPNs, or mixnets. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Provisional ballots are also used for a number of other exception
cases such as voters who go to the wrong polling place (here again, it&#39;s
hard to tell if they tried to vote at multiple polling places) or voters
who claim to be registered but can&#39;t be found on the voters list (this actually
looks the same to precinct-level officials
because each precinct usually just has their own list of voters). &lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This dynamic is quite common when adding new security checks:
any check you add will generally have false positives. In environments
where most behavior is innocent, that means that most of the
behavior you catch will also be innocent.
Bruce Schneier has &lt;a href=&quot;https://www.schneier.com/tag/false-positives/&quot;&gt;written extensively&lt;/a&gt;
about this point. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;While mail-in voting &lt;em&gt;generally&lt;/em&gt; seems to increase turnout by
reducing barriers to voting, there are a number of populations
that find mail-in ballots difficult. One obvious example is
the disabled, who may find filling in paper ballots difficult.
Less well-known is that Native Americans experience &lt;a href=&quot;https://www.narf.org/vote-by-mail/&quot;&gt;special
challenges&lt;/a&gt; that make
exclusive vote-by-mail difficult. Thanks to &lt;a href=&quot;https://josephhall.org/&quot;&gt;Joseph Lorenzo Hall&lt;/a&gt;
for informing me on this point. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-vbm/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why getting voting right is hard, Part III: Optical Scan</title>
		<link href="https://educatedguesswork.org/posts/voting-opscan/"/>
		<updated>2021-01-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-opscan/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is the third post in my series on voting systems.
For background see &lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/08/why-getting-voting-right-is-hard-part-i-introduction-and-requirements/&quot;&gt;part I&lt;/a&gt;.
As described in &lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/14/why-getting-voting-right-is-hard-part-ii-hand-counted-paper-ballots/&quot;&gt;part II&lt;/a&gt;, hand-counted paper ballots have a number of attractive security and privacy properties but scale badly to large elections.
Fortunately, we can count paper ballots efficiently using
&lt;a href=&quot;https://en.wikipedia.org/wiki/Optical_scan_voting_system&quot;&gt;optical scanners (opscan)&lt;/a&gt;. This will be familiar to anyone who has taken
paper-based standardized tests: instead of just checking a box,
next to each choice there is a region (typically an oval) to fill in,
as shown in the examples below.
These ballots can then be machine read using an optical scanner
which reports the result totals.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://verifiedvoting.org/wp-content/uploads/2020/08/ESS-DS200-Step2.jpg&quot; alt=&quot;optical scan example&quot; /&gt;
&lt;img src=&quot;https://verifiedvoting.org/wp-content/uploads/2020/08/insight_voting_instructions1-300x131-1.jpg&quot; alt=&quot;optical scan example2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Optical scan systems come in two basic flavors: &amp;quot;precinct count&amp;quot; and
&amp;quot;central count&amp;quot;. In a precinct count system, the optical scanner is
located at the precinct (or polling place) and the voters can feed
their ballots directly into it. Sometimes the scanner will
be mounted on a ballot box which catches the ballots after
they are scanned.
When the polls close, the scanner produces a total count,
typically recorded on a memory card, printed on a paper receipt, or
both. These can be sent back to election headquarters, together with the
ballots, where they are aggregated.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://verifiedvoting.org/wp-content/uploads/2020/08/eScan_no_ballot-1-300x225.jpg&quot; alt=&quot;Hart eScan&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In a central count system, the optical scanner is located at election headquarters. These scanners are typically quite a bit larger and faster. Ballots are
collected at the precinct and then sent back there for counting. Some
scanners are self-contained units that do all the tabulating and
some just connect to software on a commodity computer which does
a lot of the work, but of course this is all invisible to the
voter. It&#39;s also possible to have scanners at both the precinct and election
central -- this could help detect tampering with the ballots in
transit -- but I&#39;m not aware of any jurisdiction which does that.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://verifiedvoting.org/wp-content/uploads/2020/08/Election-Systems-Software-M650.jpg&quot; alt=&quot;ES&amp;amp;S central count scanner&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Because optical scan ballots are just paper ballots counted
via a different method, the voter experience is basically
the same, both in good ways (secrecy of the ballot, easy
scaling at the polling place) and in bad ways (accessibility).
In fact, in case of equipment breakdown or concerns about
fraud you can just hand count the ballots without
negatively impacting the voter experience (or in fact without
voters noticing). The two important
ways in which optical scanning differs from hand counting
are (1) it&#39;s much faster and (2) it&#39;s less verifiable.&lt;/p&gt;
&lt;h1 id=&quot;speed-and-scalability&quot;&gt;Speed and Scalability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-opscan/#speed-and-scalability&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The big advantage of optical scanning is that it&#39;s more efficient
than hand counting. A hand counting team can process on the
order of &lt;a href=&quot;http://chil.rice.edu/research/pdf/GogginByrneG_12.pdf&quot;&gt;6-15 contests per minute&lt;/a&gt;. This is much slower than even the slowest optical scanners:
to pick a vendor whose technical specs were easy to find,
ES&amp;S sells central count scanners
that count from 72 to 300 double-sided ballots per minute,
depending on the model. This is
quite an improvement over hand counting when we consider that each
ballot will likely have several contests. As an example, the first sheet of a
recent Santa Clara &lt;a href=&quot;https://eservices.sccgov.org/rov/docs/voterguide/127/SC-078-ENG-508.pdf&quot;&gt;sample
ballot&lt;/a&gt;
has 3 contests on one side and 4 on the other, so we&#39;re talking
about being able to count about 2000 contests a minute on the
high end.&lt;/p&gt;
&lt;p&gt;Precinct count scanners typically aren&#39;t particularly fast; they&#39;re
comparable to typical consumer-grade scanning hardware and just
need to be fast enough that they mostly keep up with the rate
at which voters fill in their ballots.
Even low-end desktop scanners can scan &lt;a href=&quot;https://store.hp.com/us/en/pdp/hp-scanjet-pro-2000-s2-sheet-feed-scanner&quot;&gt;10s of pages a minute&lt;/a&gt;, so it&#39;s not
generally a problem to have one or two scanners handling
even a modest sized precinct, given that it typically
takes voters more than a minute to fill in their ballot
and that you can&#39;t check-in more than a few voters a minute.
Additionally, because voters scan their ballots as they vote,
you get results as soon as the polls close without having
to have extra staff to count the ballots; the poll workers
just need to supervise the scanning process (as well as
the rest of the tasks they would have to do with
hand-counted ballots such as maintain custody of the
materials, check-in voters, etc.).&lt;/p&gt;
&lt;p&gt;Optical scanning is also a lot cheaper. In the Washington
recount studied by &lt;a href=&quot;https://www.pewtrusts.org/~/media/legacy/uploadedfiles/pcs_assets/2010/recountbrief1pdf.pdf&quot;&gt;Pew&lt;/a&gt;, the cost of optical scanning
was $290,000 as opposed to $900,000 for the hand count. This
is actually an underestimate of the advantage of optical
scanning because, as noted above, that was just the cost
to hand count a single contest, whereas the scanning process
counts multiple contests at once.&lt;/p&gt;
&lt;h1 id=&quot;security-and-verifiability&quot;&gt;Security and Verifiability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-opscan/#security-and-verifiability&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Optical scanning introduces a new security threat: the scanner is a
computer and computers can be compromised. If compromised, the computer can
produce any answer the attacker wants, which is obviously an
undesirable property, but one we take the risk of whenever we put
computers in the critical path of the voting process. This isn&#39;t
just a theoretical risk: there have been numerous studies of
the security of voting machines and in general the results are
extremely discouraging: in past studies, if an attacker was able to get physical
access to a machine, they were usually able to compromise the
software.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
Most of the work here was done in the early 2000s, so it&#39;s
possible that things have improved, but the &lt;a href=&quot;https://www.courthousenews.com/wp-content/uploads/2020/10/ga-voting.pdf&quot;&gt;available evidence&lt;/a&gt; suggests
otherwise. Moreover, there are limits to how good a job it
seems possible to do here, which I hope to get to in a future
post.&lt;/p&gt;
&lt;p&gt;The impact of an attack depends on the machine type.  In the case of
precinct-count machines, this means that voters might be able to
attack the machines in their precinct, and potentially through them
the entire jurisdiction&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;. This is a somewhat difficult attack to
mount because you need unsupervised access to the machine for long
enough to mount the attack. It&#39;s not uncommon for these devices to
have some sort of management port (you need some way to load the
ballot definitions for each election, update the software, etc.)
though how accessible that is to voters depends on the device and how
it&#39;s deployed in practice.&lt;/p&gt;
&lt;p&gt;In the case of central count machines, attacks might be limited to
voting officials, but as noted in Part I, it&#39;s important that a voting
system be immune even to this kind of insider attack. Precinct count
machines are susceptible to insider attack too: anyone who has access
to the warehouse where the machines are stored could potentially
tamper with them. In addition it&#39;s not uncommon for voting machines to
be stored overnight at polling places before the election, where
you&#39;re mostly relying on whatever lock the church or school or
whatever has on its doors.&lt;/p&gt;
&lt;p&gt;The general consensus in the voting security community is that
our goal should be what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/wiki/Software_independence&quot;&gt;software independence&lt;/a&gt;. Rivest and Wack describe this as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A voting system is software-independent if an undetected change or error in its software cannot cause an undetectable change or error in an election outcome.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What this means in practice is that if you are going to use optical
scan voting then you need some way to verify that the scanner is
counting the votes correctly. Fortunately, once you&#39;ve scanned
the ballots, you still have them available to you, with the
exception of any which have been folded, spindled or mutilated
by the scanner. This means you can do as much double checking as
you want.&lt;/p&gt;
&lt;p&gt;Naively, of course, you could just recount the ballots by hand. This
often happens in close races, but obviously doing it all the
time would obviate the point of using optical scanners. What&#39;s needed
is some way to check the scanner without counting every ballot by
hand. What&#39;s emerging as the consensus approach here is what&#39;s called
a &lt;a href=&quot;https://georgetownlawtechreview.org/wp-content/uploads/2020/07/4.2-p523-541-Appel-Stark.pdf&quot;&gt;Risk Limiting Audit&lt;/a&gt;. I&#39;ll cover this in more detail later,
but the basic idea is that you randomly sample ballots and hand count
them. You can then use statistics to estimate the chance that the
election was decided incorrectly. You keep counting until you either
(1) have high confidence that the election was counted correctly or
(2) you have counted all the ballots by hand.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
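&lt;p&gt;To make the sampling loop concrete, here is a minimal Python sketch of one published style of audit, a ballot-polling audit in the manner of BRAVO. It is an illustration under simplifying assumptions, not necessarily the procedure any jurisdiction uses: there are exactly two candidates, every ballot is valid, ballots are drawn with replacement, and the function name, the &amp;quot;winner&amp;quot;/&amp;quot;loser&amp;quot; labels, the cap on draws, and the 5% risk limit are all hypothetical choices.&lt;/p&gt;

```python
import random


def ballot_polling_audit(ballots, reported_winner_share, risk_limit=0.05, seed=0):
    """Draw ballots at random until the reported result is confirmed.

    ballots: list of "winner" / "loser" labels as read by hand (hypothetical).
    reported_winner_share: the scanner's reported share, assumed above 0.5.
    Returns ("confirmed", draws) or ("full hand count", draws).
    """
    rng = random.Random(seed)
    ratio = 1.0  # evidence that the reported winner really won
    # Assumption: after as many draws as there are ballots, give up on
    # sampling and escalate to counting everything by hand.
    for examined in range(1, len(ballots) + 1):
        ballot = rng.choice(ballots)  # sample with replacement
        if ballot == "winner":
            ratio *= reported_winner_share / 0.5
        else:
            ratio *= (1 - reported_winner_share) / 0.5
        if ratio >= 1 / risk_limit:
            return "confirmed", examined
    return "full hand count", len(ballots)


# Hypothetical contest: 700 ballots for the reported winner, 300 against.
ballots = ["winner"] * 700 + ["loser"] * 300
print(ballot_polling_audit(ballots, reported_winner_share=0.7))
```

&lt;p&gt;The closer the reported share is to 50%, the less each sampled ballot moves the ratio, which is why close contests tend to end in a full hand count.&lt;/p&gt;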
&lt;p&gt;In really close races, you basically have to do a full recount
by hand. The reason for this isn&#39;t so much that the machines
might have been tampered with but that they might have made mistakes.
Even the best optical scanners sometimes mis-scan and it&#39;s
not reasonable to expect them to do a good job with the
kind of &lt;a href=&quot;https://freedom-to-tinker.com/2008/11/21/discerning-voter-intent-minnesota-recount/&quot;&gt;ambiguous ballots&lt;/a&gt; that you see in the wild.
Ideally, of course, the scanner would kick those ballots back for
manual processing, but you don&#39;t want to kick back too many and
so there&#39;s ambiguity about which ballots are ambiguous and so on.
In most elections this stuff doesn&#39;t matter, but in a really close
one it does, and so if you&#39;re working with hand-marked ballots
there eventually comes a point where you need to fall back to
hand counting. The main value of optical scanning is to reduce the
need for routine hand-counting when elections aren&#39;t close,
which is fortunately most of the time.&lt;/p&gt;
&lt;h1 id=&quot;write-ins%2C-scanning-errors%2C-overvotes%2C-and-other-edge-cases&quot;&gt;Write-Ins, Scanning Errors, Overvotes, and Other Edge Cases &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-opscan/#write-ins%2C-scanning-errors%2C-overvotes%2C-and-other-edge-cases&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Of course, unlike humans, optical scanners aren&#39;t very smart
-- and for security reasons, you don&#39;t really want them doing
smart stuff -- so there are a number of situations that they
handle badly.&lt;/p&gt;
&lt;p&gt;For instance, it&#39;s common
to allow &amp;quot;write-in&amp;quot; votes in which the candidate&#39;s name does
not appear on the ballot but instead the voter writes in
a new name. Write-in candidates don&#39;t usually win -- although
&lt;a href=&quot;https://en.wikipedia.org/wiki/Lisa_Murkowski&quot;&gt;Lisa Murkowski&lt;/a&gt;
famously won as a write-in candidate in 2010 -- but you still
need to process their ballots. As shown in the example at the
top, the natural way to handle this is to have a choice
for each contest which has a blank name: the voter fills in the
bubble associated with the space and then writes the name in
the space.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s also common to have ballots which can&#39;t be read for
one reason or another. For instance, the voter might have
used the wrong color pen or not completely marked the bubble.
Voters also sometimes vote for more than one
candidate in a given election (&amp;quot;overvoting&amp;quot;). The general
way to handle these cases is to have the machine reject
these ballots and set them aside for further processing by
hand.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; If the number of rejected ballots is less than the margin
of victory then you know that it can&#39;t affect the result
and while you do eventually want to process them for complete
results, you don&#39;t need to for purposes of determining the winner.
If there are more rejected ballots than the margin of victory you of course
need to process them immediately, but as rejected ballots
are typically a small fraction of the total this is much
more feasible than a full hand count.&lt;/p&gt;
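&lt;p&gt;The comparison described above is simple enough to sketch in a few lines of Python. The function names and the tallies are hypothetical, and the sketch makes the conservative assumption that a rejected-ballot count exactly equal to the margin also triggers immediate review, since those ballots could produce a tie.&lt;/p&gt;

```python
def margin_of_victory(tallies):
    """Return the gap between the two highest vote counts."""
    ranked = sorted(tallies.values(), reverse=True)
    return ranked[0] - ranked[1]


def must_review_now(tallies, rejected):
    """True when the set-aside ballots could change (or tie) the outcome."""
    return rejected >= margin_of_victory(tallies)


tallies = {"Alice": 5200, "Bob": 4900}  # hypothetical scanner totals
print(must_review_now(tallies, rejected=120))  # fewer rejects than the margin of 300
print(must_review_now(tallies, rejected=450))  # more rejects than the margin of 300
```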
&lt;p&gt;There are of course some edge cases that optical scanners
aren&#39;t able to even reject reliably. A good example here
is &amp;quot;undervoting&amp;quot; in which a voter doesn&#39;t vote in certain
contests. This could be a sign of marking error or it could
be intentional; it&#39;s actually quite common for voters in
the US to just vote the presidential contest and then
skip the downballot races. Because this is common, you don&#39;t
really want the scanner rejecting all undervoted ballots.
Instead you keep a tally of the number of undervotes in
a given contest and if it&#39;s large enough to potentially
affect the election you can go back and hand count the
whole election.&lt;/p&gt;
&lt;p&gt;It&#39;s important to understand that a risk limiting audit
ensures that none of these anomalies can affect the election
result, so at some level it doesn&#39;t matter how the scanner
handles them; it&#39;s just a matter of setting the right tradeoff
in terms of efficiency between the automated and manual
counting stages. However, if -- as is far too common -- you
are not doing a risk limiting audit, it&#39;s important to be
fairly conservative about having the scanner note ambiguous
cases rather than arbitrarily deciding them for one candidate
or another.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-vote-by-mail&quot;&gt;Up Next: Vote By Mail &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-opscan/#up-next%3A-vote-by-mail&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;So far in this series I&#39;ve talked about paper ballots as if
they are cast at the polling place, but that doesn&#39;t have
to be the case. They can just as easily be sent to voters
who return them by mail. Depending on the situation this is
referred to as &amp;quot;vote by mail&amp;quot; (VBM) or &amp;quot;absentee ballots&amp;quot;.
VBM brings some special challenges which I&#39;ll be covering
in my next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See, for instance, the &lt;a href=&quot;https://www.sos.ca.gov/elections/ovsta/frequently-requested-information/top-bottom-review&quot;&gt;reports&lt;/a&gt; of the 2007 California Top-to-Bottom Review. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A number of studies have found &amp;quot;viral&amp;quot; attacks in which
you compromised one machine and then used that to attack
the election management systems, which were then used to
infect all the machines in the jurisdiction. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might be wondering if this is really the best we can
do. RLAs are the best known method that is totally software
independent, but if you&#39;re willing to rely on your own software
that is independent of the voting machine software, then one
option would be to arrange to video-record the ballots during
counting and then use computer vision techniques to independently
do a recount. I collaborated on a &lt;a href=&quot;https://vision.cornell.edu/se3/wp-content/uploads/2014/09/wang_evt2010_0.pdf&quot;&gt;system&lt;/a&gt; to do this about 10 years
back. It worked reasonably well -- and would surely work far
better with modern computer vision techniques -- but never got much interest. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Actually, the whole idea of having pre-printed ballots
is less universal than many Americans think. The
Wikipedia &lt;a href=&quot;https://en.wikipedia.org/wiki/Secret_ballot&quot;&gt;article&lt;/a&gt;
on the so-called Australian Ballot
makes fascinating reading. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;One advantage of precinct-level counting is that you
can detect this kind of error and give the voter
an opportunity to correct it. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-opscan/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why getting voting right is hard, Part II: Hand-Counted Paper Ballots</title>
		<link href="https://educatedguesswork.org/posts/voting-hcpb/"/>
		<updated>2020-12-14T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting-hcpb/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://blog.mozilla.org/blog/2020/12/08/why-getting-voting-right-is-hard-part-i-introduction-and-requirements/&quot;&gt;Part I&lt;/a&gt; we looked at desirable properties for voting systems. In this post, I want to look at the details
of a specific system, hand-counted paper ballots.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://www.gravesham.gov.uk/__data/assets/image/0003/219441/Ballot-Paper-Example.png&quot; alt=&quot;Sample Ballot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Hand-counted paper ballots are probably the simplest voting system in common use (though mostly outside the US). In practice, the process
usually looks something like the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Election officials pre-print paper ballots and distribute them
to polling places. Each paper ballot has a list of contests
and the choices for each contest, and
a box or some other location where the voter can indicate
their choice, as shown above.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Voters arrive at the polling place, identify themselves
to election workers, and are issued a ballot. They mark the
section of the ballot corresponding to their choice.
They cast their ballots by putting them
into a ballot box, which can be as simple as a cardboard
box with a hole in the top for the ballots.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the polls close, the election workers collect all the
ballots. If they are to be locally counted, then the
process is as below; if they are to be centrally counted,
they are transported back to election headquarters for counting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The counting process varies between jurisdictions, but at a high level
the process is simple. The vote counters go through each ballot one at
a time and determine which choice it is for. &lt;a href=&quot;https://www.josephhall.org/&quot;&gt;Joseph Lorenzo Hall&lt;/a&gt;
provides a good description of the procedure for California&#39;s statutory
1% tally &lt;a href=&quot;https://www.usenix.org/legacy/events/evt08/tech/full_papers/hall/hall_html/jhall_evt08_html.html&quot;&gt;here&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In practice, the hand-counting method used by counties in California seems very similar. The typical tally team uses four people consisting of two talliers, one caller and one witness:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;caller&lt;/strong&gt; speaks aloud the choice on the ballot for the race being tallied (e.g., &amp;quot;Yes...Yes...Yes...&amp;quot; or &amp;quot;Lincoln...Lincoln...Lincoln...&amp;quot;).&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;witness&lt;/strong&gt; observes each ballot to ensure that the spoken vote corresponded to what was on the ballot and also collates ballots in cross-stacks of ten ballots.&lt;/li&gt;
&lt;li&gt;Each &lt;strong&gt;tallier&lt;/strong&gt; records the tally by crossing out numbers on a tally sheet to keep track of the vote tally.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Talliers announce the tally at each multiple of ten (&amp;quot;10&amp;quot;, &amp;quot;20&amp;quot;, etc.) so that they can roll-back the tally if the two talliers get out of sync.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Obviously other techniques are possible, but as long as people are
able to observe, differences in technique are mostly about efficiency
rather than accuracy or transparency. The key requirement here is that
any observer can look at the ballots and see that they are being
recorded as they are cast. Jurisdictions will usually have some
mechanism for challenging the tally of a specific ballot.&lt;/p&gt;
&lt;h1 id=&quot;security-and-verifiability&quot;&gt;Security and Verifiability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#security-and-verifiability&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The major virtue of hand-counted paper ballots is that they
are simple, with security and privacy properties that are
easy for voters to understand and reason about, and for
observers to verify for themselves.&lt;/p&gt;
&lt;p&gt;It&#39;s easiest to break the election into two phases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Voting and collecting the ballots&lt;/li&gt;
&lt;li&gt;Counting the collected ballots&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If each of these is done correctly, then we can have high
confidence that the election was correctly decided.&lt;/p&gt;
&lt;h2 id=&quot;voting&quot;&gt;Voting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#voting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The security properties of the voting process mostly
come down to ballot handling, namely that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only authorized voters get ballots, and each gets only one.
Note that it&#39;s necessary
to ensure this because otherwise it&#39;s very hard to prevent
multiple voting, where an authorized voter puts in
two ballots.&lt;/li&gt;
&lt;li&gt;Only the ballots of authorized voters make it into
the ballot box.&lt;/li&gt;
&lt;li&gt;All the ballots in the ballot box and only the ballots from the
ballot box make it to election headquarters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first two of these properties are readily observed by observers --
whether independent or partisan. The last property typically relies on
technical controls. For instance, in Santa Clara county ballots are
taken from the ballot box and put into clear tamper-evident bags for
transport to election central, which limits the ability for poll
workers to replace the ballots. When put together all three properties
provide a high degree of confidence that the right ballots are
available to be counted. This isn&#39;t to say that there&#39;s no opportunity
for fraud via sleight-of-hand or voter impersonation (more on this
later) but it&#39;s largely one-at-a-time fraud, affecting a few ballots
at a time, and is hard to perpetrate at scale.&lt;/p&gt;
&lt;h2 id=&quot;counting&quot;&gt;Counting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#counting&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The counting process is even easier to verify: it&#39;s conducted in the
open and so observers have their own chance to see each ballot and
be confident that it has been counted correctly. Obviously, you need
a lot of observers because you need at least one for each counting
team, but given that the number of voters far exceeds the number
of counting teams, it&#39;s not that impractical for a campaign to
come up with enough observers.&lt;/p&gt;
&lt;p&gt;Probably the biggest source of problems with hand-counted paper
ballots is disputes about the meaning of ambiguous ballots. Ideally
voters would mark their ballots according to the instructions, but
it&#39;s quite common for voters to make stray marks, mark more than one
box, fill in the boxes with dots instead of Xs, or even some more
exotic variations, as shown in the examples below.
In each case, it needs to be determined how to handle
the ballot. It&#39;s common to apply an &amp;quot;Intent of the voter&amp;quot; standard,
but this still requires &lt;a href=&quot;https://freedom-to-tinker.com/2008/11/21/discerning-voter-intent-minnesota-recount/&quot;&gt;judgement&lt;/a&gt;.
One extra difficulty here is that at the point where you are
interpreting each ballot, you already know what it looks like,
so naturally this can lead to a fair amount of partisan bickering
about whether to accept each individual ballot, as each side
tries to accept ballots that seem like they are for their preferred candidate and
disqualify ballots that seem like they are for their opponent.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://minnesota.publicradio.org/features/2008/11/19_challenged_ballots/images/noballot.jpg&quot; alt=&quot;double mark&quot; /&gt;&lt;img src=&quot;https://minnesota.publicradio.org/features/2008/11/19_challenged_ballots/images/lizardpeopleb.jpg&quot; alt=&quot;lizard people&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A related issue is whether a given ballot is valid. This
isn&#39;t so much an issue with ballots cast at a polling place,
but for vote-by-mail ballots there can be questions about
signatures on the envelopes, the number of envelopes, etc.
I&#39;ll get to this when I cover vote by mail in a later
post.&lt;/p&gt;
&lt;h1 id=&quot;privacy%2Fsecrecy-of-the-ballot&quot;&gt;Privacy/Secrecy of the Ballot &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#privacy%2Fsecrecy-of-the-ballot&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The level of privacy provided by paper ballots depends a fair
bit on the precise details of how they are used and handled.
In typical elections, voters will be given some level of privacy
to fill out their ballot, so they don&#39;t have to worry too
much about that stage (though presumably in theory someone
could set up cameras in the polling place). Aside from that,
we primarily need to worry about two classes of attack:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Tracking a given voter&#39;s ballot from checkin to counting.&lt;/li&gt;
&lt;li&gt;Determining how a voter voted from the ballot itself.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Ideally -- at least from the perspective of privacy -- the
ballots are all identical and the ballot box is big enough
that you get some level of shuffling (how much is an open
question). In that case it&#39;s quite hard to correlate the ballot
a voter was given with the ballot being counted, though you might
be able to narrow it down some by looking at which polling
place/box the ballot came in and where it was in the box.
In some jurisdictions,
ballots have serial numbers, which might make this kind
of tracking easier, though only if records of which voter
gets which ballot are kept and available. Apparently the
UK has this kind of system but &lt;a href=&quot;https://en.wikipedia.org/wiki/Secret_ballot#Secrecy_exceptions&quot;&gt;tightly controls the records&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&#39;s generally not possible to tell from a ballot itself
which voter it belongs to unless the voter cooperates by
making the ballot distinctive in some way. This might happen
because the voter is being paid (or threatened) to cast
their vote a certain way. While some election jurisdictions
prohibit distinguishing marks, as a practical matter it&#39;s
not really possible to prevent voters from making such
marks if they really want to. This is especially true
when the ballots need not be machine readable and so the
voter has the ability to fill in the box somewhat distinctively
(there are a lot of ways to write an X!).
In elections with a lot of contests, as in many places
in the US, it is also possible to use what&#39;s called a &amp;quot;pattern
voting&amp;quot; attack in which you vote one contest the way you
are told and then vote the downballot contests in a
way that uniquely identifies you. This sort of attack
is very hard to prevent, but actually checking that
people voted the way they were told is of course a lot
of work. There are also more exotic attacks such as
&lt;a href=&quot;https://citp.princeton.edu/our-work/paper/&quot;&gt;fingerprinting paper stock&lt;/a&gt;,
but none of these are easy to mount in bulk.&lt;/p&gt;
&lt;h1 id=&quot;accessibility&quot;&gt;Accessibility &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#accessibility&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;One big drawback of hand-marked ballots is that they are not very
accessible, both to people with disabilities and to non-native
speakers. For obvious reasons, if you&#39;re blind or have limited
dexterity it can be hard to fill in the boxes (this is even harder
with optical scan type ballots). Many jurisdictions that
use paper ballots will also have some accommodation for people
with disabilities. Paper ballots work fine in most languages, but
each language must be separately translated and then printed,
and then you need to have extras of each ballot type in
case more people come than you expect, so at the end of the
day the logistics can get quite complicated. By contrast,
electronic voting machines (which I&#39;ll get to later) scale much
better to multiple languages.&lt;/p&gt;
&lt;h1 id=&quot;scalability&quot;&gt;Scalability &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#scalability&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Although hand-counting does a good job of producing accurate and
verifiable counts, it does not scale very well&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. Estimates of how
expensive it is to count ballots vary quite a bit, but a 2010 Pew
&lt;a href=&quot;https://www.pewtrusts.org/~/media/legacy/uploadedfiles/pcs_assets/2010/recountbrief1pdf.pdf&quot;&gt;study&lt;/a&gt;
of hand recounts in Washington and Minnesota (the 2004 Washington
gubernatorial and 2008 Minnesota US Senate races) put the cost of
recounting a single contest at between $0.15 and $0.60 per ballot.
Of course, as noted above some of the cost here is that of disputing
ambiguous ballots. If the race is not particularly competitive
then these ballots can be set aside and only need to be carefully
adjudicated if they have a chance of changing the result.&lt;/p&gt;
&lt;p&gt;Importantly, the cost of hand-counting goes up with the number of
ballots times the number of contests on the ballot.
In the United States it&#39;s not uncommon to have 20 or more contests per
election. For example, here is a &lt;a href=&quot;https://eservices.sccgov.org/rov/docs/voterguide/127/SC-078-ENG-508.pdf&quot;&gt;sample ballot&lt;/a&gt; from the 2020 general election in Santa Clara
County, CA. This ballot has the following contests:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Type&lt;/th&gt;
&lt;th style=&quot;text-align:left&quot;&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;President&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;US House of Representatives&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;State Assembly&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Superior Court Judge&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;County Board of Education&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;County Board of Supervisors&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Community College District&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;City Mayor&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;City Council (vote for two)&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;State Propositions&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Local ballot measures&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:left&quot;&gt;Total&lt;/td&gt;
&lt;td style=&quot;text-align:left&quot;&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In an election like this, the cost to count could be several dollars per ballot.
Of course, California has an exceptionally large number of contests, but
in general hand-counting represents a significant cost.&lt;/p&gt;
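&lt;p&gt;As a rough back-of-the-envelope check, the scaling above can be written out in Python. This is a sketch under loud assumptions: it extrapolates the Pew per-contest recount figures linearly to every contest, which real counts may not follow, and the function name and the jurisdiction size of 100,000 ballots are hypothetical. Costs are kept in whole cents to avoid floating-point rounding.&lt;/p&gt;

```python
def hand_count_cost_cents(ballots, contests, cents_per_contest_per_ballot):
    """Linear estimate: every contest on every ballot costs the same to count."""
    return ballots * contests * cents_per_contest_per_ballot


ballots = 100_000   # hypothetical jurisdiction size
contests = 20       # the "20 or more contests" mentioned in the text
low = hand_count_cost_cents(ballots, contests, 15)   # $0.15 per contest per ballot
high = hand_count_cost_cents(ballots, contests, 60)  # $0.60 per contest per ballot

print(f"per ballot: ${low / ballots / 100:.2f} to ${high / ballots / 100:.2f}")
print(f"total: ${low / 100:,.0f} to ${high / 100:,.0f}")
```

&lt;p&gt;With 20 contests this works out to roughly $3 to $12 per ballot, which is where the &amp;quot;several dollars per ballot&amp;quot; figure comes from.&lt;/p&gt;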
&lt;p&gt;Aside from the financial impact of hand counting ballots, it just takes
a long time. Pew notes that both the Washington and Minnesota recounts
took around seven months to resolve, though again this is partly due
to the small margin of victory. As another example, California
law requires a &amp;quot;1% post-election manual tally&amp;quot; in which 1% of precincts are randomly
selected for hand-counting. Even with such a restricted count,
the tally can take &lt;a href=&quot;https://www.usenix.org/legacy/events/evt08/tech/full_papers/hall/hall_html/jhall_evt08_html.html&quot;&gt;weeks&lt;/a&gt; in a large county such as Los Angeles, suggesting that hand counting all the ballots would be prohibitive in this setting. This isn&#39;t to say that hand counting
can never work, obviously, merely that it&#39;s not a good match
for the US electoral system, which tends to have a lot more
contests than in other countries.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-optical-scanning&quot;&gt;Up Next: Optical Scanning &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#up-next%3A-optical-scanning&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The bottom line here is that while hand counting works well in many
jurisdictions it&#39;s not a great fit for a lot of elections in the
United States. So if we can&#39;t count ballots by hand, then what can we
do? The good news is that there are ballot counting mechanisms which
can provide similar assurance and privacy properties to hand counting
but do so much more efficiently, namely optical scan ballots.  I&#39;ll be
covering that in my next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;By contrast, the marking process is very scalable: if you have
a long line, you can put out more tables, pens, privacy screens, etc. &lt;a href=&quot;https://educatedguesswork.org/posts/voting-hcpb/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Why getting voting right is hard, Part I: Introduction and Requirements</title>
		<link href="https://educatedguesswork.org/posts/voting1/"/>
		<updated>2020-12-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/voting1/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Every two years around this time, the US has an election and the
rest of the world marvels and asks itself one question: &lt;em&gt;Why are American
elections so hard?&lt;/em&gt; I&#39;m not talking about US politics here but
about the voting systems (machines, paper, etc.) that people use
to vote, which are bafflingly complex.
While it&#39;s true that American voting is a
chaotic patchwork of different systems scattered across jurisdictions,
running efficient secure elections
is a genuinely hard problem. This is often surprising to people who
are used to other systems that demand precise accounting such as
banking/ATMs or large scale databases, but the truth is that
voting is fundamentally different and much harder.&lt;/p&gt;
&lt;p&gt;In this series I&#39;ll be going through a variety of different voting
systems so you can see how this works in practice. This post
provides a brief overview of the basic requirements for voting systems.
We&#39;ll go into more detail about the practical impact of these requirements
as we examine each system.&lt;/p&gt;
&lt;h1 id=&quot;requirements&quot;&gt;Requirements &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#requirements&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;To understand voting systems design, we first need to understand
the requirements to which they are designed. These vary somewhat,
but generally look something like the below.&lt;/p&gt;
&lt;h2 id=&quot;efficient-correct-tabulation&quot;&gt;Efficient Correct Tabulation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#efficient-correct-tabulation&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This requirement is basically trivial: collect the ballots and tally
them up. The winner is the one with the most votes &lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. You also
need to do it at scale and within a reasonable period of time; otherwise there&#39;s
not much point.&lt;/p&gt;
&lt;h2 id=&quot;verifiable-results&quot;&gt;Verifiable Results &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#verifiable-results&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It&#39;s not enough for the election just to produce the right result, it
must also do so in a verifiable fashion.  As voting researcher &lt;a href=&quot;https://www.cs.rice.edu/~dwallach/&quot;&gt;Dan
Wallach&lt;/a&gt; is fond of saying, the
purpose of elections is to convince the loser that they actually lost,
and that means more than just trusting the election officials to
count the votes correctly. Ideally, everyone in the world would
be able to check for themselves that the votes had been correctly
tabulated (this is often called &amp;quot;public verifiability&amp;quot;), but
in real-world systems it usually means that some set of election
observers can personally observe parts of the process and hopefully
be persuaded it was conducted correctly.&lt;/p&gt;
&lt;h2 id=&quot;secrecy-of-the-ballot&quot;&gt;Secrecy of the Ballot &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#secrecy-of-the-ballot&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The next major requirement is what&#39;s called &amp;quot;secrecy of the ballot&amp;quot;, i.e.,
ensuring that others can&#39;t tell how you voted. Without ballot secrecy,
people could be pressured to vote certain ways or face negative
consequences for their votes. Ballot secrecy actually has two
components: (1) other people -- &lt;em&gt;including&lt;/em&gt; election officials --
can&#39;t tell how you voted and (2) you can&#39;t prove to other people
how you voted. The first component is needed to prevent wholesale
retaliation and/or rewards and the second is needed to prevent retail
vote buying. The actual level of ballot secrecy provided by systems
varies. For instance, the UK system &lt;a href=&quot;https://en.wikipedia.org/wiki/Secret_ballot&quot;&gt;technically allows&lt;/a&gt;
election officials to match ballots to the voter, but prevents
it with procedural controls, and
vote-by-mail systems generally don&#39;t do a great job of preventing
you from proving how you voted. In general, though, most voting
systems attempt to provide some level of ballot secrecy.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;accessibility&quot;&gt;Accessibility &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#accessibility&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Finally, we want voting systems to be &lt;em&gt;accessible&lt;/em&gt;, both in the
specific sense that we want people with disabilities to be able to
vote and in the more general sense that we want it to be generally
easy for people to vote. Because the voting-eligible population
is so large and people&#39;s situations are so varied, this often
means that systems have to make accommodations, for instance
for overseas or military voters or for people who speak different
languages.&lt;/p&gt;
&lt;h1 id=&quot;limited-trust&quot;&gt;Limited Trust &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#limited-trust&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;As you&#39;ve probably noticed, one common theme in these requirements is
the desire to limit the amount of trust you place in any one entity or
person. For instance, when I worked the polls in Santa Clara county
elections, we would collect all the paper ballots and put them in
tamper-evident envelopes before taking them back to election central
for processing. This makes it harder for the person transporting the
ballots to examine the ballots or substitute their own. For those
who aren&#39;t used to the way security people think, this often feels
like saying that election officials aren&#39;t trustworthy, but really
what it&#39;s saying is that elections are very high stakes events
and critical systems like this should be designed
with as few failure points as possible. That includes defending against
both outsider and insider threats, even from authorized election workers themselves.&lt;/p&gt;
&lt;h1 id=&quot;an-overconstrained-problem&quot;&gt;An Overconstrained Problem &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#an-overconstrained-problem&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Individually each of these requirements is fairly easy to meet, but
the combination of them turns out to be extremely hard. For example
if you publish everyone&#39;s ballots then it&#39;s (relatively) easy to ensure
that the ballots were counted correctly, but you&#39;ve just completely
given up secrecy of the ballot.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; Conversely, if you just trust
election officials to count all the votes, then it&#39;s much easier to
provide secrecy from everyone else. But these properties are both
important, and hard to provide simultaneously. This tension is at the heart
of why voting is so much more difficult than other superficially
similar systems like banking. After all, your transactions aren&#39;t secret
from the bank. In general, what we find is that voting systems
may not completely meet all the requirements but rather compromise
on trying to do a good job on most/all of them.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-hand-counted-paper-ballots&quot;&gt;Up Next: Hand-Counted Paper Ballots &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/voting1/#up-next%3A-hand-counted-paper-ballots&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;In the next post, I&#39;ll be covering what is probably the simplest
common voting system: hand-counted paper ballots. This system
actually isn&#39;t that common in the US for reasons I&#39;ll go into,
but it&#39;s widely used outside the US and provides a good introduction
into some of the problems with running a real election.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For the purpose of this series, we&#39;ll mostly be assuming
&lt;a href=&quot;https://en.wikipedia.org/wiki/First-past-the-post_voting&quot;&gt;first past the post&lt;/a&gt;
systems, which are the main systems in use in the US. &lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that I&#39;m talking here about systems designed for use by ordinary
citizens. Legislative voting, judicial voting, etc. are qualitatively
different: they usually have a much smaller number of voters
and don&#39;t try to preserve the secrecy of the ballot, so the problem
is much simpler. &lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Thanks to Hovav Shacham for this example. &lt;a href=&quot;https://educatedguesswork.org/posts/voting1/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at password security, Part V: File and Disk Encryption</title>
		<link href="https://educatedguesswork.org/posts/disk-encryption/"/>
		<updated>2020-09-05T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/disk-encryption/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The previous posts (
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/07/08/password-security-part-i/&quot;&gt;I&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/07/13/password-security-part-ii/&quot;&gt;II&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/07/20/a-look-at-password-security-part-iii-more-secure-login-protocols/&quot;&gt;III&lt;/a&gt;,
&lt;a href=&quot;https://blog.mozilla.org/blog/2020/08/20/password-security-part-iv-webauthn/&quot;&gt;IV&lt;/a&gt;)
focused primarily on remote login, either to multiuser systems or Web
sites (though the same principles also apply to other networked
services like e-mail). However, another common case where users
encounter passwords is for login to devices such as laptops, tablets,
and phones. This post addresses that topic.&lt;/p&gt;
&lt;h1 id=&quot;threat-model&quot;&gt;Threat Model &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/disk-encryption/#threat-model&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We need to start by talking about the threat model. As a general matter,
the assumption here is that the attacker has some physical access to your
device. While some devices do have password-controlled remote access,
that&#39;s not the focus here.&lt;/p&gt;
&lt;p&gt;Generally, we can think of two kinds of attacker access.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Non-invasive&lt;/em&gt;: The attacker isn&#39;t willing to take the device apart,
perhaps because they only have the device temporarily and don&#39;t want
to leave traces of tampering that would alert you.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Invasive&lt;/em&gt;: The attacker is willing to take the device apart. Within
invasive, there&#39;s a broad range of how invasive the attacker is willing to be,
starting with &amp;quot;open the device and take out the hard drive&amp;quot; and ending
with &amp;quot;strip the packaging off all the chips and examine them with an
electron microscope&amp;quot;.&lt;/p&gt;
&lt;p&gt;How concerned you should be depends on who you are, the value of your
data, and the kinds of attackers you face. If you&#39;re an ordinary person
and your laptop gets stolen out of your car, then attacks are probably
going to be fairly primitive, maybe removing the hard disk but probably
not using an electron microscope. On the other hand, if you have high
value data and the attacker targets you specifically, then you should
assume a fairly high degree of capability. And of course people
in the computer security field routinely worry about attackers with
nation state capabilities.&lt;/p&gt;
&lt;h1 id=&quot;it&#39;s-the-data-that-matters&quot;&gt;It&#39;s the data that matters &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/disk-encryption/#it&#39;s-the-data-that-matters&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;It&#39;s natural to think of passwords as a measure that protects
access to the computer, but in most cases it&#39;s really a matter
of access to the data on your computer. If you make a copy of
someone&#39;s disk and put it in another computer that will be
a pretty close clone of the original (that&#39;s what a backup
is, after all) and the attacker will be able to read all
your sensitive data off the disk.&lt;/p&gt;
&lt;p&gt;This implies two very easy attacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Bypass the operating system on the computer and access the
disk directly. For instance, on a Mac you can boot
into &lt;a href=&quot;https://support.apple.com/en-us/HT201314&quot;&gt;recovery mode&lt;/a&gt;
and just examine the disk. Many UNIX machines have something
called &lt;a href=&quot;https://en.wikipedia.org/wiki/Single-user_mode&quot;&gt;single-user mode&lt;/a&gt;
which boots up with administrative access.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remove the disk and mount it in another computer as an external
disk. This is trivial on most desktop computers, requiring only a
screwdriver (if that) and on many laptops as well; if you have a Mac
or a mobile device, the disk may be a soldered in Flash drive, which
makes things harder but still doable.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key thing to realize is that nearly all of the access controls on
the computer are just implemented by the operating system software.
If you can bypass that software by booting into an administrative
mode or by using another computer, then you can get past all of them
and just access the data directly.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;If you&#39;re thinking that this is bad, you&#39;re right. And the solution to this
is to &lt;em&gt;encrypt your disk&lt;/em&gt;. If you don&#39;t do that, then basically
your data will not be secure against any kind of dedicated
attacker who has physical access to your device.&lt;/p&gt;
&lt;h1 id=&quot;password-based-key-derivation&quot;&gt;Password-Based Key Derivation &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/disk-encryption/#password-based-key-derivation&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The good news is that basically all operating systems support disk
encryption. The bad news is that the details of how it&#39;s implemented
vary dramatically in some security critical ways. I&#39;m not talking
here about the specific details about cryptographic algorithms and
how each individual disk block is encrypted. That&#39;s a fascinating
topic (see &lt;a href=&quot;https://blog.cryptographyengineering.com/2016/11/24/android-n-encryption/&quot;&gt;here&lt;/a&gt;), but most operating systems do something
mostly adequate. The most interesting question for users is how
the disk encryption keys are handled and how the password is
used to gate access to those keys.&lt;/p&gt;
&lt;p&gt;The obvious way to do this -- and the way things  used to work
pretty much everywhere -- is to generate the encryption key directly from the password.
[Technical Note: You probably really want to generate a random key and encrypt it with a
key derived from the password. This way you can change your password without re-encrypting
the whole disk. But from a security perspective these are fairly equivalent.]
The technical term for this is a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Key_derivation_function&quot;&gt;password-based key derivation function&lt;/a&gt;,
which just means that it takes a password and outputs a key.
For our purposes, this is the same as a password hashing
function and it has the same problem: given an encrypted disk
I can attempt to brute force the password by trying a large
number of candidate passwords. The result is that you need
to have a super-long password (or often a passphrase) in order
to prevent this kind of attack. While it&#39;s possible to memorize
a long enough password, it&#39;s no fun, as well as being a real pain to
type in whenever you want to log in to your computer, let alone
on your smartphone or tablet. As a result, most people use much
shorter passwords, which of course weakens the security of disk
encryption.&lt;/p&gt;
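&lt;p&gt;As a rough illustration of the technical note above, here is a minimal Python sketch of the &quot;random key wrapped under a password-derived key&quot; approach. It is not how any particular operating system does it: the function names, the iteration count, and the choice of PBKDF2-HMAC-SHA256 are assumptions made for illustration, and the XOR &quot;wrap&quot; is a toy stand-in for the authenticated key wrapping a real system would use.&lt;/p&gt;

```python
import hashlib
import secrets

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def derive_kek(password, salt):
    # Password-based key derivation: password + salt in, 32-byte key out.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def xor_bytes(a, b):
    # Toy wrap/unwrap primitive; both inputs are 32 bytes here.
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_disk_key(disk_key, password):
    # A fresh random salt is stored alongside the wrapped key.
    salt = secrets.token_bytes(16)
    return salt, xor_bytes(disk_key, derive_kek(password, salt))


def unwrap_disk_key(salt, wrapped, password):
    # With the wrong password this silently returns garbage, because the
    # toy XOR wrap has no integrity check.
    return xor_bytes(wrapped, derive_kek(password, salt))


def change_password(salt, wrapped, old_password, new_password):
    # Only the small wrapped key is rewritten; the disk itself is untouched.
    disk_key = unwrap_disk_key(salt, wrapped, old_password)
    return wrap_disk_key(disk_key, new_password)


disk_key = secrets.token_bytes(32)  # the key that actually encrypts the disk
salt, wrapped = wrap_disk_key(disk_key, "hunter2")
new_salt, new_wrapped = change_password(salt, wrapped, "hunter2", "correct horse")
```

&lt;p&gt;Note that nothing in this sketch slows down an attacker who has copied the disk beyond the cost of one key derivation per guess, which is exactly the brute force problem described above.&lt;/p&gt;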
&lt;h1 id=&quot;hardware-security-modules&quot;&gt;Hardware Security Modules &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/disk-encryption/#hardware-security-modules&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;As we&#39;ve seen before, the problem here is that the attacker gets
to try candidate passwords very fast and the only real fix is
to limit the rate at which they can try. This is what many
modern devices do. Instead of deriving the encryption
key from the password, they generate a random encryption key inside of
a piece of secure hardware called a &lt;em&gt;hardware security module&lt;/em&gt; (HSM).&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; What &amp;quot;secure&amp;quot; means varies but
ideally it&#39;s something like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It can do encryption and decryption internally without ever
exposing the keys.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;It resists physical attacks to recover the keys. For instance
it might erase them if you try to remove the casing from the HSM.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In order to actually encrypt or decrypt, you first unlock the HSM
with the password. That doesn&#39;t give you the keys; it just
lets you use the HSM to do encryption and decryption. However, until
you enter the password, it won&#39;t do anything.&lt;/p&gt;
&lt;p&gt;The main function of the HSM is to limit the rate at which you can try
passwords. This might happen by simply having a flat limit of X tries
per second, or maybe it exponentially backs off the more passwords you
try, or maybe it will only allow some small number of failures (10 is common)
before it erases itself. If you&#39;ve ever pulled your iPhone out of your pocket
only to see &amp;quot;iPhone is disabled, try again in 5 minutes&amp;quot;, that&#39;s the
rate limiting mechanism in action.  Whatever the technique, the idea
is the same: prevent the attacker from quickly trying a large number
of candidate passwords. With a properly designed rate limiting
mechanism, you can get away with much, much shorter passwords.  For
instance, if you can only have 10 tries before the phone erases
itself, then the attacker only has a 1/1000 chance of breaking a 4
digit PIN, let alone a 16 character password. Some HSMs can also do
biometric authentication to unlock the encryption key, which is how
features like TouchID and FaceID work.&lt;/p&gt;
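&lt;p&gt;To make the arithmetic concrete, here is a small Python sketch of an erase-after-N-failures policy, along with the calculation behind the 1/1000 figure. This is only a toy model: the &lt;code&gt;ToyHSM&lt;/code&gt; class and &lt;code&gt;guess_success_probability&lt;/code&gt; helper are hypothetical names, and a real HSM enforces the counter in tamper-resistant hardware rather than in software like this.&lt;/p&gt;

```python
import hmac
import secrets
from fractions import Fraction


class ToyHSM:
    """Toy model of erase-after-N-failures; not a real HSM interface."""

    def __init__(self, pin, max_failures=10):
        self._pin = pin.encode("utf-8")
        self._key = secrets.token_bytes(32)  # never handed to the caller
        self._max_failures = max_failures
        self._failures = 0

    @property
    def erased(self):
        return self._key is None

    def unlock(self, guess):
        if self.erased:
            raise RuntimeError("key material has been erased")
        if hmac.compare_digest(guess.encode("utf-8"), self._pin):
            self._failures = 0  # a correct PIN resets the counter
            return True
        self._failures += 1
        if self._failures == self._max_failures:
            self._key = None  # too many wrong guesses: wipe the key
        return False


def guess_success_probability(pin_digits, tries):
    # Chance that the given number of distinct guesses hits a uniformly
    # random PIN of the given length.
    return Fraction(min(tries, 10 ** pin_digits), 10 ** pin_digits)


hsm = ToyHSM("4821")
results = [hsm.unlock(f"{n:04d}") for n in range(10)]  # guesses 0000..0009
```

&lt;p&gt;After the ten wrong guesses, &lt;code&gt;hsm.erased&lt;/code&gt; is true and further attempts raise an error, while &lt;code&gt;guess_success_probability(4, 10)&lt;/code&gt; works out to 10/10000, i.e., 1/1000.&lt;/p&gt;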
&lt;p&gt;So, having the encryption keys in an HSM is a big improvement to
security and it doesn&#39;t require any change in the user interface --
you just type in your password -- which is great. What&#39;s not so great
is that it&#39;s not always clear whether your device has an HSM or not. As
a practical matter, new Apple devices do, as does the Google
&lt;a href=&quot;https://www.blog.google/products/pixel/titan-m-makes-pixel-3-our-most-secure-phone-yet/&quot;&gt;Pixel&lt;/a&gt;.
On Windows 10 the answer is
&lt;a href=&quot;https://docs.microsoft.com/en-us/windows/security/information-protection/tpm/tpm-recommendations&quot;&gt;maybe&lt;/a&gt;,
but many modern devices will have one.&lt;/p&gt;
&lt;p&gt;It needs to be said that an HSM isn&#39;t magic: iPhones store their keys
in HSMs, which certainly makes them much harder to decrypt, but there
are also companies that sell technology for breaking into HSM-protected
devices like iPhones (&lt;a href=&quot;https://www.cellebrite.com/en/home/&quot;&gt;Cellebrite&lt;/a&gt; being probably the best known).
Still, you&#39;re far better off with a device like this than you are without.
And of course all bets are off if someone takes your device when it&#39;s
unlocked. This is why it&#39;s a good idea to have your screen set
to lock automatically after a fairly short time; obviously that&#39;s
a lot more convenient if you have fingerprint or face ID.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/disk-encryption/#summary&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;OK, so this has been a pretty long series, but I hope it&#39;s given
you an appreciation for all the different settings in which passwords
are used and where they are safe(r) versus unsafe.&lt;/p&gt;
&lt;p&gt;As always, I can be reached at &lt;a href=&quot;mailto:ekr-blog@mozilla.com&quot;&gt;ekr-blog@mozilla.com&lt;/a&gt; if you have questions
or comments.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Some computers allow you to install a firmware password
which will stop the computer from booting unless you enter
the right password. This isn&#39;t totally useless but it&#39;s not
a defense if the attacker is willing to remove the disk. &lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Also called a &lt;em&gt;Secure Enclave Processor&lt;/em&gt; (SEP) or a &lt;em&gt;Trusted Platform Module&lt;/em&gt; (TPM). &lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It&#39;s not technically necessary to keep the keys in the HSM in order
to secure the device against password guessing. For instance, once the
HSM is unlocked it could just output the key and let decryption happen
on the main CPU. The problem is that this then exposes you to attacks
on the non-tamper-resistant hardware that makes up the rest of the
computer. For this reason, it&#39;s better to have the key kept inside the
HSM. Note that this only applies to the &lt;em&gt;keys&lt;/em&gt; in the HSM, not the
data in your computer&#39;s memory, which generally isn&#39;t encrypted, and
there are &lt;a href=&quot;https://www.usenix.org/legacy/event/sec08/tech/full_papers/halderman/halderman.pdf&quot;&gt;ways&lt;/a&gt; to read that memory. If
you are worried your computer might be seized and searched, as in a
border crossing, do what the pros do and turn it off. &lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Unfortunately, biometric ID also makes it a lot easier to be
compelled to unlock your phone--whatever the legal situation in your
jurisdiction, someone can just press your finger against the reader,
but it&#39;s a lot harder to make you punch in your PIN--so it&#39;s a bit of a tradeoff. &lt;a href=&quot;https://educatedguesswork.org/posts/disk-encryption/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at password security, Part IV: WebAuthn</title>
		<link href="https://educatedguesswork.org/posts/webauthn/"/>
		<updated>2020-08-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/webauthn/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As discussed in &lt;a href=&quot;https://blog.mozilla.org/blog/2020/07/20/a-look-at-password-security-part-iii-more-secure-login-protocols/&quot;&gt;part
III&lt;/a&gt;,
public key authentication is great in principle but in practice has
been hard to integrate into the Web environment. However, we&#39;re now
seeing deployment of a new technology called
&lt;a href=&quot;https://www.w3.org/TR/webauthn/&quot;&gt;WebAuthn (short for Web Authentication)&lt;/a&gt; that
hopefully changes that.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Previous approaches to public key authentication required the browser
to provide the user interface. For a variety of reasons (the
interfaces were bad, the sites wanted to control the experience) this
didn&#39;t work well for sites, and public key authentication didn&#39;t get
much adoption. WebAuthn takes a different approach, which is to
provide a JavaScript API that the site can use to do public key
authentication via the browser.&lt;/p&gt;
&lt;p&gt;The key difference here is that previous systems tended to operate at
a lower layer (typically HTTP or TLS), which made it hard for the site
to control how and when authentication happened.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
By contrast, a JS API puts the site in control so it
can ask for authentication when it wants to (e.g., after showing
the home page and prompting for the username).&lt;/p&gt;
&lt;h1 id=&quot;some-technical-details&quot;&gt;Some Technical Details &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#some-technical-details&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;WebAuthn offers two new API points that are used by the server&#39;s
JavaScript [Technical note: These are buried in the &lt;a href=&quot;https://www.w3.org/TR/credential-management-1/&quot;&gt;credential management API&lt;/a&gt;.]:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;makeCredential&lt;/em&gt;: Creates a new public key pair and
returns the public key.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;getAssertion&lt;/em&gt;: Sign with an existing credential over
a challenge provided by the server.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The way this is used in practice is that when the user first
registers with the server -- or as is more likely now,
when the server first adds WebAuthn support or detects
that a client has it -- the server uses &lt;code&gt;makeCredential()&lt;/code&gt;
to create a new public key pair and stores the public key, possibly along with an attestation.
An attestation is a provable statement such as, &amp;quot;this public key was minted by a YubiKey.&amp;quot;
Note that unlike some public key authentication systems,
each server gets its own public key so WebAuthn is
harder to use for cross-site tracking (more on this later).
Then when the user returns, the site uses &lt;code&gt;getAssertion()&lt;/code&gt;,
causing the browser to sign the server&#39;s challenge using the
private key associated with the public key. The server can
then verify the assertion, allowing it to determine that the
client is the same endpoint as originally registered (for
some value of &amp;quot;the same&amp;quot;; more on this later too).&lt;/p&gt;
&lt;p&gt;The clever bit here is that because this is all hidden behind
a JS API, the site can authenticate the client at any part of
its login experience it wants without disrupting the user
experience. In particular, WebAuthn can be used as a second
factor in addition to a password or as a primary authenticator
without a password.&lt;/p&gt;
&lt;h1 id=&quot;hardware-authenticators&quot;&gt;Hardware Authenticators &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#hardware-authenticators&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The WebAuthn specification doesn&#39;t require any particular mechanism
for handling the key pair, so it&#39;s technically possible to implement
WebAuthn entirely in the browser, storing the key on the user&#39;s disk.
However, the designers of WebAuthn and its predecessor FIDO U2F
were very concerned about the user&#39;s machine being compromised and the
private key being stolen, which would allow the attacker to impersonate
the user indefinitely (just like if your password was compromised).&lt;/p&gt;
&lt;p&gt;Accordingly, WebAuthn was explicitly designed around having the
key pair in a hardware token. These tokens are designed to do all
the cryptography internally and never expose the key, so if
your computer is compromised, the attacker may be able to impersonate you
temporarily, but they won&#39;t be able to steal the key.
This also has the advantage that the token is portable, so you can
pull it out of your computer and carry it with you -- thus minimizing
the risk of your computer being stolen -- or plug it into a second
computer; it&#39;s the token that matters not the computer it&#39;s plugged into.
We&#39;re also starting to see hardware backed designs that don&#39;t depend
on a token. For instance, modern Macs have trusted hardware built
in to power TouchID and FaceID and Apple is using this to
&lt;a href=&quot;https://www.theverge.com/2020/6/24/21301509/apple-safari-14-browser-face-touch-id-logins-webauthn-fido2&quot;&gt;implement WebAuthn&lt;/a&gt;. We have been looking at
similar designs for Firefox.&lt;/p&gt;
&lt;p&gt;While hardware key storage isn&#39;t mandatory, WebAuthn was designed to
allow sites to require it. Obviously you can&#39;t just trust the browser
when it says that it&#39;s storing the key in hardware and so WebAuthn
includes an
&lt;a href=&quot;https://en.wikipedia.org/wiki/Trusted_computing#Remote_attestation&quot;&gt;attestation&lt;/a&gt;
scheme that is designed to let the site determine the type of token/device
being used for WebAuthn. However, there are privacy concerns about
the attestation scheme &lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; and so many sites don&#39;t require it. Firefox
shows a separate prompt when the site requests
attestation.&lt;/p&gt;
&lt;h1 id=&quot;privacy-properties-and-user-interactivity&quot;&gt;Privacy Properties and User Interactivity &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#privacy-properties-and-user-interactivity&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;While as a technical matter a browser or token could just do all the
WebAuthn computations automatically with no user interaction, that&#39;s
not really what you want for two reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It allows sites to track users without their consent (this
&lt;a href=&quot;https://freedom-to-tinker.com/2017/12/27/no-boundaries-for-user-identities-web-trackers-exploit-browser-login-managers/&quot;&gt;already happens with user login fields&lt;/a&gt;
which is why Firefox requires that the user interact with
the page before filling in your username or password).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It would allow an attacker who had compromised your computer
to invisibly log in as you.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In order to prevent this, FIDO-compliant tokens require the user to do
something (typically touch the token) before signing an
assertion. This prevents invisible tracking or use of the key to log
in. Apple&#39;s use of FaceID/TouchID takes this one step further,
requiring a specific user to authorize a login, thus protecting you in
case your laptop is stolen.&lt;/p&gt;
&lt;h1 id=&quot;alternative-designs&quot;&gt;Alternative Designs &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#alternative-designs&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;If you&#39;re familiar with Web technologies, you might be wondering
why we need something new here. In particular, many of the properties
of WebAuthn could be replicated with cookies or WebCrypto. However,
WebAuthn offers a number of advantages over these alternatives.&lt;/p&gt;
&lt;p&gt;First, because WebAuthn requires user interaction prior to authentication
it is much harder to use for tracking. This means that the browser
doesn&#39;t need to clear WebAuthn state when it clears cookie or
WebCrypto state, which (unlike WebAuthn state) can be used for invisible tracking. It
would be possible to add some kind of explicit user action
step before accessing cookies or WebCrypto but then you would have something new.&lt;/p&gt;
&lt;p&gt;Second, when used with keys in hardware, WebAuthn is more resistant
to machine compromise. By contrast, cookies and WebCrypto state
are generally stored in storage which is available directly
to the browser, so if it&#39;s compromised they can be stolen.
While this is a real issue, it&#39;s unclear how important it is:
many sites use cookies for authentication over fairly long
periods (when was the last time you logged into Facebook?)
and so an attacker who steals your cookies will still be
able to impersonate you for a long period. And of course
the cost of this is that you have to buy a token.&lt;/p&gt;
&lt;h1 id=&quot;adoption-status&quot;&gt;Adoption Status &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#adoption-status&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Technically, WebAuthn is a pretty big improvement over pre-existing
systems. However, authentication systems tend to rely pretty heavily
on network effects: it&#39;s not worth users enabling it unless a lot of
sites use it and it&#39;s not worth sites enabling it unless a lot of
users are willing to sign up. So far, indications are pretty
promising: a number of important sites such as GSuite and GitHub
already support WebAuthn as do SSO vendors like Okta and Duo. All four
major browsers support it as well.
With any luck we&#39;ll be seeing a lot more WebAuthn deployment
over the next few years -- a big step forward
for user security.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-login-and-device-encryption&quot;&gt;Up Next: Login and Device Encryption &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#up-next%3A-login-and-device-encryption&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;This about wraps it up for remote authentication, but what about
logging into your computer or phone? I&#39;ll be covering that next.&lt;/p&gt;
&lt;h1 id=&quot;acknowledgement&quot;&gt;Acknowledgement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/webauthn/#acknowledgement&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Thanks to JC Jones and Chris Wood for help with this post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The WebAuthn spec is pretty hard to read. MDN&#39;s &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Web_Authentication_API&quot;&gt;article&lt;/a&gt; does a better job. &lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance, with TLS the easiest thing to do is to authenticate
the user as soon as they connect, but this means you don&#39;t get to show
any UI, which is awkward for users who don&#39;t yet have accounts. You can
also do &amp;quot;TLS renegotiation&amp;quot; later in the connection but for a variety
of technical reasons that has proven hard to integrate with servers.
In addition, any TLS-level authentication is an awkward fit for
CDNs because the TLS is terminated at the CDN, not at the origin. &lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The idea behind the attestation mechanism is that the
device manufacturer issues a certificate to the device and
the device uses the corresponding private key to sign the newly
generated authentication key. However, if that certificate
is unique to the device and used for every site then it
becomes a tracking vector. The specification suggests two
(somewhat clunky) mechanisms for reducing the risk here, but neither is
mandatory. &lt;a href=&quot;https://educatedguesswork.org/posts/webauthn/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at password security, Part III: More secure login mechanisms</title>
		<link href="https://educatedguesswork.org/posts/password-proto/"/>
		<updated>2020-07-20T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/password-proto/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://blog.mozilla.org/blog/2020/07/13/password-security-part-ii/&quot;&gt;part II&lt;/a&gt;, we looked at the problem of Web authentication and covered
the twin problems of phishing and password database compromise. In
this post, I&#39;ll be covering some of the technologies that have been
developed to address these issues.&lt;/p&gt;
&lt;p&gt;This is mostly a story of failure, though with a sort of hopeful note
at the end. The ironic thing here is that we&#39;ve known for decades how to build
authentication technologies which are much more secure than the kind
of passwords we use on the Web. In fact, we use one of these
technologies -- public key authentication via digital certificates --
to authenticate the server side of every HTTPS transaction before you
send your password over. HTTPS supports certificate-based client
authentication as well, and while public key authentication is commonly used in other
settings, such as SSH, it&#39;s rarely used on the Web. Even if we restrict
ourselves to passwords, we have long had
&lt;a href=&quot;https://en.wikipedia.org/wiki/Password-authenticated_key_agreement&quot;&gt;technologies&lt;/a&gt;
for password authentication which completely resist phishing, but they
are not integrated into the Web technology stack at all.
The problem, unfortunately, is less about cryptography than about
deployability, as we&#39;ll see below.&lt;/p&gt;
&lt;h1 id=&quot;two-factor-authentication-and-one-time-passwords&quot;&gt;Two Factor Authentication and One-Time Passwords &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/password-proto/#two-factor-authentication-and-one-time-passwords&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The most widely deployed technology for improving password security goes by
the name &lt;em&gt;one-time passwords&lt;/em&gt; (OTP) or (more recently) &lt;em&gt;two-factor authentication&lt;/em&gt; (2FA).
OTP actually goes back to well before the widespread use of encrypted
communications or even the Web, to the days when people would log in
to servers in the clear using &lt;a href=&quot;https://en.wikipedia.org/wiki/Telnet&quot;&gt;Telnet&lt;/a&gt;. It
was of course well known that Telnet was insecure and that anyone who shared
the network with you could just sniff your password off the wire&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; and
then log in with it. [Technical note: this is called a &lt;em&gt;replay attack&lt;/em&gt;.]
One partial fix for this attack was to supplement the user password with
another secret which wasn&#39;t static but rather changed every time you logged
in (hence a &amp;quot;one-time&amp;quot; password).&lt;/p&gt;
&lt;p&gt;OTP systems came in a variety of forms but the most common was a token about
the size of a car key fob but with an LCD display, like this:&lt;/p&gt;
&lt;img src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/RSA_SecurID_Token_Old.jpg/1920px-RSA_SecurID_Token_Old.jpg&quot; width=&quot;300&quot; /&gt;
&lt;p&gt;The token would produce a new pseudorandom numeric code every 30 seconds or so and
when you went to log in to the server you would provide both your
password and the current code. That way, even if the attacker got the code
they still couldn&#39;t log in as you for more than a brief period&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
unless they also stole your token. If all of this looks familiar, it&#39;s because this is more or less the
same as modern OTP systems such as &lt;a href=&quot;https://www.google-authenticator.com/&quot;&gt;Google Authenticator&lt;/a&gt;,
except that instead of a hardware token, these systems tend to use an app on
your phone and have you log into some Web form rather than over Telnet.
The reason this is called &amp;quot;two-factor authentication&amp;quot; is that authenticating
requires both a value you know (the password) and something you have
(the device). Some other systems use a code that is sent over SMS but the
basic idea is the same.&lt;/p&gt;
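&lt;p&gt;To make the code generation concrete, here is a minimal Python sketch of a time-based one-time password along the lines of RFC 6238. It is an illustration rather than the algorithm of any particular token or app: the function name &lt;code&gt;totp&lt;/code&gt; and its parameters are hypothetical, the shared secret is assumed to be raw bytes, and HMAC-SHA1 with a 30 second step is just the common default.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import hmac
import struct
import time


def totp(secret, now=None, step=30, digits=6):
    """Return the one-time code for the time window containing `now`."""
    if now is None:
        now = time.time()
    # Both sides derive the same counter from the clock.
    counter = int(now) // step
    # Keyed hash of the counter under the per-user shared secret.
    digest = hmac.new(secret, struct.pack("!Q", counter), hashlib.sha1).digest()
    # Truncation: the low 4 bits of the last byte pick a 4-byte window,
    # and the top bit of that window is dropped.
    offset = digest[-1] % 16
    value = int.from_bytes(digest[offset:offset + 4], "big") % 2**31
    # Keep only the last `digits` decimal digits, zero-padded.
    return str(value % 10**digits).zfill(digits)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a scheme like this the token (or phone app) and the server run the same computation over the same secret and clock, and the server compares its result with what the user typed in.&lt;/p&gt;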
&lt;p&gt;OTP systems don&#39;t provide perfect security, but they do significantly
improve the security of a password-only system in two respects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They guarantee a strong, non-reused secret. Even if you
reuse passwords and your password on site A is compromised, the
attacker still won&#39;t have the right code for site B.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;They mitigate the effect of phishing. If you are successfully
phished the attacker will get the current code for the site and
can log in as you, but they won&#39;t be able to log in in the future
because knowing the current code doesn&#39;t let you predict a future
code. This isn&#39;t great but it&#39;s better than nothing.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The nice thing about a 2FA system is that it&#39;s comparatively easy to
deploy: it&#39;s a phone app you download plus another code that the site
prompts you for. As a result, phone-based 2FA systems are very popular
(and if that&#39;s all you have, I advise you to use it, but see below
for my real recommendation).&lt;/p&gt;
&lt;h1 id=&quot;password-authenticated-key-agreement&quot;&gt;Password Authenticated Key Agreement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/password-proto/#password-authenticated-key-agreement&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;One of the nice properties of 2FA systems is that they do not require
modifying the client at all, which is obviously convenient for deployment.
That way you don&#39;t care if users are running Firefox or Safari or Chrome,
you just tell them to get the second factor app and you&#39;re good to go.
However, if you &lt;em&gt;can&lt;/em&gt; modify the client you can protect your
password rather than just limiting the impact of having it stolen.
The technology to do this is called a &lt;a href=&quot;https://en.wikipedia.org/wiki/Password-authenticated_key_agreement&quot;&gt;Password Authenticated Key Agreement&lt;/a&gt; (PAKE) protocol.&lt;/p&gt;
&lt;p&gt;The way a PAKE would work on the Web is that it would be integrated into the TLS connection
that already secures your data on its way to the Web server. On the
client side when you enter your password the browser feeds it into TLS
and on the other side, the server feeds in a &lt;em&gt;verifier&lt;/em&gt; (effectively a
password hash). If the password matches the verifier, then the
connection succeeds, otherwise it fails. PAKEs aren&#39;t easy to design
-- the tricky part is ensuring that the attacker has to reconnect to
the server for each guess at the password -- but it&#39;s a reasonably
well understood problem at this point and there are several PAKEs
which can be &lt;a href=&quot;https://tools.ietf.org/html/draft-sullivan-tls-opaque-00&quot;&gt;integrated with TLS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What a PAKE gets you is security against phishing: even if you
connect to the wrong server, it doesn&#39;t learn anything about your
password that it doesn&#39;t already know because you just get
a cryptographic failure. PAKEs don&#39;t help against password file compromise because
the server still has to store the verifier, so the attacker can
perform a password cracking attack on the verifier just as they
would on the password hash. But phishing is a big deal, so why
doesn&#39;t everyone use PAKEs? The answer here seems to be
surprisingly mundane but also critically important: user interface.&lt;/p&gt;
&lt;p&gt;The way that most Web sites authenticate is by showing you a Web
page with a field where you can enter your password, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/posts/password-proto/fxa-login.png&quot; alt=&quot;Login page&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When you click the &amp;quot;Sign In&amp;quot; button, your password gets sent to the
server which checks it against the hash as described in part I.
The browser doesn&#39;t have to do anything special here (though often
the password field will be specially labelled so that the browser can
automatically mask out your password when you type); it just sends the contents
of the field to the server.&lt;/p&gt;
&lt;p&gt;In order to use a PAKE, you would need to replace this with a
mechanism where you gave the browser your password directly.
Browsers actually have something for this, dating back to the
earliest days of the Web. On Firefox it looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://educatedguesswork.org/posts/password-proto/fx-login.png&quot; alt=&quot;Firefox login&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Hideous, right? And I haven&#39;t even mentioned the part where it&#39;s a modal
dialog that takes over your experience. In principle, of course, this might
be fixable, but it would take a lot of work and would still leave the
site with a lot less control over their login experience than they have
now; understandably they&#39;re not that excited about that.
Additionally, while a PAKE is secure from phishing if you use it, it&#39;s
not secure if you don&#39;t, and nothing stops the phishing site from skipping
the PAKE step and just giving you an ordinary login page, hoping you&#39;ll
type in your password as usual.&lt;/p&gt;
&lt;p&gt;None of this is to say that PAKEs aren&#39;t cool tech, and they make a lot
of sense in systems that have less flexible authentication experiences; for
instance, your email client probably already requires you to enter your authentication
credentials into a dialog box, and so that could use a PAKE. They&#39;re also
useful for things like device pairing or account access where you want to start
with a small secret and bootstrap into a secure connection. Apple is known to
use &lt;a href=&quot;http://srp.stanford.edu/&quot;&gt;SRP&lt;/a&gt;, a particular PAKE, for &lt;a href=&quot;https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/&quot;&gt;exactly this reason&lt;/a&gt;.
But because the Web already offers a flexible experience, it&#39;s hard to ask sites
to take a step backwards and PAKEs have never really taken off for the Web.&lt;/p&gt;
&lt;h1 id=&quot;public-key-authentication&quot;&gt;Public Key Authentication &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/password-proto/#public-key-authentication&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;From a security perspective, the strongest thing would be to have the
user authenticate with a public/private key pair, just like the Web
server does. As I said above, this is a feature of TLS that browsers
actually have supported (sort of) for a really long time but the
user experience is even more appalling than for built-in passwords.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
In principle, some of these technical issues could have been fixed,
but even if the interface had been better, sites would probably
still have wanted to control the experience themselves. In any case,
public key authentication saw very little usage.&lt;/p&gt;
&lt;p&gt;It&#39;s worth mentioning that public key authentication actually is
reasonably common in dedicated applications, especially in software
development settings. For instance, the popular
&lt;a href=&quot;https://en.wikipedia.org/wiki/Secure_Shell&quot;&gt;SSH&lt;/a&gt; remote login tool
(replacing the unencrypted Telnet) is commonly used with public key
authentication. In the consumer setting, &lt;a href=&quot;https://support.apple.com/guide/security/welcome/web&quot;&gt;Apple AirDrop uses iCloud-issued certificates with TLS&lt;/a&gt;
to authenticate your contacts.&lt;/p&gt;
&lt;h1 id=&quot;up-next%3A-fido%2Fwebauthn&quot;&gt;Up Next: FIDO/WebAuthn &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/password-proto/#up-next%3A-fido%2Fwebauthn&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;This was the situation for about 20 years: in
theory public key authentication was great, but in practice it was
nearly unusable on the Web. Everyone used passwords, some with 2FA and
some without, and nobody was really happy. There had been a few
attempts to try to fix things but nothing really stuck.
However, in the past few years a new technology called
&lt;a href=&quot;https://www.w3.org/TR/webauthn/&quot;&gt;WebAuthn&lt;/a&gt; has been developed.
At heart, WebAuthn is just public key authentication but it&#39;s
integrated into the Web in a novel way which seems to be a lot
more deployable than what has come before. I&#39;ll be covering
WebAuthn in the next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And by &amp;quot;wire&amp;quot; I mean a &lt;a href=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/5/5f/BNC_connector_with_10BASE2_cable-92170.jpg/1280px-BNC_connector_with_10BASE2_cable-92170.jpg&quot;&gt;literal wire&lt;/a&gt;, though
such sniffing attacks are prevalent in wireless networks such as those protected
by WPA2. &lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that to really make this work well, you also need to require
a new code in order to change your password, otherwise the attacker can change your
password for you in that window. &lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Interestingly, OTP systems are still subject to server-side
compromise attacks. The way that most of the common systems work
is to have a per-user secret which is then used to generate a
series of codes, e.g., truncated &lt;em&gt;HMAC(Secret, time)&lt;/em&gt; (see &lt;a href=&quot;https://tools.ietf.org/rfcmarkup?doc=6238&quot;&gt;RFC6238&lt;/a&gt;).
If an attacker compromises
the secret, then they can generate the codes themselves. One might ask whether it&#39;s possible
to design a system which didn&#39;t store a secret on the server
but rather some public verifier (e.g., a public key) but this
does not appear to be secure if you also want to have short
(e.g., six digits) codes. The reason is that if the information
that is used to verify is public, the attacker can just iterate
through every possible 6 digit code and try to verify it themselves.
This is easily possible during the 30 second or so lifetime of
the codes. Thanks to Dan Boneh for this insight. &lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The details are kind of complicated here, but just some of the
problems are: (1) TLS client authentication is mostly tied to certificates
and the process of getting a certificate into the browser was
just terrible; (2) the certificate selection interface is clunky; and
(3) until TLS 1.3, the certificate was actually sent in the clear unless
you did TLS renegotiation, which had its own problems, particularly around privacy. &lt;a href=&quot;https://educatedguesswork.org/posts/password-proto/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at password security, Part II: Web sites</title>
		<link href="https://educatedguesswork.org/posts/passwords2/"/>
		<updated>2020-07-13T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/passwords2/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In part I, we took a look at the design of password authentication
systems for old-school multiuser systems. While timesharing
is mostly gone, most of us continue to use multiuser systems;
we just call them Web sites. In this post, I&#39;ll be covering
some of the problems of Web authentication using passwords.&lt;/p&gt;
&lt;p&gt;As I discussed previously, the strength of passwords depends to a
great extent on how fast the attacker can try candidate passwords. The
nature of a Web application inherently limits the velocity at which
you can try passwords quite a bit.  Even ignoring limits on the rate
at which you can transmit stuff over the network, real systems -- at
least well managed ones -- have all kinds of monitoring software which
is designed to detect large numbers of login attempts, so just trying
millions of candidate passwords is not very effective. This doesn&#39;t
mean that remote attacks aren&#39;t possible: you can of course try to log
in with some of the obvious passwords and hope you get lucky,
and if you have a good idea of a candidate password, you can try
that (see below), but this kind of attack is inherently somewhat
limited.&lt;/p&gt;
&lt;h1 id=&quot;remote-compromise-and-password-cracking&quot;&gt;Remote compromise and password cracking &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords2/#remote-compromise-and-password-cracking&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Of course, this kind of limitation in the number of login attempts
you could make also applied to the old multiuser systems and the
way you attack Web sites is the same: get a copy of the password
file and crack it offline.&lt;/p&gt;
&lt;p&gt;The way this plays out is that somehow the attacker exploits
a vulnerability in the server&#39;s system to compromise the
password database.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; They can then crack it offline and try to recover
people&#39;s passwords. Once they&#39;ve done that, they can then use
those passwords to log into the site themselves. If a site&#39;s
password database is stolen, their only real defense is to reset
everyone&#39;s password, which is obviously really inconvenient, harms
the site&#39;s brand, and runs the risk of user attrition, and so
doesn&#39;t always happen.&lt;/p&gt;
&lt;p&gt;To make matters worse, many users use the same password on multiple
sites, so once you have broken someone&#39;s password on one site, you can
then try to log in as them on other sites with the same password, even
if you do a reset on the site which was originally compromised.  Even
though this is an online attack, it&#39;s still very effective, because
password reuse is so common (this is one reason why it&#39;s a bad idea to
reuse passwords).&lt;/p&gt;
&lt;p&gt;Password database disclosure is unfortunately quite a common
occurrence, so much so that there are services such as
&lt;a href=&quot;https://monitor.firefox.com/&quot;&gt;Firefox Monitor&lt;/a&gt; and
&lt;a href=&quot;https://haveibeenpwned.com/&quot;&gt;Have I been pwned?&lt;/a&gt; devoted to letting
users know when some service they have an account on has been
compromised.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fn1&quot; id=&quot;fnref1:1&quot;&gt;[1:1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Assuming a site is already following best practices (long passwords, slow
password hashing algorithms, salting, etc.) then the next step is to either
make it harder to steal the password hash or to make the password hash
less useful. A good example here is the Facebook
system described in this &lt;a href=&quot;https://www.youtube.com/watch?v=7dPRFoKteIU&quot;&gt;talk&lt;/a&gt;
by Alec Muffett (famous for, among other things, the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Crack_(password_software)&quot;&gt;Crack&lt;/a&gt; password
cracker). The system uses multiple layers of hashing, one of which is
a keyed hash [technically, HMAC-SHA256] performed on a separate, hardened, machine. Even if
you compromise the password hash database, it&#39;s not useful without the
key, which means you would also have to compromise that machine.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
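&lt;p&gt;As a rough illustration of the keyed-hash idea (this is not Facebook&#39;s actual scheme, which has more layers), the sketch below wraps an ordinary salted hash in an HMAC-SHA256 under a secret key. The names &lt;code&gt;SERVICE_KEY&lt;/code&gt;, &lt;code&gt;keyed_layer&lt;/code&gt;, and &lt;code&gt;stored_value&lt;/code&gt; are hypothetical. In a real deployment the key would live only on the separate hardened machine, and the inner hash would be a deliberately slow password hash rather than plain SHA-256.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import hmac

# Hypothetical secret; assumed to be held only by the hardened machine.
SERVICE_KEY = b"example-key-known-only-to-the-hashing-service"


def keyed_layer(inner_hash, key=SERVICE_KEY):
    """The step that would run on the separate machine: HMAC-SHA256 under its key."""
    return hmac.new(key, inner_hash, hashlib.sha256).digest()


def stored_value(password, salt, key=SERVICE_KEY):
    """What ends up in the password database for one user."""
    # Inner layer: an ordinary salted hash (SHA-256 as a stand-in).
    inner = hashlib.sha256(salt + password.encode("utf-8")).digest()
    # Outer layer: the keyed hash, useless to an attacker without the key.
    return keyed_layer(inner, key)


def check_password(password, salt, stored, key=SERVICE_KEY):
    """Recompute both layers and compare in constant time."""
    return hmac.compare_digest(stored_value(password, salt, key), stored)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An attacker who steals only the database sees the outer values, and cannot test candidate passwords against them without also obtaining the key.&lt;/p&gt;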
&lt;p&gt;Another defense is to use &lt;a href=&quot;https://en.wikipedia.org/wiki/One-time_password&quot;&gt;one-time password&lt;/a&gt;
systems (often also called two-factor authentication systems). I&#39;ll
cover those in a future post.&lt;/p&gt;
&lt;h1 id=&quot;phishing&quot;&gt;Phishing &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords2/#phishing&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Leaked passwords aren&#39;t the only threat to password authentication on
Web sites. The other big issue is what&#39;s called
&lt;a href=&quot;https://en.wikipedia.org/wiki/Phishing&quot;&gt;phishing&lt;/a&gt;. In the basic
phishing attack, the attacker sends you an e-mail inviting you to log
into your account. Often this will be phrased in some scary way like
telling you your account will be deleted if you don&#39;t log in
immediately. The e-mail will helpfully contain a link to use to log
in, but of course this link will go not to the real site but to
the attacker&#39;s site, which will usually look just like the real
site and may even have a similar domain name (e.g., &lt;code&gt;mozi11a.com&lt;/code&gt; instead
of &lt;code&gt;mozilla.com&lt;/code&gt;.) When the user clicks on the link and logs in,
the attacker captures their username and password and can then
log into the real site. Note that having users use good passwords
totally doesn&#39;t help here because the user gives the site
their whole password.&lt;/p&gt;
&lt;p&gt;Preventing phishing has proven to be a really stubborn challenge
because, well, people are generally too trusting. Most modern browsers try to
warn users if they are going to known phishing sites (Firefox
uses the &lt;a href=&quot;https://safebrowsing.google.com/&quot;&gt;Google Safe Browsing service&lt;/a&gt;
for this). In addition, if you use a password manager, then
it shouldn&#39;t automatically fill in your password on a phishing
site because password managers key off of the domain name and
just looking similar isn&#39;t good enough. Of course, both of these
defenses are imperfect: the lists of phishing sites can be incomplete
and if users don&#39;t use password managers or are willing to manually
cut and paste their passwords, then phishing attacks are still
possible.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
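&lt;p&gt;The password manager defense can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: &lt;code&gt;saved_logins&lt;/code&gt; and &lt;code&gt;credentials_for&lt;/code&gt; are hypothetical names, the stored login is made up, and the lookup is an exact hostname match, whereas real password managers use more elaborate domain matching rules.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from urllib.parse import urlsplit

# Hypothetical store: hostname mapped to (username, password).
saved_logins = {
    "mozilla.com": ("alice", "correct horse battery staple"),
}


def credentials_for(url):
    """Return the saved login for the exact hostname in `url`, or None."""
    host = urlsplit(url).hostname
    return saved_logins.get(host)


# The real site gets the saved login; the lookalike gets nothing,
# no matter how similar the page looks to a human.
print(credentials_for("https://mozilla.com/login"))
print(credentials_for("https://mozi11a.com/login"))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point of the design is that the comparison is done on the domain name by software rather than on the page&#39;s appearance by the user.&lt;/p&gt;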
&lt;h1 id=&quot;beyond-passwords&quot;&gt;Beyond Passwords &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords2/#beyond-passwords&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The good news is that we now have standards and technologies which
are better than simple passwords and are more resistant to these kinds of
attacks. I&#39;ll be talking about them in the next post.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The design of this kind of system is actually quite an interesting
technical challenge which I hope to get around to documenting at
some point in the future. &lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fnref1:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The Facebook system is actually pretty ornate. At least as of 2014 they
had four separate layers: MD5, HMAC-SHA1 (with a public salt), HMAC-SHA256 (with a secret key), and
&lt;a href=&quot;https://en.wikipedia.org/wiki/Scrypt&quot;&gt;Scrypt&lt;/a&gt;, and then
HMAC-SHA256 (with public salt) again. Muffett&#39;s talk and
&lt;a href=&quot;http://bristolcrypto.blogspot.com/2015/01/password-hashing-according-to-facebook.html&quot;&gt;this post&lt;/a&gt;
do a good job of
providing the detail, but this design is due to a combination of technical
requirements. In particular, the reason for the MD5 stage is that an older
system &lt;em&gt;just&lt;/em&gt; had MD5-hashed passwords and because Facebook doesn&#39;t know
the original password they can&#39;t convert them to some other algorithm;
it&#39;s easiest to just layer another hash on. &lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is an example of a situation in which the difficulty
of implementing a good password manager makes the problem much
worse. Sites vary a lot in how they present their password
dialogs and so password managers have trouble finding the right
place to fill in the password. This means that users sometimes
have to type the password in themselves even if there is actually
a stored password, teaching them bad habits which phishers can
then exploit. &lt;a href=&quot;https://educatedguesswork.org/posts/passwords2/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>A look at password security, Part I: history and background</title>
		<link href="https://educatedguesswork.org/posts/passwords1/"/>
		<updated>2020-07-08T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/passwords1/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Today I&#39;d like to talk about passwords. Yes, I know, passwords are the
worst, but why? This is the first of a series of posts about passwords,
with this one focusing on the origins of our current password systems
starting with login for multi-user systems.&lt;/p&gt;
&lt;p&gt;The conventional story for what&#39;s wrong with passwords goes something
like this: Passwords are simultaneously too long for users to memorize
and too short to be secure.&lt;/p&gt;
&lt;p&gt;It&#39;s easy to see how to get to this conclusion. If we restrict
ourselves to just letters and numbers, then there are about 2^6 one
character passwords, 2^12 two character passwords, etc. The fastest
password cracking systems can check about
&lt;a href=&quot;https://www.tomsguide.com/us/8-character-password-dead,news-29429.html&quot;&gt;2^36 passwords/second&lt;/a&gt;,
so if you want a password which takes a year to crack, you need
a password roughly 10 characters long or longer.&lt;/p&gt;
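&lt;p&gt;The arithmetic behind that estimate can be checked with a short Python sketch. The constants are assumptions taken from the paragraph above (62 letters and digits, a cracker testing 2^36 candidates per second), and &lt;code&gt;years_to_try_all&lt;/code&gt; is a hypothetical helper that computes the time to try every password of a given length.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;ALPHABET_SIZE = 62            # 26 lowercase + 26 uppercase + 10 digits
GUESSES_PER_SECOND = 2**36    # assumed cracking rate from the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_try_all(length):
    """Years needed to try every random password of the given length."""
    total_passwords = ALPHABET_SIZE**length
    return total_passwords / GUESSES_PER_SECOND / SECONDS_PER_YEAR


for length in (8, 10, 11):
    print(length, "characters:", round(years_to_try_all(length), 3), "years")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions, trying every 10 character password takes a bit under half a year and trying every 11 character password takes a couple of decades, so the one-year mark falls at around 10 to 11 characters.&lt;/p&gt;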
&lt;p&gt;The situation is actually far worse than this; most people don&#39;t
use randomly generated passwords because they are hard to generate
and hard to remember. Instead they tend to use words, sometimes
adding a number, punctuation, or capitalization here and there. The result
is passwords that are easy to crack, hence the need for password
managers and the like.&lt;/p&gt;
&lt;p&gt;This analysis isn&#39;t &lt;em&gt;wrong&lt;/em&gt;, precisely; but when you think about it a
bit, it&#39;s kind of confusing. If you&#39;ve ever watched a movie where
someone tries to break into a computer by typing passwords over and
over, you&#39;re probably thinking &amp;quot;nobody is a fast enough typist to try
billions of passwords a second&amp;quot;. This is obviously true, so where does
password cracking come into it?&lt;/p&gt;
&lt;h1 id=&quot;how-to-design-a-password-system&quot;&gt;How to design a password system &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords1/#how-to-design-a-password-system&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;The design of password systems dates back to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Unix&quot;&gt;UNIX&lt;/a&gt;
operating system, designed back in the 1970s. This was before personal computers and
so most computers were shared, with multiple people having accounts and
the operating system being responsible for protecting one user&#39;s
data from another. Passwords were used to prevent someone else
from logging into your account.&lt;/p&gt;
&lt;p&gt;The obvious way to implement a password system is just to store all
the passwords on the disk and then when someone types in their
password, you just compare what they typed in to what was stored. This
has the obvious problem that if the password file is compromised, then
every password in the system is also compromised. This means that
any operating system vulnerability that allows a user to read the
password file can be used to log in as other users. To make matters
worse, multiuser systems like UNIX would usually have administrator
accounts that had special privileges (the UNIX account is called
&amp;quot;root&amp;quot;). Thus, if a user could compromise the password file they
could gain root access (this is known as a &amp;quot;privilege escalation&amp;quot;
attack).&lt;/p&gt;
&lt;p&gt;The UNIX designers realized that a better approach is
to use what&#39;s now called password hashing: instead of storing the
password itself you store what&#39;s called a &lt;a href=&quot;https://en.wikipedia.org/wiki/One-way_function&quot;&gt;one-way function&lt;/a&gt; of the
password. A one-way function is just a function &lt;em&gt;H&lt;/em&gt; that&#39;s
easy to compute in one direction but not the other.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords1/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
This is conventionally done with what&#39;s called a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Hash_function&quot;&gt;hash function&lt;/a&gt;,
and so the technique is known as &amp;quot;password hashing&amp;quot;
and the stored values as &amp;quot;password hashes&amp;quot;.&lt;/p&gt;
&lt;p&gt;In this case, what that means is you store the pair: (Username, &lt;em&gt;H(Password)&lt;/em&gt;).
[Technical note: I&#39;m omitting &lt;a href=&quot;https://en.wikipedia.org/wiki/Salt_(cryptography)&quot;&gt;salt&lt;/a&gt;, which is
used to mitigate offline pre-computation attacks against the password file.]
When the user tries to log in, you take the password they enter
&lt;em&gt;P&lt;/em&gt; and compute &lt;em&gt;H(P)&lt;/em&gt;. If &lt;em&gt;H(P)&lt;/em&gt; is the same as the stored hash,
then you know their password is right (with overwhelming probability) and you allow them to log
in, otherwise you return an error. The cool thing about this design
is that even if the password file is leaked, the attacker learns
only the password hashes.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/passwords1/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
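&lt;p&gt;To make the store-and-compare flow concrete, here is a minimal sketch. It assumes SHA-256 as a stand-in for &lt;em&gt;H&lt;/em&gt; (the original UNIX scheme used something else) and, like the text above, omits salt. The names &lt;code&gt;password_file&lt;/code&gt;, &lt;code&gt;register&lt;/code&gt; and &lt;code&gt;login&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import hashlib
import hmac


def H(password):
    """Stand-in one-way function: unsalted SHA-256 (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# The "password file": username mapped to H(password), never the password itself.
password_file = {}


def register(username, password):
    password_file[username] = H(password)


def login(username, password):
    stored = password_file.get(username)
    if stored is None:
        return False
    # Hash what the user typed and compare it to the stored hash.
    return hmac.compare_digest(H(password), stored)


register("alice", "correct horse")
print(login("alice", "correct horse"))
print(login("alice", "wrong guess"))
```

&lt;p&gt;Someone who reads &lt;code&gt;password_file&lt;/code&gt; sees only hash values, which is the property the design is after.&lt;/p&gt;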
&lt;h1 id=&quot;problems-and-countermeasures&quot;&gt;Problems and countermeasures &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords1/#problems-and-countermeasures&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;This design is a huge improvement over just having a file with
cleartext passwords and it might seem at this point like you didn&#39;t
need to stop people from reading the password file at all. In fact, on
the original UNIX systems where this design was used, the
&lt;code&gt;/etc/passwd&lt;/code&gt; file was publicly readable. However, upon further
reflection, it has the drawback that it&#39;s cheap to verify a guess for
a given password: just compute &lt;em&gt;H(guess)&lt;/em&gt; and compare it to what&#39;s
been stored. This wouldn&#39;t be much of an issue if people used strong
passwords, but because people generally choose bad passwords, it is
possible to write password cracking programs which would try out
candidate passwords (typically starting with a list of common
passwords and then trying variants) to see if any of these matched.
Programs to do this task quickly emerged.&lt;/p&gt;
&lt;p&gt;The key thing to realize is that the computation of &lt;em&gt;H(guess)&lt;/em&gt; can be
done offline. Once you have a copy of the password file, you can compare your
pre-computed hashes of candidate passwords against the password file
without interacting with the system at all. By contrast, in an &lt;em&gt;online&lt;/em&gt; attack
you have to interact with the system for each guess, which gives
it an opportunity to rate limit you in various ways (for instance
by taking a long time to return an answer or by locking out the
account after some number of failures). In an offline attack,
this kind of countermeasure is ineffective.&lt;/p&gt;
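&lt;p&gt;A rough sketch of such an offline attack, again assuming unsalted SHA-256 as &lt;em&gt;H&lt;/em&gt;. The leaked file and the candidate list are invented for the example; real cracking programs use much larger lists and try variants of each entry.&lt;/p&gt;

```python
import hashlib


def H(password):
    """Stand-in one-way function: unsalted SHA-256 (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked password file: username mapped to H(password).
leaked = {
    "alice": H("Password1!"),
    "bob": H("kX9#vq2LmZ7w"),
}

# Hypothetical candidate list of common passwords.
candidates = ["123456", "password", "Password1!", "letmein"]


def crack(leaked_file, guesses):
    # Hash every guess once, then match against every account.
    # None of this touches the real system, so it cannot rate limit us.
    table = {H(g): g for g in guesses}
    return {user: table[h] for user, h in leaked_file.items() if h in table}


cracked = crack(leaked, candidates)
print(cracked)
```

&lt;p&gt;The weak password falls immediately, while the random-looking one survives only because it isn&#39;t on the list.&lt;/p&gt;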
&lt;p&gt;There are three obvious defenses to this kind of attack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Make the password file unreadable: If the attacker can&#39;t read the
password file, they can&#39;t attack it. It took a while to do this on UNIX
systems, because the password file also held a lot of other user-type
information that you didn&#39;t want kept secret, but eventually
that got split out into another file in what&#39;s called &amp;quot;shadow
passwords&amp;quot; (the passwords themselves are stored in &lt;code&gt;/etc/shadow&lt;/code&gt;).
Of course, this is just the natural design for Web-type applications
where people log into a server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make the password hash slower: The cost of cracking is linear in
the cost of checking a single password, so if you make the password
hash slower, then you make cracking slower. Of course, you also
make logging in slower, but as long as you keep that time reasonably
short (below a second or so) then users don&#39;t notice. The tricky
part here is that attackers can build specialized hardware that
is much faster than the commodity hardware in your machine,
and designing hashes which are thought to be slow even on specialized
hardware is a whole subfield of cryptography.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Get people to choose better passwords: In theory this sounds good,
but in practice it&#39;s resulted in enormous numbers of conflicting
rules about password construction. When you create an account and
are told you need to have a password between 8 and 12 characters
with one lowercase letter, one capital letter, a number and
one special character from this set -- but not from this other set --
what they&#39;re hoping you will do is create a strong password.
Experience suggests you are just as likely to use &lt;code&gt;Password1!&lt;/code&gt;,
so the situation here has not improved that much unless people
use password managers which generate passwords for them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
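&lt;p&gt;As an illustration of the second defense, here is a sketch that uses PBKDF2 from the Python standard library as the deliberately slow hash. This is an assumption made for the example: PBKDF2 only repeats a fast hash many times and is not one of the designs meant to stay slow on specialized hardware. The iteration count is also just a guess at a reasonable work factor, and the sketch adds a per-user salt, which the discussion above leaves out.&lt;/p&gt;

```python
import hashlib
import hmac
import os

# Assumed work factor; tune it so a single login stays well under a second.
ITERATIONS = 200_000


def slow_hash(password, salt):
    # Every guess an attacker makes costs ITERATIONS rounds of HMAC-SHA-256.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_record(password):
    salt = os.urandom(16)  # random per-user salt
    return salt, slow_hash(password, salt)


def verify(password, record):
    salt, stored = record
    return hmac.compare_digest(slow_hash(password, salt), stored)


record = make_record("Password1!")
print(verify("Password1!", record))
print(verify("password1!", record))
```

&lt;p&gt;Doubling &lt;code&gt;ITERATIONS&lt;/code&gt; doubles the cost of a login and of every cracking guess alike, which is the linear relationship described above.&lt;/p&gt;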
&lt;h1 id=&quot;the-modern-setting&quot;&gt;The modern setting &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/passwords1/#the-modern-setting&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;At this point you&#39;re probably wondering what this has to do with
you: almost nobody uses multiuser timesharing systems any more
(although a huge fraction of the devices people use are effectively
UNIX: MacOS is a straight-up descendant of UNIX, and Linux and Android
are UNIX clones). The multiuser systems that people do
use are mostly Web sites, which of course use usernames and
passwords. In future posts I will cover password security for
Web sites and personal devices.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Strictly speaking we need the function not just to be one-way
but also to be preimage resistant, meaning that given
&lt;em&gt;H(P)&lt;/em&gt; it&#39;s hard to find &lt;em&gt;any&lt;/em&gt; input &lt;em&gt;p&lt;/em&gt; such that &lt;em&gt;H(p) == H(P)&lt;/em&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/passwords1/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For more information on this, see &lt;a href=&quot;https://www.bell-labs.com/usr/dmr/www/passwd.ps&quot;&gt;Morris and Thompson&lt;/a&gt;
for a quite readable history of the UNIX design. One very interesting
feature is that at the time this system was designed generic
hash functions didn&#39;t exist, and so they instead used a variant
of &lt;a href=&quot;https://en.wikipedia.org/wiki/Data_Encryption_Standard&quot;&gt;DES&lt;/a&gt;.
The password was converted into a DES key and then used to encrypt
a fixed value. This is actually a pretty good design and even
included a feature designed to prevent attacks using custom
DES hardware. However, it had the unfortunate property that passwords
were limited to 8 characters, necessitating new algorithms
that would accept a longer password. &lt;a href=&quot;https://educatedguesswork.org/posts/passwords1/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>COVID Surveillance Part 2: Mobile Phone Location</title>
		<link href="https://educatedguesswork.org/posts/telco-data/"/>
		<updated>2020-05-06T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/telco-data/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Previously I &lt;a href=&quot;https://blog.mozilla.org/blog/2020/04/29/designs-contact-tracing-apps/&quot;&gt;wrote&lt;/a&gt; about the use of mobile apps for COVID
contact tracing. This idea has gotten a lot of attention in the tech press
-- probably because there are some quite interesting privacy issues
-- but there is another approach to monitoring people&#39;s locations
using their devices that has already been used in
&lt;a href=&quot;https://www.reuters.com/article/us-health-coronavirus-taiwan-surveillanc-idUSKBN2170SK&quot;&gt;Taiwan&lt;/a&gt;
and
&lt;a href=&quot;https://techcrunch.com/2020/03/18/israel-passes-emergency-law-to-use-mobile-data-for-covid-19-contact-tracing/&quot;&gt;Israel&lt;/a&gt;,
namely mobile phone location data. While this isn&#39;t something that
people think about a lot, your mobile phone has to be in constant
contact with the mobile system and the system can use that information to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Mobile_phone_tracking&quot;&gt;determine your location&lt;/a&gt;.
Mobile phones already use network-based location to provide
&lt;a href=&quot;https://en.wikipedia.org/wiki/Enhanced_9-1-1&quot;&gt;emergency location services&lt;/a&gt;
and for what&#39;s called &lt;a href=&quot;https://en.wikipedia.org/wiki/Assisted_GPS&quot;&gt;assisted GPS&lt;/a&gt;,
in which mobile-tower based location is used along with satellite-based GPS,
but it can, of course, be used for services the user might be less excited about,
such as real-time surveillance of their location. In addition to measurements
taken from the tower, a number of mobile services share location history
with service providers, for instance to provide directions in mapping
applications or as &lt;a href=&quot;https://support.google.com/accounts/answer/3118687?hl=en&quot;&gt;part of your Google account&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If what you are trying to do is get as much COVID
surveillance as possible, this kind of data has several
big advantages over mobile phone apps. First, it&#39;s already being
collected, so you don&#39;t need to get anyone to install an app.
Second, it&#39;s extremely detailed because it has everyone&#39;s location and not
just who they have been in contact with. The primary disadvantage of
mobile phone location data is accuracy; in some absolute sense,
assisted GPS is amazingly accurate, especially to those old enough to
remember when handheld GPS was barely a thing, but generally we&#39;re
talking about accuracies to the scale of &lt;a href=&quot;https://en.wikipedia.org/wiki/Mobile_phone_tracking&quot;&gt;meters to tens of
meters&lt;/a&gt;, which is
not good enough to tell whether you have been in close contact with
someone. This is still useful enough for many applications and we&#39;re
seeing this kind of data used for a number of anti-COVID purposes such
as detecting people crowding in a given location, determining &lt;a href=&quot;https://www.straitstimes.com/asia/east-asia/coronavirus-taiwans-new-electronic-fence-for-quarantines-leads-wave-of-virus&quot;&gt;when people have broken quarantine&lt;/a&gt;
and &lt;a href=&quot;https://www.theverge.com/2020/4/3/21206318/google-location-data-mobility-reports-covid-19-privacy&quot;&gt;measuring bulk
movements&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But of course, all of this is only possible because everyone is
already carrying around a tracking device in their pocket all the time
and they don&#39;t even think about it.
These systems just routinely log
information about your location whether you downloaded some app or
not, and it&#39;s just a limitation of the current technology that
that information isn&#39;t precise down to the meter
(and this kind of positioning technology
has gotten better over time because precise localization of mobile
devices is key to getting good performance).
By contrast, nearly all of the designs for
mobile contact tracing explicitly prioritize privacy. Even the
centralized designs like BlueTrace that have the weakest privacy
properties still go out of their way to avoid leaking information,
mostly by not collecting it.
So, for instance, while BlueTrace tells the government who you have been
in contact with if you test positive, if you aren&#39;t
exposed to Coronavirus the government doesn&#39;t learn much about
you&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The important distinction to draw here is between &lt;em&gt;policy&lt;/em&gt; controls to
protect privacy and &lt;em&gt;technical&lt;/em&gt; controls to protect privacy.  Although
the mobile network gets to collect a huge amount of data on you, this
data is to some extent protected by policy: laws, regulations, and
corporate commitments
constraining how that data can be used&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; and you have to trust that
those policies will be followed. By contrast, the privacy protections in
the various COVID-19 contact tracing apps are largely technical: they
don&#39;t rely on trusting the health authority to behave properly because
the health authority doesn&#39;t have the information in its hands in the
first place. Another way to think about this is that technical
controls are &amp;quot;rigid&amp;quot; in that they don&#39;t depend on human discretion:
this is obviously an advantage for users who don&#39;t want to have to
trust government, big tech companies, etc.  but it&#39;s also a
disadvantage in that it makes it difficult to respond to new
circumstances. For instance, Google was able to quickly take mobility
measurements using stored location history because people were already
sharing that with them, but the new Apple/Google contact tracing will
require people to download new software and maybe opt-in, which
can be slow and result in &lt;a href=&quot;https://www.straitstimes.com/singapore/about-one-million-people-have-downloaded-the-tracetogether-app-but-more-need-to-do-so-for&quot;&gt;low uptake&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The point here isn&#39;t to argue that one type of control is necessarily
better or worse than another. In fact, it&#39;s quite common to have systems
which depend on a mix of these&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. However, when you are trying to evaluate
the privacy and security properties of a system, you need to keep this
distinction firmly in mind: every policy control depends on someone or
a set of someones behaving correctly, and therefore either requires
that you trust them to do so or have some mechanism for ensuring that
they in fact are.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Except that whenever you contact the government servers
for new TempIDs it learns something about your current location. &lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance, the United States Supreme
Court recently &lt;a href=&quot;https://en.wikipedia.org/wiki/Carpenter_v._United_States&quot;&gt;ruled&lt;/a&gt;
that the government requires a warrant to get mobile phone location
records. &lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance, the Web certificate system relies extensively
on procedural controls but is increasingly backed up by technical safeguards
such as &lt;a href=&quot;https://en.wikipedia.org/wiki/Certificate_Transparency&quot;&gt;Certificate Transparency&lt;/a&gt;. &lt;a href=&quot;https://educatedguesswork.org/posts/telco-data/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
	
	<entry>
		<title>Looking at designs for COVID Contact Tracing Apps</title>
		<link href="https://educatedguesswork.org/posts/contact-tracing/"/>
		<updated>2020-04-29T00:00:00Z</updated>
		<id>https://educatedguesswork.org/posts/contact-tracing/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This post originally appeared on the Mozilla Blog&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A number of the proposals for how to manage the COVID-19 pandemic rely
on being able to determine who has come into contact with infected
people and therefore are at risk of infection themselves.
&lt;a href=&quot;https://bluetrace.io/&quot;&gt;Singapore&lt;/a&gt;,
&lt;a href=&quot;https://www.reuters.com/article/us-health-coronavirus-taiwan-surveillanc/taiwans-new-electronic-fence-for-quarantines-leads-wave-of-virus-monitoring-idUSKBN2170SK?utm_campaign=The%20Interface&amp;amp;utm_medium=email&amp;amp;utm_source=Revue%20newsletter&quot;&gt;Taiwan&lt;/a&gt;
and &lt;a href=&quot;https://techcrunch.com/2020/03/18/israel-passes-emergency-law-to-use-mobile-data-for-covid-19-contact-tracing/&quot;&gt;Israel&lt;/a&gt; have already deployed phone-based tracking
technology and several
&lt;a href=&quot;https://www.americanprogress.org/issues/healthcare/news/2020/04/03/482613/national-state-plan-end-coronavirus-crisis/&quot;&gt;recent&lt;/a&gt;
&lt;a href=&quot;https://drive.google.com/file/d/1vIN2AX-DDNW-S0aHq8xs0RJ2jkR_CckX/view&quot;&gt;proposals&lt;/a&gt;
for re-opening the US economy depend on some sort of contact tracing
system. There has been a huge amount of work in this area (see the list
&lt;a href=&quot;https://docs.google.com/document/d/16Kh4_Q_tmyRh0-v452wiul9oQAiTRj8AdZ5vcOJum9Y/edit#&quot;&gt;here&lt;/a&gt;),
with perhaps the best known effort being the joint
&lt;a href=&quot;https://www.nytimes.com/2020/04/10/technology/apple-google-coronavirus-contact-tracing.html&quot;&gt;announcement&lt;/a&gt; by Apple and Google
that they would be building this kind of functionality into iOS
and Android.&lt;/p&gt;
&lt;p&gt;To some extent what&#39;s going on here is just that this is a nicely
packaged, accessible, technical problem -- learn some things, keep
others secret? Sounds like a job for crypto! -- and so we have
a number of approaches that are quite similar. However, the other
thing you see is that these solutions embed quite different assumptions
about how they are going to be used and what kind of privacy properties
you need and that ends up giving you a variety of different designs.&lt;/p&gt;
&lt;h1 id=&quot;a-centralized-system-(bluetrace)&quot;&gt;A Centralized System (BlueTrace) &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/contact-tracing/#a-centralized-system-(bluetrace)&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Let&#39;s start by looking at Singapore&#39;s system, &lt;a href=&quot;https://bluetrace.io/&quot;&gt;BlueTrace&lt;/a&gt;,
which describes itself as a &amp;quot;Privacy-Preserving Cross-Border Contact Tracing&amp;quot; system.
As shown in the figure below,
BlueTrace works by having the health authority run a central server which issues each user
a series of TempIDs, each of which is an encrypted token that
contains the user&#39;s identity and is good for about 15 minutes.
When two devices encounter each other, they exchange TempIDs, so
your device gradually accumulates a list of the TempIDs of all
the devices you have come into contact with. If you test positive,
you upload all those TempIDs to the health authority (your own
TempIDs are irrelevant), which then decrypts them and is able
to identify all the people that you might have infected and can
take appropriate action.&lt;/p&gt;
&lt;p&gt;[Figure: BlueTrace in three phases. Registration: Alice&#39;s phone registers with the health authority and receives TempIDs. Contact: Alice encounters Bob and their phones exchange TempIDs. Report: Alice tests positive and sends her received TempIDs to the health authority, which notifies Bob.]&lt;/p&gt;
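&lt;p&gt;The flow can be sketched in a few lines. This is a toy model, not BlueTrace itself: instead of encrypting the user&#39;s identity into each TempID, the sketch has the authority remember which random token belongs to whom, which gives it the same ability to map a TempID back to a person. It also ignores the 15 minute rotation, and all class and method names are hypothetical.&lt;/p&gt;

```python
import secrets


class HealthAuthority:
    def __init__(self):
        # TempID mapped to user identity. This table stands in for the
        # authority's ability to decrypt a TempID.
        self._owner = {}

    def issue_temp_ids(self, user, count):
        ids = [secrets.token_hex(16) for _ in range(count)]
        for temp_id in ids:
            self._owner[temp_id] = user
        return ids

    def report_positive(self, received_temp_ids):
        # The authority learns exactly who the reporting user was near.
        return {self._owner[t] for t in received_temp_ids if t in self._owner}


class Phone:
    def __init__(self, user, authority):
        self.user = user
        self.temp_ids = authority.issue_temp_ids(user, 4)
        self.received = []

    def encounter(self, other):
        # Simplification: always hand out the first TempID, no rotation.
        self.received.append(other.temp_ids[0])
        other.received.append(self.temp_ids[0])


ha = HealthAuthority()
alice, bob, carol = Phone("alice", ha), Phone("bob", ha), Phone("carol", ha)
alice.encounter(bob)

# Alice tests positive and uploads the TempIDs she received.
to_notify = ha.report_positive(alice.received)
print(to_notify)
```

&lt;p&gt;Note that the upload consists of the TempIDs Alice &lt;em&gt;received&lt;/em&gt;, so the authority ends up holding her contact list.&lt;/p&gt;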
&lt;p&gt;The BlueTrace protocol provides good privacy against other
people and limited privacy against the health authority. Specifically,
other people never learn your COVID status at all, both because
the TempIDs are encrypted and because the ones you receive are kept on your device
unless you test positive, and even then are sent only to the
health authority. The health authority doesn&#39;t learn anything
until you test positive, but after that happens they learn all
of your contacts. This is intentional: the whole design
explicitly assumes that the health authority will know people&#39;s
contact status and take action.&lt;/p&gt;
&lt;h1 id=&quot;decentralized-systems&quot;&gt;Decentralized Systems &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/contact-tracing/#decentralized-systems&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Unsurprisingly, many have concerns about a system which allows
the government to see all your contacts (see, for instance,
the Chaos Computer Club&#39;s &lt;a href=&quot;https://www.ccc.de/en/updates/2020/contact-tracing-requirements&quot;&gt;list&lt;/a&gt; of
desirable properties). There have been a number of designs that
are instead decentralized, notably the Apple/Google design and
the &lt;a href=&quot;https://github.com/DP-3T&quot;&gt;DP^3T&lt;/a&gt; system designed by EPFL
and ETHZ, and which are designed to allow people to determine
whether they have been in contact with someone infected
without allowing the health authority to determine people&#39;s
contacts. However, as we see below, there is some difficulty around
exactly &lt;em&gt;how much&lt;/em&gt; people should learn about the contacts they
have had.&lt;/p&gt;
&lt;p&gt;The specific details of individual proposals vary a lot but
the figure below shows a simplified design: whenever
the app on your phone sees that it&#39;s near another phone running the
app it generates a random number and sends it to that phone; the
other phone does the same. Each app remembers all the numbers it has
sent and received so at the end of
the day you end up with a pile of stored numbers. If you later test
positive, you push some button on the app which publishes all of the
values that you sent. Every so often, your app downloads the list
of published values and looks to see if any of them matches the
values received. If they do, that means that you have been in
contact with someone who tested positive. Note the important difference
from BlueTrace in that you upload the IDs you &lt;em&gt;sent&lt;/em&gt;, not those
you received, and so the health authority never learns who you
came into contact with.&lt;/p&gt;
&lt;p&gt;[Figure: A simplified decentralized design in three phases. Contact: Alice encounters Bob; Alice&#39;s phone sends Token=A2 and Bob&#39;s phone sends Token=B5. Report: Alice tests positive and sends A1, A2, A3, ... to the health authority. Polling: Bob&#39;s phone asks the health authority for positive tests and receives A1, A2, A3, ...]&lt;/p&gt;
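&lt;p&gt;Here is a small sketch of this simplified design, with one fresh random value per encounter. The &lt;code&gt;Phone&lt;/code&gt; class and the &lt;code&gt;published&lt;/code&gt; set are hypothetical names, and the set stands in for whatever list the health authority serves.&lt;/p&gt;

```python
import secrets


class Phone:
    def __init__(self):
        self.sent = []
        self.received = []

    def encounter(self, other):
        # Each side generates a fresh random value and sends it to the other.
        mine = secrets.token_hex(16)
        theirs = secrets.token_hex(16)
        self.sent.append(mine)
        other.received.append(mine)
        other.sent.append(theirs)
        self.received.append(theirs)

    def exposed(self, published_values):
        # Matching happens on the phone, against the downloaded list.
        return not published_values.isdisjoint(self.received)


# Public list of values sent by people who later tested positive.
published = set()

alice, bob, carol = Phone(), Phone(), Phone()
alice.encounter(bob)

# Alice tests positive and publishes the values she SENT, not the ones she received.
published.update(alice.sent)

print(bob.exposed(published))
print(carol.exposed(published))
```

&lt;p&gt;Nothing Alice uploads names Bob: the published values only mean something to a phone that already has them in its received list.&lt;/p&gt;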
&lt;p&gt;This particular design isn&#39;t very efficient because it involves
publishing a huge number of values, and so many of the real designs
involve generating the values in some deterministic fashion which
makes publishing them more efficient, but it&#39;s close enough to let us
see the privacy properties. First, let&#39;s make sure people learn what
we expect them to learn: As expected, the operator of the system
learns who has tested positive because they get to see who publishes
their values&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. Similarly, people get to learn that they have
been in contact with someone who has tested positive by looking to see
if their received values overlap with the published sent values.&lt;/p&gt;
&lt;p&gt;Next, we need to ask if people learn anything besides what they were
supposed to learn. Because the health authority only learns what
messages infected people sent, it doesn&#39;t get to trace their contacts.
And as long as the numbers are random, you can&#39;t use
this to track someone.  However, it turns out that you
get to learn not only &lt;em&gt;that&lt;/em&gt; you were in contact with someone who
tested positive but also &lt;em&gt;who&lt;/em&gt; tested positive as long as you record
who you were near at the time you received each value. This
doesn&#39;t sound that terrible, but consider an attacker who puts up a
combination phone/camera outside of a testing clinic. Whenever someone
walks by he records their value and takes a picture of them. At
the end of every day he looks to see which values have been
published and then uses facial recognition to determine their real
identities. This kind of setup is very cheap and it would be easy to
learn the COVID status of many people.&lt;/p&gt;
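&lt;p&gt;The attack amounts to a join between the attacker&#39;s own log and the public list. A toy version, with made-up names standing in for the photos and the facial recognition step:&lt;/p&gt;

```python
import secrets

# Values broadcast by passers-by. The names are hypothetical and stand in
# for identities recovered from the attacker's photos.
broadcasts = {name: secrets.token_hex(16) for name in ["dana", "erin", "frank"]}

# The attacker's log: value heard mapped to who was in the picture at that moment.
observed = {value: name for name, value in broadcasts.items()}

# Later, erin tests positive and publishes the value she sent.
published = {broadcasts["erin"]}

# Matching the public list against the log reveals who tested positive.
identified = {observed[v] for v in published if v in observed}
print(identified)
```

&lt;p&gt;The attacker needs no access to the health authority at all, only the published list that every participant downloads anyway.&lt;/p&gt;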
&lt;p&gt;It&#39;s possible to mostly remove this attack at the cost of giving
the operator more information: each user can upload all the values
they receive and have the operator tell them if there is a match.
This trades off one kind of privacy threat (third parties)
for another (the health authority) and it&#39;s worth noting that
this form of attack is very hard with BlueTrace.
With enough fancy cryptography&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, it&#39;s probably possible to
get back to the &amp;quot;ideal&amp;quot; state in which the
user just learns whether they have been in contact with a single
person who was infected. However, there&#39;s a tradeoff here:
Users may want more information than just whether they were in contact
with someone; for instance they might want to know when and for
how long. If we design a system that hides this information from
the user, then they may find the information less useful than
if they were able to know &amp;quot;I was next to Joe for an hour and
he sneezed on me and now he&#39;s positive&amp;quot;. It seems quite difficult if not
impossible to design a system which lets people have enough
information to feel like they understand their risk and doesn&#39;t also
make it possible for attackers with modest resources to learn a lot of
people&#39;s COVID status, because this is basically the same information.&lt;/p&gt;
&lt;h1 id=&quot;what-do-we-want%2C-anyway%3F&quot;&gt;What do we want, anyway? &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/contact-tracing/#what-do-we-want%2C-anyway%3F&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;It&#39;s tempting, of course, to ask if one design is better than
the others, but upon closer inspection, it seems like there are really
three separate use models people have in mind here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Inform the authorities about who might need to be tested
or quarantined.&lt;/li&gt;
&lt;li&gt;Serve as a sort of digital permission slip to access various
services (see, for instance, this
&lt;a href=&quot;https://www.americanprogress.org/issues/healthcare/news/2020/04/03/482613/national-state-plan-end-coronavirus-crisis/&quot;&gt;proposal&lt;/a&gt; by the Center for American Progress&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;).&lt;/li&gt;
&lt;li&gt;Inform people that they might have been infected so they can
consider getting tested.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you are trying to deploy the first kind of system, then it doesn&#39;t
make any sense to try to avoid the health authority learning who
might have come in contact with infected people because the
health authority staff need to reach out to them. On the other
hand, if you are trying to deploy the second or third type of system,
then you probably do want to protect the user&#39;s data from the health
authority as much as possible, and then you have to ask how much
you want users of the system to learn.&lt;/p&gt;
&lt;p&gt;What this really comes down to is the question of
&lt;em&gt;what are we trying to accomplish?&lt;/em&gt; which in this case, means
&lt;em&gt;what do we want our contact tracing system to do?&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is it providing information for users of the system or for public health authorities?&lt;/li&gt;
&lt;li&gt;What do we expect to do with this information? Notify people? Let them do things?&lt;/li&gt;
&lt;li&gt;How much are we comfortable with users learning about other people&#39;s COVID status?&lt;/li&gt;
&lt;li&gt;How much are we comfortable with the operator learning about people&#39;s COVID status?&lt;/li&gt;
&lt;li&gt;How much complexity are we willing to tolerate? This is a matter of both implementation
cost and of user confidence in the system.&lt;/li&gt;
&lt;li&gt;Are we willing to force people to participate in this system?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any system design necessarily embodies our answers to these questions, but these
are fundamentally policy questions, not technology questions. Once we know the
answers to them, we will know what kind of system we want and
can make a start at designing something that meets our needs.&lt;/p&gt;
&lt;h1 id=&quot;acknowledgement&quot;&gt;Acknowledgement &lt;a class=&quot;direct-link&quot; href=&quot;https://educatedguesswork.org/posts/contact-tracing/#acknowledgement&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Thanks to Chris Wood, Dan Boneh, Henry Corrigan-Gibbs, and Luke Crouch for helpful
discussions on this topic. Thanks especially to Laura Thomson for the taxonomy
in the final section.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It&#39;s possible to remove this property by having users submit their
values anonymously. The Apple/Google system tries to split the
difference by just requiring the operator to delete this data. &lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;As the DP^3T white paper observes, allowing people to learn some
information about others&#39; infection status is inherent in any system which allows users to determine
if they have been in contact with an infected person. If you don&#39;t
see that many people and you learn when you might have been infected, then
you can infer who the report is about. There are a variety of mitigations
which can reduce this risk but at the end of the day some level
of exposure is just built into the system. &lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&amp;quot;Airline passengers must download the Contact Tracing app, confirm no close proximity to a positive case, and pass a fever check or show documentation of immunity from a serological test&amp;quot;. &lt;a href=&quot;https://educatedguesswork.org/posts/contact-tracing/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
	</entry>
</feed>
