Educated Guesswork

Understanding age assurance accuracy

How facial age estimation can produce more accurate results than ID-based systems

Note: you will probably want to view this post on the web because there is some math notation that uses MathJax to render.

I've recently seen a number of pieces about age assurance that want to talk about the degree to which age assurance mechanisms are "accurate" or "effective". In discussions like this it's common for people to talk about accuracy and effectiveness as if they were unitary quantities that could be measured along a single axis, as in this diagram by Audrey Hingle, which shows effectiveness on the Y axis and privacy on the X axis.

Age Assurance Mechanisms (Source: Hingle)

I don't agree with the placement of a lot of the boxes on this diagram, but I want to focus on the placement of two: "Biometric Age Estimation", which is shown as not very effective (just above "Parental/guardian attestation"), and "Digital ID/Credential-based verification", which is shown as very effective. This is a pretty commonly expressed sentiment; for example Flanagan writes:

Verification tends to be highly accurate, but it often requires linking the user to a real-world identity document. Estimation can be less intrusive, but it may introduce accuracy issues and potential bias. Age assurance systems frequently combine multiple techniques in an attempt to balance these tradeoffs.

What I want to do in this post is less to quibble about these precise assessments—though don't worry, there will be some of that—than to try to explain how to think about these questions.

Background: Testing Errors #

Stepping away from the question of age assurance, let's say we have a test for something else, such as testing for pregnancy. Any given individual can be in one of two states, i.e., they are pregnant or they are not. Similarly, the test has two possible results:

Positive
The person is pregnant.
Negative
The person is not pregnant.

This gives us a two-by-two matrix of states and outcomes:

| Patient State | Test Result: Negative | Test Result: Positive |
| --- | --- | --- |
| Not Pregnant | True negative | False positive |
| Pregnant | False negative | True positive |

The on-diagonal values show accurate tests, in which the test is reporting the right result, and the off-diagonal values show incorrect results. It's conventional to refer to the accurate values as "True Negative" and "True Positive" respectively and the inaccurate values as "False Positive" and "False Negative" respectively. Note that "Positive" and "Negative" aren't about good or bad (whether being pregnant is good or not depends on your situation) but rather about whether you have the condition the test is looking for.

The most common way to characterize the performance of this kind of test is to talk about the error rates. Going back to this table, suppose that we give the test to two thousand people where we have some ground truth information about their status, for instance because we have an ultrasound or the like. If half of them were Pregnant and half Not Pregnant, then we can derive the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
| --- | --- | --- |
| Negative | 950 (95%) | 200 (20%) |
| Positive | 50 (5%) | 800 (80%) |

In this case, we would say that this test has:

  • A "false positive rate" of 5% and a "false negative rate" of 20%.
  • A "true negative" rate of 95% and a "true positive rate" of 80%.

Note that each column has to add up to 100%, because any given person must have a test result of positive or negative, but there's no reason that the false positive rate and the false negative rate have to be the same, and very often they will not be.

Unfortunately, there is a lot of confusing terminology in this area, so you'll often hear the following terms:

Sensitivity
The fraction of the cases where you should get a positive result in which you actually do (the same as "true positive rate").
Specificity
The fraction of the cases where you should get a negative result in which you actually do (the same as "true negative rate").
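To make the arithmetic concrete, here is a minimal Python sketch that recomputes these rates from the counts in the contingency table above. The function name `error_rates` and its argument names are mine, chosen just for illustration.

```python
def error_rates(true_neg, false_pos, false_neg, true_pos):
    """Return the four standard rates from contingency-table counts."""
    not_pregnant = true_neg + false_pos   # everyone whose true state is negative
    pregnant = false_neg + true_pos       # everyone whose true state is positive
    return {
        "false_positive_rate": false_pos / not_pregnant,
        "true_negative_rate": true_neg / not_pregnant,   # specificity
        "false_negative_rate": false_neg / pregnant,
        "true_positive_rate": true_pos / pregnant,       # sensitivity
    }

# Counts from the two-thousand-person example.
rates = error_rates(true_neg=950, false_pos=50, false_neg=200, true_pos=800)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```

Note that each rate is computed within a true state (a column of the table), which is why the false positive and true negative rates sum to 100% and the false negative and true positive rates do too.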

I said above the false positive rate and false negative rate don't have to be the same, but they are related, or rather inversely related. I'll have more to say about this below, but just to give you some intuition, suppose your test kit is broken and just always returns "Pregnant". In this case, we have the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
| --- | --- | --- |
| Negative | 0 | 0 |
| Positive | 1000 (100%) | 1000 (100%) |

The good news is that this test correctly identifies all the Pregnant people (100% TPR, 0% FNR). The bad news is that it identifies all the Not Pregnant people as Pregnant people (100% FPR, 0% TNR). Conversely, if it always returned "Not Pregnant" we would have a 0% FPR but a 100% FNR. Understanding this basic fact is critical to reasoning about test accuracy; if you just look at false positives or false negatives you are not getting an accurate understanding of how well a test works.

Continuous Quantities #

Consumer pregnancy tests work by detecting the presence of the human chorionic gonadotropin (hCG) hormone in urine, by measuring the interaction of the hCG hormone with some hCG-specific antibodies. The point here is that this process discretizes the continuous quantity, by going from the level of the hormone to a yes or no decision. hCG occurs in very high levels in pregnant women and low levels in non-pregnant women, so this is a good test, but it's complicated by a number of factors:

  • Non-pregnant women still have some hCG
  • It takes some time for hCG levels to ramp up

Here's a totally made up diagram to give you the idea:

Pregnancy false positives and false negatives. Setting the diagnostic threshold lower reduces false negatives but increases false positives. Setting it higher increases false negatives but decreases false positives.

As a consequence you need to set a threshold for how sensitive you want the test to be, with the idea that it mostly excludes non-pregnant women but includes pregnant women. On home pregnancy tests, this is done by selecting specific antibodies and determining how much antibody to use for the test; the more of each, the more sensitive the test, and the more pregnant women you will pick up (higher TPR) but also the more non-pregnant women you will pick up (higher FPR). Given a particular measurement technique there's no way around this tradeoff, because of the underlying distributions of the quantity being measured; you just have to pick a threshold you are comfortable with. According to Wikipedia, consumer tests are designed to have a false positive rate of less than or equal to 5% on the day of the first missed period, with the result that the false negative rate is whatever you get with that false positive rate.[1]
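Here is a small Python sketch of that tradeoff. Like the diagram, the numbers are totally made up: I'm assuming hCG levels in each group follow a normal distribution with parameters I picked arbitrarily, which is not how real hCG levels behave. The point is only to show the two error rates moving in opposite directions as the threshold moves.

```python
from statistics import NormalDist

# Made-up hCG distributions (arbitrary units), purely for illustration.
not_pregnant = NormalDist(mu=5, sigma=3)
pregnant = NormalDist(mu=40, sigma=15)

def rates_at(threshold):
    """The test reads 'positive' when the measured level is at or above threshold."""
    false_positive_rate = 1 - not_pregnant.cdf(threshold)  # non-pregnant above threshold
    false_negative_rate = pregnant.cdf(threshold)          # pregnant below threshold
    return false_positive_rate, false_negative_rate

for threshold in (8, 12, 20):
    fpr, fnr = rates_at(threshold)
    print(f"threshold={threshold}: FPR={fpr:.1%}, FNR={fnr:.1%}")

# Design the way the consumer tests are described: fix the false positive
# rate at 5% and accept whatever false negative rate comes with it.
threshold_for_5pct_fpr = not_pregnant.inv_cdf(0.95)
fpr, fnr = rates_at(threshold_for_5pct_fpr)
print(f"threshold={threshold_for_5pct_fpr:.1f}: FPR={fpr:.1%}, FNR={fnr:.1%}")
```

With a different pair of distributions the numbers change, but as long as the two distributions overlap, lowering one error rate raises the other.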

In this scenario, as in any scenario where we are discretizing a continuous quantity, there are actually two levels at which you can look at accuracy:

  • How good a job you do of measuring the quantity itself (in this case hCG levels).
  • How good a job you do of distinguishing the two states of interest (in this case pregnant vs. not pregnant).

We'll see the same situation reoccur with age assurance.

Random versus Deterministic Errors #

At a high level, there are two kinds of test errors: random and deterministic.

Deterministic errors are those which tend to reoccur if you run the same test repeatedly. For instance some women use the test incorrectly (potentially causing false negatives), have some non-pregnancy medical condition that causes high hCG levels, or take medications that cause high hCG levels. If these women retest, they will tend to get the same incorrect result. Random errors are those which tend to be independent from one test to the next. For example, whatever process is used to make the test kit may not lay down a consistent amount of reagent. Both deterministic and random effects may be present in any given case, and they can interact either positively or negatively. For instance, if you already have a naturally high hCG level that is right near the threshold, then you are more likely to have a false positive if the test strip also has an unusually high amount of reagent.

Base Rates #

If you take a COVID test with a 5% false positive rate and it comes up positive, that means there is a 95% chance you have COVID, right? Wrong. What a 5% false positive rate means is that if you give the test to 100 people who do not have COVID, about 5 will test positive, which is not the same thing at all. To see this, consider the following two situations:

  • If you were to give a COVID test to 100 people back in 2010 before COVID had emerged, you would still get 5 or so people testing positive, even though none of them had COVID.

  • If you were to take a population of people who had COVID and test them, all of the people who tested positive would have COVID, not just 95%.

This is what is called the problem of base rate. If you start with a population that has a given proportion X of people who should test positive (i.e., they have COVID), then a random individual has a chance X of having COVID; a positive test should make you think that they have a chance higher than X of having COVID—assuming the test is any good—but that exact chance is determined by both the base rate of people who are infected and the accuracy of the test. If X is very low, it may still be far more likely that a given positive is a false positive than a true positive, depending on the test.

Suppose we have a population of 1000 people, of whom 100 actually have COVID, and a test with a false positive rate of 5% and a false negative rate of 5%, with all the errors being random. Multiplying this out, we get:

| Test Result | Patient State: Not Infected | Patient State: Infected |
| --- | --- | --- |
| Negative | 855 | 5 |
| Positive | 45 | 95 |

So, we get a total of 140 positive tests, out of which 45 are not infected, with the result that your chance of being infected given a positive test is 95/140 = .68.

We can generalize this result, but we'll need some notation:

  • The probability of testing positive given that you are infected (the true positive rate) is $P(Pos|Infected)$ (the $P(A|B)$ notation means "probability of A given B").
  • The probability of testing positive given that you are not infected (the false positive rate) is $P(Pos|NotInfected)$.
  • The base rate of infection is $P(Infected)$, so the probability of non-infection is $1-P(Infected)$.

Given this population, we have the following probabilities:

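$$
\begin{array}{l|c|c}
\text{Test Result} & \text{Patient State: Not Infected} & \text{Patient State: Infected} \\
\hline
\text{Negative} & (1-P(Pos|NotInfected)) \cdot (1-P(Infected)) & (1-P(Pos|Infected)) \cdot P(Infected) \\
\text{Positive} & P(Pos|NotInfected) \cdot (1-P(Infected)) & P(Pos|Infected) \cdot P(Infected)
\end{array}
$$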

We only care about the people who tested positive and in particular, what we want to know is what fraction of people who have positive tests are actually infected, which is to say $P(Infected|Pos)$. As before, we can get this by dividing the fraction of people who are infected and tested positive by the total fraction of people who tested positive, i.e.,

$$ P(Infected|Pos) = \frac{P(Pos|Infected) \cdot P(Infected)} {P(Pos|Infected) \cdot P(Infected) + P(Pos|NotInfected) \cdot (1-P(Infected))} $$

This is a little complicated, but the bottom half is just the probability of testing positive overall, which we can write $P(Pos)$, giving us:

$$ P(Infected|Pos) = \frac{P(Pos|Infected) \cdot P(Infected)} {P(Pos)} $$

Congratulations, we've just triumphantly reinvented Bayes's Theorem[2] which is the basis of a lot of modern statistical techniques. You won't really need the math later, but it's helpful to understand the main concept, which is that when interpreting the results of a test it's important to know the underlying distribution of the quantity you're trying to measure.
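If you'd rather see this as code than as algebra, here is a minimal Python sketch of the same formula. The function name `chance_infected_given_positive` is mine; the 5% error rates and the 10% base rate are the ones from the 1000-person example, and the other base rates are just there to show how much the answer moves.

```python
def chance_infected_given_positive(true_positive_rate, false_positive_rate, base_rate):
    """Bayes's Theorem: P(Infected|Pos) from the test's rates and the base rate."""
    infected_and_positive = true_positive_rate * base_rate
    not_infected_and_positive = false_positive_rate * (1 - base_rate)
    p_positive = infected_and_positive + not_infected_and_positive  # P(Pos)
    return infected_and_positive / p_positive

# Same test (5% false positives, 5% false negatives), different populations.
for base_rate in (0.01, 0.10, 0.50):
    posterior = chance_infected_given_positive(0.95, 0.05, base_rate)
    print(f"base rate {base_rate:.0%}: P(Infected|Pos) = {posterior:.2f}")
```

At the 10% base rate this reproduces the .68 from the table; at a 1% base rate the same test gives a positive result that is much more likely to be false than true.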

Age Assurance As a Testing Process #

With this background, we're now ready to think about age assurance properly, which is to say as a testing process. In other words, we have some technical age assurance mechanism which we subject the user to, and the result comes back as either the user is within the desired age range ("accept") or the user is outside the desired age range ("reject"). I'm deliberately not using the words "positive" and "negative" here because you could think of this test in one of two ways:

  • Detecting people who are within the age range, in which case a "positive" result would be those you accept.
  • Detecting people who are outside the age range, in which case a "positive" result would be those you reject.

In my experience people seem to use the former definition, but I find it hard to keep it straight because in other contexts, you might reject people who were positive (e.g., if they had COVID), so I'll instead be using the terms "false accept" and "false reject", which aren't subject to this kind of confusion.[3]

With that in mind, let's take a look at two paradigmatic forms of age assurance, namely facial age estimation and government IDs.

Facial Age Estimation #

The basic idea behind facial age estimation is that you train a machine learning model to predict people's ages based on their facial features. The details of how these models work don't really matter, but at a high level, a model like this can output one of two things:

  • A probability distribution over the user's possible ages, each with an associated probability.

  • A "point estimate" of the user's age, which is to say what age the model thinks the user is, potentially with some indicator of confidence.

The second of these is actually just a special case of the first, and can generally be derived by taking the most probable age, but it's also a very common output form, as it's easier to reason about than a probability distribution, given that at the end of the day you need to either accept or reject someone.
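As a rough illustration of how the two output forms relate, here is a Python sketch. The distribution is hypothetical (I made up the numbers, and real models don't necessarily expose their output this way), and the function names are mine.

```python
# Hypothetical model output: candidate age -> probability (sums to 1).
age_distribution = {16: 0.05, 17: 0.15, 18: 0.30, 19: 0.25, 20: 0.15, 21: 0.10}

def point_estimate(distribution):
    """The most probable age (the mode of the distribution)."""
    return max(distribution, key=distribution.get)

def probability_at_least(distribution, threshold):
    """Total probability mass at or above the threshold age."""
    return sum(p for age, p in distribution.items() if age >= threshold)

print(point_estimate(age_distribution))                       # 18
print(round(probability_at_least(age_distribution, 18), 2))   # 0.8
```

The point estimate throws away information: here it says "18", but the distribution says there is still a one in five chance the user is under 18.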

Once you have such a model, the evaluator prompts the user to provide a picture of their face, typically by turning on the camera (on the Web, using the getUserMedia API) and capturing a facial image. Sometimes the user is asked to assume particular facial poses to demonstrate "liveness". Once the image is captured, the system then tries to estimate the user's age.

Facial age estimation exhibits both systematic and random errors:

  • Some people look older or younger than average, and so may have their age misestimated.

  • Facial age estimation systems exhibit a surprising amount of variation in the result even when the face is held constant.

The figure below provides an example of the second effect; it shows estimates of a single 58-year-old subject from individual frames of a 60 second video. As you can see, even in this controlled environment there is substantial variation.

Patrick Grother age estimates

Estimates of age over a 60 second video clip by various algorithms. Source: [Rescorla, Arnao, and Cooper 2026](https://kgi.georgetown.edu/research-and-commentary/age-assurance-online/). Original figure by Kate Hudson.

Of course, estimates of age aren't random; rather, they cluster around the subject's true age, as shown in the following (notional)[4] figure:

Errors for age estimation

Notional example of estimated vs true age.

People above the red line have been overestimated. People under the red line are underestimated. In practice, we typically only want precision to around a year, as indicated by the blue lines, which are one year off in either direction. The actual level of error differs between algorithms, but typical algorithms show mean absolute error (MAE) values of around 1-2 years (the diagram above uses an MAE of around 2). This is what people mean when they say that these methods are not very accurate.
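For reference, MAE is just the average size of the miss, ignoring direction. A minimal Python sketch, using a handful of made-up (true age, estimated age) pairs rather than real data:

```python
def mean_absolute_error(true_ages, estimated_ages):
    """Average of |estimate - truth| over all subjects."""
    errors = [abs(est - true) for true, est in zip(true_ages, estimated_ages)]
    return sum(errors) / len(errors)

# Made-up example values, not measurements from any real system.
true_ages = [18, 18, 25, 30, 41]
estimated_ages = [16.5, 20.0, 24.0, 33.0, 40.0]
print(mean_absolute_error(true_ages, estimated_ages))  # 1.7
```

Because the direction of the error is discarded, MAE by itself doesn't tell you whether a system tends to overestimate or underestimate, which is what matters for false accepts versus false rejects.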

Measuring Facial Estimation Accuracy #

It's surprisingly hard to get good data on the accuracy of facial age estimation. The most comprehensive data comes from NIST's FATE project, which uses a number of test images drawn from sources like immigration/visa applications, mugshots, and border crossing images. These are generally of low resolution (around 300x300 pixels or less) and of varying quality (lighting, orientation, etc.). By contrast, the cameras on people's computers and mobile devices are generally much higher quality (an old Apple iPhone 12 has a 12 MP camera).

Because higher quality images produce more accurate results, NIST's data underestimates the accuracy of facial age estimation. For example, for its 252x300 images NIST reports a mean absolute error of 2.67 years for 13-17 year olds, whereas Yoti's data for 720x800 images reports an MAE value of 1.1. Australia's independent testing of Yoti's system produces results that are more comparable to Yoti's results.

Unfortunately, much of the reported data for facial age estimation systems focuses on the false accept rate, but because in practice systems are deployed with a buffer, what we actually need is the false reject rate, and false reject data is less widely available. NIST publishes false reject data but for a threshold age of 25, and, as mentioned before, based on lower quality images. I reached out to Yoti and they were able to provide false reject data between 18 and 30 for a threshold age of 20. The false reject rate at 30 is .12%, so we can assume that above that age the rate is vanishingly small.

Because the age estimates cluster around the user's true age, we can trade off the false accept versus false reject rate by moving the threshold around. For example, very few 18 year olds will look 25, so if we choose to accept only people who the algorithm thinks are 25 or older (this is called having a 7 year "buffer"), then we will have a very low false accept rate. The cost of this, however, is a very high false reject rate: not only do we reject 18 year olds who look like they are 17, we reject the majority of people who are under 25 as well (by design!). Importantly, while we can trade off false accepts for false rejects by moving the threshold, there is no way to have a setting which minimizes both, because the algorithm just has an inherent level of error. As a practical matter, these systems are typically deployed with a 2-5 year buffer, so that the false accept rate is quite low but the false reject rate is quite high, mostly for people who are in the range 18-25.
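Here is a Python sketch of the buffer tradeoff under some strong simplifying assumptions of mine: the estimate is the true age plus zero-mean normally distributed noise, the MAE is 1.5 years (within the 1-2 year range mentioned above), and ages are treated as whole numbers. Real systems have error distributions that vary with age and are not necessarily normal, so treat the output as showing the shape of the tradeoff, not real rates.

```python
from math import pi, sqrt
from statistics import NormalDist

MAE_YEARS = 1.5                    # assumed, not a measured value
SIGMA = MAE_YEARS * sqrt(pi / 2)   # for zero-mean normal noise, MAE = sigma * sqrt(2/pi)
LEGAL_AGE = 18

def accept_probability(true_age, buffer_years):
    """Chance that the estimated age is at or above LEGAL_AGE + buffer_years."""
    estimate = NormalDist(mu=true_age, sigma=SIGMA)
    return 1 - estimate.cdf(LEGAL_AGE + buffer_years)

for buffer_years in (0, 2, 5, 7):
    false_accept_17 = accept_probability(17, buffer_years)      # minor let through
    false_reject_18 = 1 - accept_probability(18, buffer_years)  # adult turned away
    false_reject_25 = 1 - accept_probability(25, buffer_years)
    print(
        f"buffer={buffer_years}: false accept (17)={false_accept_17:.1%}, "
        f"false reject (18)={false_reject_18:.1%}, "
        f"false reject (25)={false_reject_25:.1%}"
    )
```

Increasing the buffer pushes the false accept rate for 17 year olds toward zero while pushing the false reject rate for young adults toward 100%; no buffer setting makes both small.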

Government IDs #

At the far end of the spectrum we have government ID-based systems. These work the way you think they do: the user is prompted to provide an image of some form of government ID (e.g., a driver's license, passport, etc.) and then, just as with facial age estimation, provide an image of their face. This can happen in various ways, but one obvious one is just to hold up your ID next to your face.

Once it has captured the image or video, the evaluator needs to do two things:

  • Verify that the ID is valid and extract the contents, specifically the date of birth.
  • Compare the user's image to the ID and determine if they correspond to the same person.

Because these IDs are issued by the government, the ages they contain can be expected to be highly accurate, and as long as the image is of reasonable quality, the evaluator should be able to extract the user's date of birth: modern optical character recognition algorithms are quite good, and many modern IDs have bar codes or other machine readable mechanisms that provide accurate data. There is still plenty to go wrong, especially around validating the credential and verifying that the user matches the credential (more on this below), but assuming those pieces work out, the result is going to be an accurate age.

The bigger problem is that many people do not have government-issued IDs at all (somewhere around 9% of US adults do not have a valid driver's license), which obviously prevents them from using government-issued IDs to demonstrate their age. In addition, some people are unable to show their face for religious reasons or have visible facial differences which cause problems for the facial recognition systems that need to match the user's face to the ID.

End-to-End Error Rates #

At this point, you could be forgiven for thinking "those people he's criticizing are right, facial age estimation is inaccurate and government ID-based systems are accurate." The problem is that this is too narrow a view: instead of thinking about the small scale properties of each mechanism, you need to think about the end-to-end properties when they are embedded in a system. The important question to ask is the following:

What are the overall false accept and reject rates of these mechanisms as deployed?

As a practical matter, however, because these mandates are targeted at preventing minors from accessing certain kinds of content, this bounds the false accept rate, either implicitly, as with the Ofcom requirement that "service providers which allow pornography must implement highly effective age assurance to ensure that children are not normally able to encounter pornographic content" or explicitly, as with the NY SAFE For Kids Act proposed rules that require specific maximum false accept rates:

| Age Range | False Accept Rate |
| --- | --- |
| 0-7 | .1% |
| 8-13 | 1% |
| 14-15 | 2% |
| 16 | 8% |
| 17 | 15% |

ID-based age assurance systems already have a low false accept rate (ignoring circumvention for the moment). As we saw above, it is possible to tune facial age estimation thresholds to trade off false accept vs. false reject rates, but if we're already required to keep the false accept rate down, then we can just compare the false reject rates for these two systems.

This is actually a trickier question than it sounds, because the two systems have very different rejection profiles:

  • Facial age estimation systems will often reject people who are close to the age threshold, but are very unlikely to reject people who are much older.

  • Government ID-based systems will rarely reject people who have ID, but reject anyone who does not.

The figure below compares facial age estimation and ID-based systems by age bracket. The facial age estimation false reject rate (blue bar) is from data kindly provided by Yoti for their system with a threshold of 20 (Challenge-20). The orange bar shows the rate of people in the US who don't have driver's licenses. This isn't a perfect comparison for a number of reasons:

  • Some people might have some other government ID like a passport but not a driver's license, and the situation may be different in other countries where government IDs are required.

  • Some people may not be able to use any system which depends on facial analysis (as mentioned above).

Despite this, I think it gives a reasonable picture of the situation.

Comparing age estimation and ID by age bracket

Comparison of facial age estimation and ID. The blue bar shows the rate of false rejections for facial age estimation as provided by Yoti for Challenge-20. The orange bar shows the fraction of adults without US driver licenses, taken from Federal Highway Administration statistics.

At this point, you might want to complain (as Gemini did when I showed it this post) that I'm treating not having a license as a "false reject rate" when it's actually a "failure to enroll". This is technically true, but operationally it misses the point: what we're trying to do is to accurately discriminate between users who are in the right age range and those who are not; IDs are just a means to an end. From that perspective, whether we reject people because they look underage or because they have no ID doesn't matter; in both cases we rejected someone we should not have.

As shown in this figure, false rejects for facial age estimation are clustered around the ages just above 18, but we should expect false rejects for ID-based systems (at least in the US) to be spread across the entire age spectrum. For ages 18-20, facial age estimation is somewhat worse, but above that age, it has a trivial error rate and ID-based systems still have significant false reject rates.

In line with our previous discussion of base rates, to really compare these we need to understand the rate at which people engage with age assurance, which depends on the rate at which people use different types of platforms. The table below shows traffic rates for Facebook (source: SimilarWeb) and Pornhub, reflecting the two main types of platform subject to age assurance mandates: social media and adult content:

| Age Range | Facebook | Pornhub |
| --- | --- | --- |
| 18-24 | 15.5% | 31% |
| 25-34 | 24.46% | 30% |
| 35-44 | 18.62% | 16% |
| 45-54 | 15.56% | 11% |
| 55-64 | 14.57% | 7% |
| 65+ | 10.29% | 5% |

As you can see, a very large fraction of the users of both sites is 25 or above. This group of users would be very unlikely to be rejected by facial age estimation systems but a moderate fraction of these users would still be rejected by an ID-based system. Because the age brackets don't line up perfectly, it's a little tricky to figure out the overall false reject rate, but if we cheat a little bit and assume that usage is evenly spread out for each age within each age bracket and then merge the 5 year brackets we have above into 10 year brackets by averaging, we get the following approximate overall false rejection rates.

| Site | Facial Age Estimation | ID-Based |
| --- | --- | --- |
| Facebook | 6.9% | 11.3% |
| Pornhub | 13.5% | 13.5% |

As you can see, facial age estimation performs a lot better on a Facebook-style audience because that audience skews a lot older and ID-based systems tend to reject a lot of older people whereas facial age estimation does not. Both systems do worse on Pornhub than Facebook because they perform worse on younger age cohorts and the Pornhub audience skews younger; there, the two systems perform similarly.
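The overall numbers are just traffic-weighted averages of the per-bracket false reject rates. Here is a Python sketch of that calculation. The traffic shares are the ones from the traffic table, but the per-bracket reject rates are placeholders I invented so the code runs; they are not the Yoti or Federal Highway Administration figures, so the output will not match the results table.

```python
# Traffic shares by age bracket (percent), from the traffic table.
BRACKETS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
TRAFFIC = {
    "Facebook": [15.5, 24.46, 18.62, 15.56, 14.57, 10.29],
    "Pornhub": [31, 30, 16, 11, 7, 5],
}

# PLACEHOLDER per-bracket false reject rates, invented for illustration only.
PLACEHOLDER_REJECT_RATES = {
    "facial age estimation": [0.30, 0.01, 0.0, 0.0, 0.0, 0.0],
    "ID-based": [0.20, 0.10, 0.08, 0.07, 0.07, 0.10],
}

def overall_false_reject_rate(traffic_shares, reject_rates):
    """Traffic-weighted average of the per-bracket false reject rates."""
    total = sum(traffic_shares)  # normalize; the shares need not sum to exactly 100
    return sum(s * r for s, r in zip(traffic_shares, reject_rates)) / total

for site, shares in TRAFFIC.items():
    for mechanism, rates in PLACEHOLDER_REJECT_RATES.items():
        rate = overall_false_reject_rate(shares, rates)
        print(f"{site}, {mechanism}: {rate:.1%}")
```

The structure is the useful part: a mechanism whose errors are concentrated in one bracket is hurt only to the extent that the audience is concentrated there too.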

Under 18s #

I've restricted the discussion here mostly to 18+ because it's the most common threshold, but of course there are other age ranges that one could be interested in. For example, Australia's new social media minimum age is set at 16. ID systems tend to perform much worse at younger ages because many children do not have ID. Facial age estimation systems are still usable at younger ages, albeit with significant error margins, but are not suitable for making multiple discriminations, such as having one threshold at 16 and one at 18, because they're not able to make fine discriminations below 2 years.

Circumvention #

All of the analysis above only considers the inherent false rejection rates of these technologies (the combination of what Arnao, Cooper, and I called "baseline accuracy" and "availability") rather than how resistant to circumvention they are. Quantifying circumvention resistance is challenging because any measurement needs to be taken within the context of a given set of circumvention techniques.

For example, both facial age estimation and ID-based systems are potentially subject to "injection attacks" where an attacker uses fake video inputs to make them look older or like someone else. There are technical mechanisms which attempt to detect AI-generated content, and we can measure how well they work for a given set of AI algorithms, but if an attacker devises new algorithms then the effectiveness of these detection techniques may well be different, and of course attackers have an incentive to switch to techniques which are more effective. For this reason, it is not generally practical to produce a single effectiveness measure against circumvention.[5]

Most of the age assurance mandates I have seen have qualitative formulations like "highly effective" or "commercially reasonable", which require some judgement call about the attack environment. By contrast, the New York Office of the Attorney General's proposed rules for the SAFE For Kids Act required "a rate of detecting method circumvention for an age assurance method that meets or exceeds 98%". This sounds precise but I don't think it really works, for two reasons.

First, we can only make this measurement against some assumed base rate of circumvention attempts. If most of those attacks are unsophisticated, then the overall detection rate will be very high, even if good attacks are possible. Moreover, it's not even clear what attacks are in scope: if a minor gets an adult to perform age assurance for them, this is not really possible to detect, especially with 98% accuracy, so does that mean that no age assurance mechanism is acceptable?

Second, a circumvention defense that is highly effective today might become ineffective tomorrow if a good attack is discovered (this happens frequently in cryptography), which could suddenly render a platform noncompliant.
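To see why the first objection bites, here is a Python sketch with an entirely hypothetical attack mix (the shares and detection rates are numbers I made up): almost all attempts are crude and nearly always caught, while a rare sophisticated attack is never caught.

```python
def overall_detection_rate(attack_mix):
    """attack_mix: list of (share_of_attempts, detection_rate) pairs."""
    total_share = sum(share for share, _ in attack_mix)
    return sum(share * rate for share, rate in attack_mix) / total_share

# Hypothetical mix of circumvention attempts.
attack_mix = [
    (0.99, 0.995),  # unsophisticated attempts, almost always detected
    (0.01, 0.0),    # a sophisticated attack that is never detected
]
print(f"{overall_detection_rate(attack_mix):.1%}")  # 98.5%
```

This system clears a 98% bar even though anyone who uses the sophisticated attack gets through every time, and its measured rate would collapse as soon as that attack became popular.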

Combining Multiple Systems #

Because both facial age estimation and ID-based systems have quite high false reject rates, deploying just one of those systems will have the impact of excluding a large number of people, especially in the 18-24 range. This impact can be reduced by supporting multiple methods of age assurance.

A conventional approach here is to use what's called a "waterfall", where you offer users options in sequence, starting with nominally lower friction approaches like facial age estimation and then moving towards higher friction approaches like ID-based mechanisms, and then maybe towards some eventual non-automated appeal process, with the idea that the majority of people will eventually be able to demonstrate their age.

When you have multiple mechanisms available, the reject rate is determined by the fraction of users who cannot demonstrate their age by any of the available mechanisms. As a result, it will generally be no higher than that of the mechanism with the lowest reject rate.[6] This means that it is less likely that eligible users will be rejected, but also less likely that ineligible users will be rejected.

Importantly, you can't just determine the reject rate by simple math. For example, both of these mechanisms show false reject rates over 1/3 for 18 year olds: in the best case scenario these populations would be entirely disjoint and so every user would be able to successfully demonstrate their age with one of these mechanisms, but in the worst case scenario these populations would overlap entirely, in which case over a third of people would be excluded. What is much more likely is that they are mostly independent; there is no particular reason to think that not having an ID corresponds with looking unusually young, although some people, such as those who do not want to show their face for religious reasons, would be rejected by both mechanisms. In this case, the total false reject rate is roughly the product of the false reject rates, in which case around 20% of 18 year olds would be unable to demonstrate their age with either facial age estimation or an ID-based system.

Combined false reject rate

Estimated false reject rate for facial age estimation and ID-based systems, assuming independent errors.
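The three scenarios for combining two mechanisms can be written down directly. In this Python sketch the 45% rates are hypothetical stand-ins I chose to be "over 1/3" and to give a product of about 20%; they are not the actual figures behind the chart.

```python
def combined_false_reject(rate_a, rate_b):
    """False reject rate when a user may pass EITHER mechanism.

    Returns (best_case, independent, worst_case).
    """
    best_case = max(0.0, rate_a + rate_b - 1.0)  # rejected groups overlap as little as possible
    independent = rate_a * rate_b                # being rejected by one says nothing about the other
    worst_case = min(rate_a, rate_b)             # one rejected group sits entirely inside the other
    return best_case, independent, worst_case

# Hypothetical 18-year-old false reject rates for the two mechanisms.
best, independent, worst = combined_false_reject(0.45, 0.45)
print(f"best case: {best:.0%}, independent: {independent:.0%}, worst case: {worst:.0%}")
# best case: 0%, independent: 20%, worst case: 45%
```

Whatever the correlation turns out to be, the combined rate always lands somewhere between the best and worst cases, which is why it can never exceed the better of the two mechanisms.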

You'd need further research to know the extent to which other popular mechanisms (e.g., government and commercial records-based "age inference") were able to cover these users.

The same reasoning of course applies to circumvention: an attacker only needs to successfully circumvent one of the available age assurance mechanisms, so the more mechanisms are on offer the weaker the system is overall, but the exact extent to which that is true depends on the nature of the systems and the attacker's capabilities.

The implication for age assurance systems—and age assurance mandates—is that you need to think about the collective error rates for the set of available age assurance mechanisms as well as the individual error rates. Doing otherwise gives you a false picture of the accuracy of the system.

The Bottom Line #

The important thing to realize here is that age assurance is not a simple measurement like measuring how long something is.[7] Rather, it's a complex process of combining a set of individual measurements which are proxies for age into a final decision about a user's age eligibility. This is true both at the individual mechanism level and the collective level where you try to make a decision about a user based on a set of mechanism results. This complexity is obvious in the case of facial age estimation which obviously involves some kind of machine learning model, but is harder to see in the case of ID-based systems which are superficially simple (you just look at the birthday and do some subtraction). This is how you get the counterintuitive result that ID-based systems are more accurate at determining any individual user's age but can be less accurate on the whole in terms of accepting eligible users and rejecting ineligible users.


  1. Note that we're not really measuring the quantity directly, but instead deciding what mixture of stuff to put on the test stick, but that's a proxy for that measurement. ↩︎

  2. Do this three times and you're well on your way to becoming an Effective Altruist. ↩︎

  3. And don't even get me started on Type I and Type II error, which I can only remember this way. ↩︎

  4. By which I mean I generated it with R rather than using real data. ↩︎

  5. This is also true in other settings. For example, the seasonal influenza vaccine is designed to match the strains of flu which are expected to be common in the coming season. Depending on how good that prediction is, the effectiveness of the vaccine can vary quite substantially. ↩︎

  6. And should generally be lower. ↩︎

  7. Though this is a complicated question in and of itself due to factors like the thermal stability of your measuring apparatus, which is why the meter is now defined by reference to the speed of light in vacuum rather than by the length of a platinum bar. ↩︎